Archive for the ‘Cisco’ Category

Cisco Live 2011 – CAE

Cisco Live Sunday Labtorial

This post is late in coming, considering that I’ve been back from Cisco Live for a good couple of weeks now. Nevertheless I’m posting it now, so hopefully someone finds the information useful.

Without going into the details of the entire Cisco Live experience, I’d just like to talk about the class I took on the first Sunday of the show–or the day before the show officially starts, depending on who you talk to.

On Sunday I attended a full-day mock CCIE R&S lab (Session LTRCCIE-3001). This was an instructor-led affair, with Bruce Pinsky (Distinguished Engineer) and Bruno van de Werve (Product Manager) acting as facilitators and proctors. Considering Bruno’s experience as both a proctor for the actual R&S lab and now the head of the R&S program, this was an experience well worth having if only for the ability to ask questions.

Unfortunately for all of us, and through no fault of either Bruce or Bruno, the in-class network was down from the moment we all got there. There were a number of failures, including some bad cables (how do you miss that in testing?), which resulted in all of us essentially sitting around for over an hour.

To make up for the delay in getting started, someone from Cisco came in, apologized, and handed out gift cards to Mandalay Bay. It was a nice gesture, but considering the gift cards had a face value of five dollars, it might have been better to not hand out anything. It had the effect of actually irritating several students, and giving the rest of us something to joke about for a while. The class cost $1000 (or 10 Cisco Learning Credits), so the value of even an hour should have been closer to $125 or so.

After that snafu, and a brief presentation by Bruno and Bruce on the number of CCIEs in the world, with breakdowns by region, we got started with the meat of the class: the labs themselves. We were all looking forward to this, since it was being run by Cisco and had the smell of the real thing vs. some of the third-party labs (note that I use third-party labs for training, and have no problems with them, but this was officially sanctioned and so had a little something extra, at least in “feel.”)

The troubleshooting section came first, and used the same system as the real lab so that was a nice touch. In our case we had only five trouble tickets to complete in one hour vs. the real lab which has ten in two hours. I believe this was done to facilitate the “instructor led” nature of the class, and allow us to ask plenty of questions. Bruce and Bruno were stellar in this regard, coming around to any student with a question and helping them to understand the problem or just passing out hints to those who still wanted to figure it out on their own.

I learned a lot about myself and my troubleshooting techniques during this portion of the day, as I got bogged down on the first ticket and blew the rest of my time. It was a relatively straightforward ticket where a particular address wasn’t answering an ICMP Echo from another device. It was a few routers together, with BGP. I spent the entire hour re-architecting the BGP (down to bare metal and rebuilding the config from scratch) and was almost done when time expired. As it turned out, it was a simple address statement that was missing.

Bruno got a chuckle out of this and pointed out that the lab is not intended as a “best practices” lab. He said that in most cases you won’t be removing configuration at all during the TS section; you’ll simply be adding something missing or correcting route statements, etc. It was helpful for me to hear this and to go through the experience, because it taught me that I really need to focus on finding the simple problem quickly and not rebuilding things the way I think they ought to be built. After 17 years in the industry, that’s a difficult habit to change, but one I’ll have to break in order to be successful on the real lab.

After a brief recap and break, we moved on to the configuration section. For the most part there were no surprises here, and I had my Layer-2 (Frame, Spanning-tree, VTP, etc.) and IGP (RIP, OSPF, and EIGRP here) set up quickly enough. Redistribution was what you’d expect, with a lot of everything going every which way. Again, no one in their right mind would ever design that network, but it’s what you can expect to see in the lab.

The one thing I did miss, and had to have Bruno point out to me, was in a redistribution task regarding OSPF. The task wanted a route from one area to show up in area 0. I got the route there, but Bruno said that I had it wrong. Reason? The area where the route originated was discontiguous, or detached from area 0. We all know that typically means you want a virtual link, but since the task didn’t specify this I simply brought the route into area 0 as an external. Bruno said that the task “implied” a virtual link, and while I disagree with the wording of the task and the nature of implied configurations, it was helpful to hear since this is likely the same kind of thing I’ll see in the real lab.

Where I slowed down, and I knew I would, was on the MPLS and BGP configuration sections. As a long-time enterprise engineer, I simply don’t touch either of these technologies in the real world, and I haven’t spent enough time with them in the lab to feel comfortable. I still muddled my way through some of it, but with the amount of time it took I’d never make it through the real lab. The message for me here is that I really need to take some time with these technologies until I not only understand them well, but can configure them quickly.

Overall, this was a very valuable experience and one I would heartily recommend to anyone looking to take the R&S lab. It gave valuable insight into the time pressures you’ll face, as well as the number of tasks, the wording, and the level of difficulty you can expect to see. This is just one more reason that Cisco Live is where you want to be every year if you’re at all serious about your networking career.

Cisco Live 2010 Photos

Cisco Live!

I just wanted to drop a quick note here to let everyone know that I plan on blogging a little more frequently this week, as I’ll be in Las Vegas for the annual North American Cisco conference: Cisco Networkers/Live. I can’t promise I’ll be in my room slaving over long, elaborate breakdowns of technology (maybe an in-depth review of whiskey selections by bar), but I will try to post some pictures and information about what I’m seeing and hearing during the show. In years where I haven’t been able to attend the show, I always liked seeing and hearing from folks who did. Now it’s my turn to give back something, so watch this space…

Oh, and keep up with real-time information on twitter where I hide behind the handle @someclown, or G+ where I can be found at: http://gplus.to/someclown.

IPv6 Half-truths

This post will be a short one, and mostly just comes from a discussion I had the other day with another engineer.  It turns out that even among people who are comfortable with IPv6, and maybe even have experience deploying it, a lot of misinformation still persists.  Hopefully I can correct a couple of those today.  I also tossed in a hot-potato at the end just to see how many folks get hopped up.  Discussion is welcome, and in addition to comments here I can be found on twitter hiding behind the handle: @someclown.

You must turn on IPv6 by using the ipv6 unicast-routing command.

Not true.  This is one of the more persistent, yet wildly incorrect, pieces of information regarding IPv6.  I have even seen many training centers and instructors at the CCIE level get this one wrong and it falls into the category of attention to detail.  What this command actually does is enable unicast routing for IPv6, just as it says.  To actually enable IPv6 you simply need to go to any interface and use the ipv6 enable command.  And yes, you can enable IPv6 on the interface without enabling unicast routing.  Of course, it would be helpful to have an address on the interface as well.
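To make the point a bit more concrete: as far as I know, issuing ipv6 enable on an Ethernet-style interface gets you an automatically generated link-local address built from the interface MAC using modified EUI-64, with no routing involved. The Python sketch below just reproduces that derivation. The MAC address is made up for illustration, and the function name is mine, not anything from IOS.

```python
import ipaddress


def link_local_from_mac(mac: str) -> ipaddress.IPv6Address:
    """Build an fe80::/64 link-local address from a MAC via modified EUI-64."""
    octets = [int(part, 16) for part in mac.split(":")]
    if len(octets) != 6:
        raise ValueError("expected a 48-bit MAC such as 00:1b:54:aa:bb:cc")
    # Flip the universal/local bit of the first octet.
    octets[0] ^= 0x02
    # Insert ff:fe between the two halves of the MAC to get 64 bits.
    eui64 = octets[:3] + [0xFF, 0xFE] + octets[3:]
    interface_id = int.from_bytes(bytes(eui64), "big")
    # Prepend the link-local prefix fe80::/64.
    return ipaddress.IPv6Address((0xFE80 << 112) | interface_id)


# Hypothetical MAC address, purely for illustration.
print(link_local_from_mac("00:1b:54:aa:bb:cc"))
```

So the interface is not entirely address-less after ipv6 enable; it simply has nothing beyond link-local scope until you configure it.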

Yes, but if you don’t turn on unicast routing you can’t route IPv6 traffic.

Not strictly speaking true.  You can still set up a default route for IPv6 traffic and get it off of your system.  You are welcome to argue about whether or not this is actually routing, but you can move IPv6 traffic off of your local device using a default route without ever having enabled routing for IPv6.

Using a /127 address on point-to-point links is wrong, wrong, wrong.

This is an interesting one, and usually sparks a fair amount of debate.  Up until very recently, the recommendation across the board was to use /64 prefixes even on point-to-point links, ostensibly because the IPv6 space is so big anyhow (RFC 4291 assumes 64-bit interface identifiers), and because longer prefixes can break things (notably a /127 colliding with the subnet-router anycast address, the problem laid out in RFC 3627).  While I’m not disputing that this is what the current best-practices reflect, I will say that RFC 6164, which has a status of Proposed Standard, makes a fairly compelling case for using /127 on point-to-point links.  I’m sure this won’t be resolved anytime soon, standards or no, but I would say that if you have a compelling reason for using /127 and know what you’re doing it for, go for it.  Just be aware that standards can change, and you don’t want to leave a steaming pile for the poor person who has to follow you.
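For anyone who wants to see the numbers, here is a quick Python sketch using the standard ipaddress module. The prefixes come from the 2001:db8::/32 documentation range and are examples only, not anything from a real network.

```python
import ipaddress

# Example prefixes from the documentation range; not real allocations.
p2p_link = ipaddress.IPv6Network("2001:db8:0:1::/127")
lan_style = ipaddress.IPv6Network("2001:db8:0:1::/64")


def endpoints(net):
    """Return every address in a (small) network as strings."""
    return [str(addr) for addr in net]


# A /127 holds exactly two addresses, one per end of the link.
print(p2p_link.num_addresses)
print(endpoints(p2p_link))

# A /64 on the same link holds 2**64 addresses, nearly all unused.
print(lan_style.num_addresses == 2 ** 64)

# The first address of the /127 is the all-zeros interface ID, which is
# the subnet-router anycast address that RFC 3627 worried about.
print(p2p_link[0] == lan_style[0])
```

The last line is the crux of the old objection: one end of a /127 lands on the address reserved for subnet-router anycast, which is why RFC 6164 has routers turn that feature off on such links.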

Home CCIE Study Lab

So, a lot of people who are working towards their CCIE certifications end up building home labs for studying.  The reasons are many and varied, but mine boiled down to two primary ones:

 

(1) My study hours don’t always match well with what slots the online rack vendors have available.

(2) I just like physical equipment and the flexibility it provides in both studying and in research.

Also, I just wanna.

With that said, one of the next things people want to know is what gear I have, and how I have it configured.  Therefore, with the recent posting frequency here severely lacking, writing about my lab is a nice way to get something fresh on the blog, and hopefully it provides something useful to someone out there.  I’m going to break this down into two general categories: equipment that I have purely for my Cisco CCIE lab, and other equipment that I have either for my home network or for random reasons.

Random Non-Lab Specific Equipment List:

  1. WS-2950T-24 switch
  2. Two 1142N access points
  3. Wireless Controller Module (NME-AIR-WLC6-K9) which is pretty fun, but breaks Bonjour and so is the bane of my existence (see previous post here.)
  4. ASA 5505 with IPS module, running bot-net filter and some other things.  This is also the main gateway for the home network, connecting up to the cable modem.  It’s also the IPsec endpoint for my always-on connection to the office and segments my home network, lab network, work network, etc.
  5. Comcast Cable Modem, made by Motorola
  6. Random doohickey for my “Whole-Home DVR” with DirecTV
  7. Sun SunFire v240 Server with StorEdge 3300 storage array

Non-plugged in Equipment

  1. Sun Enterprise 3500
  2. Two Cisco 3550 switches
  3. Two PIX 501

Computer storage:

  1. Seagate 750GB external drive (USB)
  2. Iomega 1TB external (eSATA)
  3. Drobo S with 5 2TB drives for 10TB raw (eSATA)

Cisco Lab Equipment:

  1. Four 3560-X switches, with four-gig uplink modules (might still get the 10 at some point), fully licensed with IPServices and running 12.2(53) SE2.
  2. Eight 2801 routers, all running 12.4(22)T5 Advanced Enterprise, and all with at least one WIC-2T smart serial card which provides two smart serial connections.  Four of the 2801s have two WIC-2T cards, and a couple of others have a mixture of WIC-1DSU-T1 cards, FXO cards, and FXS cards (mostly leftovers and hand-me-downs, but there are some interesting possibilities.)
  3. One 2811 running the same IOS as the 2801 routers, used as a backbone router for injecting routes and some other misc. stuff.
  4. One 2621 running something-or-other and acting as another backbone router.
  5. One 3845 running the same Advanced Enterprise as the others.  This has five WIC-2T cards and acts as the frame switch. It also has an HWIC-16A card and does reverse-telnet to everything else (terminal server).  It also houses some random stuff including the wireless controller mentioned above.

All of this is cabled and wired almost identically to the CCBootCamp lab topology.  This is because I have all of their workbooks and wanted to be able to study with my own equipment.  A couple of the details are different, mostly around interface numbers and the specifics of the backbone routers and such.  Also, the switches I have are way overkill but satisfy the lab requirements.  Given the actual topology from just about any mainstream training provider, I can copy it with the equipment I have, and that’s exactly what I wanted to be able to do.  As always, contact me here or on twitter with questions and comments.

Pictures of the lab and sundries are included below.

Cisco Live 11 Schedule

Everyone has been posting their schedules for Cisco Live to Twitter, Facebook and wherever else, so I thought I’d better jump in with the cool kids and publish mine as well.  I can’t guarantee this won’t change, but for now it stands as my best guess and current planned schedule.

Frame Relay Switch Setup

Editor’s Note: Google Chrome seems to dislike my site theme and is hyphenating absolutely everything.  Apologies for that, and I’ll look into it just as soon as I get done with a few items on the “honey-do” list.

If you are studying for the CCIE Routing and Switching exam, one of the technologies that is still heavily prevalent is Frame Relay.  It is expected that you know not only the technology itself and how to configure it, but also how it interacts with and affects other key technologies like OSPF and EIGRP.  Having the ability to study Frame Relay, then, and get plenty of hands-on configuration time becomes as important as with anything on the R&S 4.0 Blueprint.

While many network engineers are already familiar with Frame Relay from a consumer side–in other words, from the perspective of an entity which buys Frame Relay services from a provider–not many of us are familiar with the service provider portion of the equation.  This makes setting up practice labs difficult if you are trying to study using your own equipment.  Fortunately, you can set up your own Frame Relay switch fairly easily, and that is what we’re going to walk through today.

A Frame Relay switch is the DCE device that sits inside a service provider’s network and moves the frames along from point A to point B.  There are many of these devices all working together inside of your provider’s network to move your information along, but fortunately for lab candidates studying at home, you can easily get by with just one.  Even more fortunate is that you can use a fairly low powered router to act as a Frame Relay switch, and not miss anything that you’ll need for purposes of the lab.

A quick note on the lab is in order here.  It used to be a part of the lab blueprint (don’t ask me which one, or how far back in time) that you had to know how to set up a Frame Relay switch.  Cisco has since taken that requirement away, at least from the R&S lab, and so a lot of that knowledge isn’t communicated in teaching texts any longer.  What you’ll find in the lab itself is an already configured Frame Relay switch that you’ll have no direct access to, but all of the information you need to make your equipment talk to it.

It may seem counterintuitive, but for a home lab the best device to use for a Frame Switch is actually a router.  For instance, I’m using an older Cisco 2621 model for my Frame Switch, and it does everything I need it to do.  Service providers will typically use more specialized gear, but all we’re going for in our studies is a reasonable facsimile.  If you want to spend a lot of money, follow the advice of so many others and spend it on your layer-3 switches.

Another thing we want to briefly discuss is interfaces.  Generally speaking, you can either follow the “run what’cha brung” philosophy of just using what you have access to, or you can buy the interfaces you want.  In my case I had a couple of WIC-T1 cards that I’ve used, and then I bought a handful of WIC-2T serial interface cards.  The key is to have a serial interface for each router you want to connect via the Frame switch.  So I have one T1 interface, and six serial interfaces for a total of seven devices I can connect into the Frame “cloud”.  I find this to be more than adequate, though if you’re trying to duplicate a specific topology you may need more or fewer.

The configuration of a Frame switch is actually very simple, as you’ll see, though attention to detail does matter.  I’m assuming here, by the way, that you already know how to set up your router for basic access, clock, etc., so I won’t cover that here.  So, the first step in configuring your router to be a Frame switch is to put it into Frame switching mode using the commands:

ip cef
frame-relay switching

 

These commands turn on Cisco Express Forwarding, put the router into a Frame switching mode, and change quite a bit of the default behavior, so don’t expect to use this device as a router in any lab topology you’re working on.  This device will be just a Frame switch and nothing more.

The next step is to configure the individual interfaces you’ll connect your other routers to, and you have a lot of choices here.  I don’t know exactly how the R&S lab devices are set up, so I’m just going to give you the configuration I use.  I’ll post the configuration below, and then go over the key commands:

interface Serial0/1
no ip address
encapsulation frame-relay
logging event subif-link-status
logging event dlci-status-change
clock rate 8000000
no frame-relay inverse-arp
frame-relay intf-type dce
frame-relay route 220 interface Serial0/2 120
frame-relay route 221 interface Serial0/0 320

 

The first few lines of the configuration should be familiar to you already.  We’re setting our interface encapsulation to frame-relay, and then logging on a couple of events.  The logging is completely up to you, and not necessary one way or another.  I just find them helpful.  Next we set the clock rate, and we tell the interface that we are the DCE end of the connection.  Remember, in a Frame Relay network the clocking (DCE end) comes from the line or provider side, so this is what you’ll want.  If I am working with a T1 serial interface, I’ll also need a line for that:

service-module t1 clock source internal

 

This can change depending on the type of card and how you have it configured.

Now, the other options we have here require a little more explanation.  The “no frame-relay inverse-arp” command does just what it says, and you can argue for the Frame switch having this turned on, or off.  In most cases in the lab, you’ll be instructed to not use inverse arp on the DTE devices, so I’ve just turned that functionality off on my Frame switch from the outset.  It’s really your call.

The next two lines beginning with frame-relay route are the ones that always seem to cause confusion.  You can read the first line as “If a frame comes in on this interface (Serial0/1) with DLCI 220, send it out interface Serial0/2 with DLCI 120.”  Substitute DLCI 221, interface Serial0/0, and DLCI 320 on the next line, but otherwise read it the same.  So if I now plug in a router to interface Serial0/1, and assign DLCI 220 and 221 to two different sub-interfaces (for instance; different options are possible), the Frame switch will know what to do with that traffic.

So, if we have three routers connected to the Frame switch, one each on Serial0/0, Serial0/1, and Serial0/2, then we have a configuration for interfaces that looks like so:

interface Serial0/0
no ip address
encapsulation frame-relay
logging event subif-link-status
logging event dlci-status-change
service-module t1 clock source internal
no frame-relay inverse-arp
frame-relay intf-type dce
frame-relay route 320 interface Serial0/1 221
frame-relay route 321 interface Serial0/2 121
!
interface Serial0/1
no ip address
encapsulation frame-relay
logging event subif-link-status
logging event dlci-status-change
clock rate 8000000
no frame-relay inverse-arp
frame-relay intf-type dce
frame-relay route 220 interface Serial0/2 120
frame-relay route 221 interface Serial0/0 320
!
interface Serial0/2
no ip address
encapsulation frame-relay
logging event subif-link-status
logging event dlci-status-change
clock rate 8000000
no frame-relay inverse-arp
frame-relay intf-type dce
frame-relay route 120 interface Serial0/1 220
frame-relay route 121 interface Serial0/0 321
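Since every frame-relay route statement needs a mirror image on the interface it points to, a typo in a DLCI is the easiest way to break one direction of a PVC. The Python sketch below is one way to sanity-check that; it parses only the interface and frame-relay route lines from the three-interface configuration above. The function names are my own, and this is a study aid under those assumptions, not a Cisco tool.

```python
import re

# Only the lines that matter for switching, taken from the config above.
CONFIG = """\
interface Serial0/0
 frame-relay route 320 interface Serial0/1 221
 frame-relay route 321 interface Serial0/2 121
interface Serial0/1
 frame-relay route 220 interface Serial0/2 120
 frame-relay route 221 interface Serial0/0 320
interface Serial0/2
 frame-relay route 120 interface Serial0/1 220
 frame-relay route 121 interface Serial0/0 321
"""

INTERFACE_RE = re.compile(r"interface (\S+)$")
ROUTE_RE = re.compile(r"frame-relay route (\d+) interface (\S+) (\d+)$")


def parse_routes(config):
    """Map (in_interface, in_dlci) -> (out_interface, out_dlci)."""
    routes = {}
    current = None
    for raw in config.splitlines():
        line = raw.strip()
        match = INTERFACE_RE.match(line)
        if match:
            current = match.group(1)
            continue
        match = ROUTE_RE.match(line)
        if match and current:
            in_dlci, out_if, out_dlci = match.groups()
            routes[(current, int(in_dlci))] = (out_if, int(out_dlci))
    return routes


def missing_mirrors(routes):
    """Return entries whose reverse direction is absent or mismatched."""
    return [(src, dst) for src, dst in routes.items()
            if routes.get(dst) != src]


routes = parse_routes(CONFIG)
for src, dst in sorted(routes.items()):
    print(src, "->", dst)
print("unmatched:", missing_mirrors(routes))
```

If you change one DLCI in CONFIG (say 221 to 222 under Serial0/0), the unmatched list stops being empty, which is exactly the kind of mistake that otherwise shows up as a PVC that is up in one direction only.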

 

I hope that helps out, and as always if you have any questions or clarifications please drop me a line here or on twitter where I’m known as @SomeClown.

Nexus Crash

As is typical in the world of IT, problems have a way of sneaking up on you when you least expect it, then viciously attacking you with a billy club.  Often this happens when you are asleep, on vacation, severely inebriated, or have already worked 40 hours straight with no sleep.  In my case, Super Bowl Sunday at around 8:30pm was my time to get the stick.  And get it I did.

For reasons too sad to warrant comment, and far too irritating to explain in a family forum like this, our ESX host servers all became disconnected from our SAN array.  The root problem was something else on layer-2, and got resolved quickly, but the virtual world was not so quick to recover.  In retrospect, the problem was not a bad one, but when you’ve been drinking and can’t see the obvious answer you tend to dig the hole you’ve fallen into deeper rather than climb promptly out.

By way of background, we are currently running VSphere 4.0, with a few servers having 32GB of memory and 8 cores, and a few having 512GB of memory and 24 cores. All ESX Hosts are SAN booting using iSCSI initiators on a dedicated layer-2 network.  We use Nexus 1000v soft switches and have our ESX Hosts trunked using 802.1q to our Core (6506-E switches running VS-S720-10G supervisors).  Everything is redundant (duplicate trunks to each Core switch, using ether-channel with mac-pinning).  So there you have that, for what it’s worth.  Now back to the crashed servers.

We rebooted all of the ESX host servers, and with the exception of some FSCK-complaining they all came up quite nicely.  The problem was that none of the virtual machines came up.  Let me add that we have the domain controllers, DHCP, DNS, etc. on these hosts.  Crap.

So the first thing I did in my addled state was to add DHCP scopes to the DHCP servers at another office across the country, and point the VLANs off “that-a-way” by changing the ip helper-address on each VLAN on the Core.  That got DHCP and DNS back online.  As you can probably guess by now, I was MacGyver-ing the situation nicely, but really didn’t need to.  That’s one of the problems when you’re in the trenches: you tend to think in terms of right-now instead of root cause.

The next thing I did was to start bringing up the virtual machines one-by-one using the command line on the ESX hosts.  Why?  Because I had no domain authentication and the VSphere Client uses domain authentication.  Here is where someone in a live talk would be interrupting me to point out that the VSphere Client can always be logged into using the root user of the hosts, even when domain authentication is set up for all users.  Yes, that is true and it would have been handy to know at the time.

In order to bring up the virtual machines, I had to first find the proper name by issuing:

vmware-cmd -l

from the command line.  This command can take a while to run, especially if you have a lot of VMs sitting around, so go get a cup of coffee.

Once I had that list I prioritized the machines I wanted up first, and issued the:

vmware-cmd //server-name.vmx start

command on each one.  That should have been the end of the boot-up drama, but it wasn’t.  As it turns out, a message popped up (and I don’t remember the exact phrasing) to the effect of “you need to interact with the virtual machine” before it would finish booting.  So, now I issued the:

vmware-cmd //servername.vmx answer

command and got something that looked about like this:

Virtual machine message 0:
msg.uuid.altered:This virtual machine may have been moved
or copied.
In order to configure certain management and networking
features VMware ESX needs to know which.
Did you move this virtual machine, or did you copy it?
If you don't know, answer "I copied it".
0. Cancel (Cancel)
1. I _moved it (I _moved it)
2. I _copied it (I _copied it) [default]

Well, I didn’t know so I selected the default option (I copied it) and went on my way.  That is fine in almost every circumstance and got all of my servers booted up.  It did not, however, entirely fix the problem.  In fact, even though all of my servers were booted, none could talk or be reached on the network.

This is where a little familiarity with the Nexus 1000v soft switches comes in handy.  Very briefly, the architecture is made up of two parts: the VSM or Virtual Supervisor Module and the VEM or Virtual Ethernet Module.  The VSM corresponds roughly to the supervisor module in a physical chassis switch, and the VEMs are the line cards.  The interesting bit to remember for our discussion is that the VSMs (at least two for redundancy) are also Virtual Machines.

Some of you may have guessed already what the problem turned out to be, and are probably chortling self-righteously to yourself right about now.  For the rest of us, here’s what happened:

I figured out the log-in-using-root thing and got the VSphere client back up and running (oh, not before having to restart a few services on the Virtual Center Server, which is not a virtual machine, by the way.  I’m not totally crazy!).  Once I got that far I could log in to the Nexus VSM, and look at the DVS to see what was going on.  All of my uplink ports (except for ones having to do with control, packet, vmkernel, etc.) were in an “UP Blocked” state.

The short-term fix (again, the MacGyver job) was to create a standard switch on all hosts and migrate all critical VMs to that switch.  That didn’t, however, fix the problem permanently and besides, we like the Nexus switches and wanted to use them.  With that in mind, and a day or two to normalize the old sleep patterns, I set up a call with VMware support.  This actually took longer than I expected since I had to wait for a call-back from a Nexus Engineer, and they are apparently as rare as honest sales-people or Unicorns.  That said, I did get a call back and we proceeded to troubleshoot the problem.

One thing that surprised me was that it took the Nexus Engineer a bit longer than I would have thought to find the problem, and even once he did it took longer to get resolution because we had to get Cisco involved.  The problem, as it turns out, was licensing.

When you license the Nexus, you receive a PAK and you use that to install the VSM.  Once you do that, you have to request your license using the Host UID of the now installed VSM.  Cisco then sends you a license key that you install from the command-line of the VSM.  This is all somewhat standard and not surprising.  What was surprising was that we would have to do this at all considering we had been licensed at the highest level (Enterprise, superdy-duperty cool or something) for years.

What happened was that the copy VSphere made in order to get each Virtual Machine back up after our crash changed the Host UID of the VSM virtual machine(s).  Thus, the license keys were no longer valid and all host uplink ports went into a blocked state.  (I’ll save you the obvious gripe I have with the Nexus not offering any kind of command-line message about our licensing being hosed.)  This is where we had to get Cisco Licensing involved, as we had to send them the old license key files and the new Host-UID information so that they could generate new keys.  Considering I was on the phone with them for only 15 minutes, it was as pleasant an experience as I’ve ever had dealing with Cisco’s Licensing department.  At least that’s something.

After fixing the licensing, the ports unblocked and I went through the tedium of adding back adapters to the Nexus, moving servers, etc.  At the end of the day, however, it is all back to normal and working.  There are a lot of lessons learned here, and you’ll no doubt pull your own, but the one overriding thing to be on the lookout for is that, under certain circumstances, if your Nexus VSMs are part of a crash and come back up, look to licensing first before troubleshooting anything else.  Oh, and try to schedule your major system crashes for a more convenient time… when you’re sober.  Just saying.

Back from China

After roughly one week in Shanghai, China to set up a new site on our corporate network, it is painfully apparent that I need to get back into this writing business.  The dates on my posts belie my weak attempts at covering up my laziness.  That said, there were at least a couple of things of note worth using up a few words on.

Note One (where it really is someone else’s fault)

Due to an unfortunate series of poor decisions, poor project management, and a quite sudden and unreasonable expectation of delivery dates, we [IT] were forced to poach some bandwidth from a sister company of ours who had a slight excess in the manufacturing facility where our new office was to be set up.  By slight, I mean exactly 256kb.  For those of you not accustomed to seeing the abbreviation for kilo, well, you’re too young.  More on that in a moment.

All of that aside, we engineered the circuit all the way from the provider’s edge in Shanghai, back to our facilities on the West Coast and verified traffic was flowing.  Once we got to Shanghai and hooked up our router and started building out the network behind it, we noticed that we couldn’t move any traffic at all.  With a quick extended ping using the inside network interface as the source, as well as some trace-routes from other places, we verified that the provider had neglected to put a route in the BGP tables for the new network.  Thank God for 24-hour NOC support, and within 30 minutes that problem was resolved.

Note Two (where the author tries to check the turn-signal fluid)

As we moved on to creating and joining up a shiny new R2 Read-only Domain Controller (RODC) everything went off the rails.  Timeouts galore.  DNS wouldn’t resolve quickly enough to allow the new DC to join the forest.  Off I go on a jolly search for default timeout settings, registry tweaks, offline methods to install a DC (ugly at best) and generally going further down the rabbit hole of complexity, even going so far as to direct another engineer working for me to prep for a call to Microsoft (never fun.)

Having already racked a lot of gear, we decided to call it a night and come back fresh in the morning.  I always find it helpful to contemplate problems like this over a good single-malt Scotch.  So I did, a few times, and that led to morning and a face-palm moment.  To wit:

In the cab on the way to the office the next day it occurred to me that I should check the security on the firewalls back home.  I knew I didn’t put any ACLs on that new link, knowing I’d be testing and I prefer to test in the absence of artificial problems, then crank down the screws once I’m confident of the design.  I thought, however, that I had overlooked a NAT exemption or something else and decided to spend some quality time checking that portion of the infrastructure.

So, I got my coffee at the office (thankfully the office manager is a Kiwi who favors Starbucks) and started to look over the configuration of the firewalls.  Right there was my face-palm moment: an ACL which, in some security-conscious delusion I had put on the link in question, allowing ICMP traffic and denying everything else.  *GAAAAAH*  Long story short, I changed the ACL and everything “magically” started working.

All I can say here is that it doesn’t matter who you are, or how much experience with troubleshooting you have, always start with the simple stuff first.  Some of the best advice I ever got was from an instructor of mine who was fond of saying “be the packet.”  By that he just meant that you have to start from the packet’s point of view and slowly work through everything that happens to said packet from beginning to end.  Wiring, arping, routing, etc. whatever.  Be the packet.
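Here is “be the packet” as a tiny Python sketch, applied to my own mistake. The ACL below is a stand-in for the one I described (permit ICMP, deny everything else), the packet list is illustrative rather than a capture, and the matching is deliberately simplified to protocol only.

```python
from typing import NamedTuple, Optional


class Packet(NamedTuple):
    description: str
    protocol: str
    dst_port: Optional[int] = None


# Stand-in for the ACL on the link: one permit line, then the implicit deny.
ACL = [("permit", "icmp")]


def evaluate(acl, packet):
    """First matching entry wins; no match falls through to implicit deny."""
    for action, protocol in acl:
        if protocol in ("ip", packet.protocol):
            return action
    return "deny"


# Illustrative traffic: the test I ran versus what the new DC needed.
TRAFFIC = [
    Packet("extended ping", "icmp"),
    Packet("DNS query", "udp", 53),
    Packet("LDAP to a domain controller", "tcp", 389),
]

for pkt in TRAFFIC:
    print(f"{pkt.description:30} {evaluate(ACL, pkt)}")
```

Ping sails through and everything the domain join actually depends on gets dropped, which is why the link looked healthy right up until it mattered.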

Also, don’t be afraid to admit mistakes.  They will happen and hopefully other people around you can learn from them.  At least after they’re done laughing.  Which sometimes takes a while.

Note Three (hey you kids, get off my @#$ lawn)

I just wanted to take a quick moment to address the inevitable questions about our link at 256kb, and the musings that I obviously must have meant mb instead.  Bandwidth is always seen as one of those more is better kind of things, and ignoring the temptation to toss out the standard bandwidth/delay screed here, let me just tell you that you can get by with less than you think. By the way, those of us who can remember when a 2400 baud modem was blow-your-hair-back fast (multiple lines of text at once!) tend to be more pragmatic about these things.

On that link we currently have a domain controller running, several workstations with email and our main corporate ERP software.  We also, and this is what I really like, have two 7900-series IP phones running.  These are both homed to our main CUCM, Unity and IPCC servers back in the U.S. and have excellent call quality.  In fact, we tested a phone call (no local calling for these phones in China) from those phones in Shanghai, through the CUCM in the U.S., and back to a U.S. homed cell phone in Shanghai and got no discernible jitter.
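For the skeptics, here is some back-of-the-envelope math in Python. I am assuming a G.729 codec with 20 ms packetization (20 bytes of voice payload at 50 packets per second) and about 6 bytes of layer-2 overhead per packet; neither figure is stated above, so swap in your own numbers if your setup differs.

```python
LINK_KBPS = 256            # the poached circuit

# Assumed values, not measurements from the Shanghai link.
PAYLOAD_BYTES = 20         # G.729 voice payload per 20 ms packet
IP_UDP_RTP_BYTES = 40      # 20 IP + 8 UDP + 12 RTP
LAYER2_BYTES = 6           # assumed WAN framing overhead
PACKETS_PER_SECOND = 50    # one packet every 20 ms


def call_kbps(payload, l3_headers, l2_headers, pps):
    """Bandwidth of one direction of one call, in kilobits per second."""
    return (payload + l3_headers + l2_headers) * 8 * pps / 1000


per_call = call_kbps(PAYLOAD_BYTES, IP_UDP_RTP_BYTES,
                     LAYER2_BYTES, PACKETS_PER_SECOND)
two_calls = 2 * per_call
remaining = LINK_KBPS - two_calls

print(f"per call:  {per_call:.1f} kbps")
print(f"two calls: {two_calls:.1f} kbps")
print(f"left over: {remaining:.1f} kbps")
```

Under those assumptions both phones in use at once take roughly a fifth of the circuit, leaving the rest for the domain controller, email, and ERP traffic.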

Moral of the story?  You can get by with less bandwidth than you think.  Would I choose to?  Hell no!  :)
