Everyone has been posting their schedules for Cisco Live to Twitter, Facebook and wherever else, so I thought I’d better jump in with the cool kids and publish mine as well. I can’t guarantee this won’t change, but for now it stands as my best guess and current planned schedule.
Editor’s Note: If you haven’t already, check out the first installment in this (hopefully not ongoing) series at http://blog.packetqueue.net/asa-tu/
At approximately 1:58pm PST last Thursday, the two edge ASA 5510 units at our corporate headquarters dropped off the network. At the time I was in another office, up in Quebec, Canada, so I delegated the problem to one of our other engineers to work with TAC and bring the units back online. That process took much longer than expected, and I won’t bore you with the details. What I will bore you with, however, are a few observations I have now that we have more time and experience with Cisco’s ASA product line:
- The ASA has some sort of systemic, though exceedingly rare, problem on 8.3(x) and newer code.
- Said problem causes the units to reboot and take out the system flash (disk0:) but not the user flash (disk1:).
- The flash appears to be erased, but it is in fact the MBR that is gone, not the data (we used a hardware forensic disk analysis unit to verify this).
- Cisco doesn’t have enough data points yet to even acknowledge this is an issue. I don’t believe they’re “hiding” a problem; I just don’t think enough people have experienced the particular set of circumstances that would cause this and subsequently reported back to Cisco.
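To make the MBR-versus-data distinction concrete: a classic MBR is the first 512-byte sector of the card, with four 16-byte partition entries starting at offset 446 and the boot signature bytes 0x55 0xAA at offset 510. The Python sketch below is not the forensic hardware we used; it assumes you have already dumped the card to a raw image file, and the function names are hypothetical. It checks whether the signature and partition entries survive, and whether anything non-zero remains past sector 0.

```python
import struct

SECTOR_SIZE = 512
PARTITION_TABLE_OFFSET = 446  # four 16-byte partition entries start here
BOOT_SIGNATURE_OFFSET = 510   # a valid MBR ends with bytes 0x55 0xAA


def inspect_mbr(sector: bytes) -> dict:
    """Check a 512-byte sector for the MBR boot signature and partition entries."""
    if len(sector) < SECTOR_SIZE:
        raise ValueError("need one full 512-byte sector")
    signature_ok = sector[BOOT_SIGNATURE_OFFSET:SECTOR_SIZE] == b"\x55\xaa"
    partitions = []
    for i in range(4):
        base = PARTITION_TABLE_OFFSET + 16 * i
        part_type = sector[base + 4]  # 0x00 means the slot is unused
        # Starting LBA and sector count are little-endian 32-bit values.
        start_lba, num_sectors = struct.unpack_from("<II", sector, base + 8)
        if part_type != 0:
            partitions.append((part_type, start_lba, num_sectors))
    return {"signature_ok": signature_ok, "partitions": partitions}


def has_data_after_mbr(f, chunk_size: int = 1 << 20) -> bool:
    """Return True if any non-zero byte exists past the first sector."""
    f.seek(SECTOR_SIZE)
    for chunk in iter(lambda: f.read(chunk_size), b""):
        if chunk.strip(b"\x00"):  # anything left after stripping zeros is data
            return True
    return False


def scan_image(path: str) -> dict:
    """Inspect a raw image of a flash card (path is wherever you dumped it)."""
    with open(path, "rb") as f:
        report = inspect_mbr(f.read(SECTOR_SIZE))
        report["data_after_mbr"] = has_data_after_mbr(f)
    return report
```

For example, `scan_image("disk0.img")` (a hypothetical dump filename) returning `signature_ok: False` with no partitions but `data_after_mbr: True` would match the pattern we saw: the boot record gone, the data still sitting on the card.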
My own suspicions about the root cause are below, though I’d welcome additional thoughts from anyone with experience in this area. I should also point out that I have heard from at least two other people who have experienced this exact problem.
- The behavior and crash lead me to believe that, at the point of failure, the ASA experiences the equivalent of a Windows “BSOD”. That would point to either the memory or the motherboard itself, as these are the primary hardware causes of this type of crash in any system. Most other crashes can be recovered from and produce diagnostic data.
- The ASA accesses the flash on initial load, but then runs from memory. The flash cards in these units had trashed MBRs, which leads me to believe that the ASA was touching the MBR at the time of the crash; that is inconsistent with what I know about how the ASA is supposed to operate. It’s possible it was simply accessing the flash to write a crash dump and crashed partway through. That makes some sense to me.
- All the failures I have experienced or heard of from others have at least a couple of things in common. They are all on 8.3(x) code, and they were all upgraded by their owners to support 8.3(x). This code required a memory and flash upgrade, so you had to buy the upgrades from Cisco and field-install them yourself. These units were also all manufactured immediately following the Cisco manufacturing slowdown in 2008/2009, when lead times were running to several months. This makes me suspect that quality control on either the memory or the units themselves could be to blame. I’ve tried to verify this against revision numbers and the like, but I haven’t been able to gather enough data from “out there” to settle on this as a cause.
I hope this helps someone out there, and I am truly interested in getting more information from anyone who has it. Cisco is taking our units back, but pulling them aside before refurbishment so that their engineers can dissect them. If I find anything out from that, I’ll post the findings here.
The configuration and build-out of the ASA 5510 units is as follows:
- 1GB of memory, 512MB of system flash and 256MB of user flash. An IPS module, plus Security Plus, Botnet Traffic Filter, AnyConnect Essentials, AnyConnect Mobile and other licenses. Actually, just about every license is on board; at this point these units are maxed out on everything. Utilization is still at a reasonable level.
- Configuration includes multiple IPsec site-to-site VPNs, SSL VPN for Mac, Linux, Windows, iPad and iPhone clients, sub-interfaces, stateful failover, both IPv4 and IPv6, OSPF with static redistribution, and full IPS functionality.
Passing the CCIE R&S Written (350-001)
I am proud to say that I have completed the first step on my journey to the CCIE Routing and Switching certification: I passed the written qualification exam. I obviously have a lot more work to do before attempting the lab later this year, but it is a solid first step, and considering how long I’ve contemplated taking it, it feels good just to be moving forward.
I’m not going to go into any details, talk about my score (it wasn’t perfect by any means), or discuss anything that even smells like an NDA violation. If that’s why you’re here and how you found this short blog post, you’re in the wrong place. I’ve worked far too hard to diminish either the work I’ve put in to get here or the work that so many full CCIEs have put in to attain their certifications. The only way you get the digits is to pay your dues like everybody else.
That said, my brief observation, for what it’s worth, is that this test was not entirely what I was expecting. After years of taking certification tests, including a variety of other offerings from Cisco, this one seemed a bit, well, tame. Not easy, just more straightforward question and answer. That wasn’t really a positive or a negative in my mind; I don’t consider myself a “test” person and would have preferred a few more hands-on scenarios than I got. But I suppose I’ll get more than my fill come lab day.
The other interesting thing I noticed was the spread in difficulty between questions. Some were almost cloyingly easy, while others were a bit harder than I would have expected. Possibly that is just a side effect of my study habits; the questions I found easy might be the same ones that trip someone else up. When you’ve been at the books long enough, you lose a little perspective on these things. None of the questions, however, were surprising in any way. I think the subject matter described in the blueprint, as well as some base-level networking knowledge that is simply assumed, was covered in the way you should expect at this level of testing.
The last thing I found different from some of the other tests I’ve taken is the increased reliance on “stacking” technologies. In other words, you could see a question ostensibly focused on one technology, but with one or two other technologies represented in it as well. You would be required to understand not only each technology in the question, but also the subtle interactions that can happen as they work together. My sense is that this is intended to be more representative of the real world, and in general I think it worked well.
All in all, I think it was like a lot of Cisco tests: fair but difficult. If you know what you’re doing you should pass, and if you don’t, well… take your score breakdown and hit the areas where you were weak. Oh, and Cisco: please make your example diagrams easier to read! I’m not so old that I need reading glasses, but my god, some of those diagrams bordered on illegible. On at least a couple of occasions I had to squint, look sideways, and try to see, like one of those damned “dot” pictures where if you stare long enough you see a dolphin or some other randomly insipid thing that you feel cheated for having expended the effort to see.
And now? Off into some hundreds of hours of rack time. Doh!
Random (and not so deep) Thoughts by Some Clown
I haven’t been writing a lot lately, mostly due to a combination of my work and study schedule. I thought, however, that it would be useful to just toss down a few random thoughts on the proverbial paper to wrap up 2010. I’ll try to keep it somewhat cohesive, but I can’t really guarantee anything.
Having made the decision last year at Cisco Live to finally buckle down and pursue the CCIE Routing and Switching certification, I have been as busy as you might imagine with studying. As I’ve gone down this road I’ve noticed a couple of things:
(1) In the office I’m used to working through large white papers, documents, manuals, command references, etc., quickly to get to the answers I need for either deployment or break-fix. This is not the best way to study for the CCIE qualification exam, however, as I tend to forget that information just as quickly once it stops being immediately useful. I’ve had to change my habits to include taking notes, reviewing portions over and over, and cross-referencing with multiple sources. Nothing earth-shattering to be sure, but a change for me.
(2) As alluded to above, I do a lot of cross-referencing on my study material. I have material from CCBootCamp that I consider my primary source (by virtue of being enrolled in the Cisco 360 program through them). I have also been reading the CCIE Routing and Switching Certification Guide, 4th Edition, as well as the CCIE Routing and Switching Exam Quick Reference Sheets, both from Cisco Press. It helps me quite a bit to read different perspectives on the same material and see it put a different way on the page. I have a Cisco Live Virtual account as well, and so have been pulling some presentations from that site, notably on QoS.
(3) I have over 16 years of professional experience in this industry, and while I am by no means an expert, I am confident in the things I know. To that end, I would say that at some point in your studies you are almost guaranteed to come across information, answers to practice questions, etc., that you just know are wrong. I’ve had to learn not to be afraid to challenge my study material. I don’t do it blindly, but I do research other sources to verify what I think I know. I have found many instances of incorrect information in several sources, more often than not in the Cisco IOS example configurations: sometimes they use commands that won’t work on the platform in question, other times they reference nonexistent class-maps or access control lists. Less often have I found blatantly incorrect explanations of how a thing works, but even there I have found a couple of examples. I take this as a good sign, actually; it means I am becoming more aware of the details of what I am studying.
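Catching the nonexistent class-map and ACL references mentioned in (3) is mechanical enough to script. This is a rough Python sketch, assuming IOS-style MQC syntax: `class-map [match-any|match-all] NAME` definitions, indented `class NAME` lines inside a `policy-map`, `match access-group name NAME`, and `ip access-list standard|extended NAME`. The function name is hypothetical, and it deliberately ignores numbered ACLs, `class-map type ...` variants, nested policies and many other cases.

```python
import re

# Definitions: "class-map [match-any|match-all] NAME" and named ACLs.
CLASS_DEF = re.compile(r"^class-map(?:\s+match-(?:any|all))?\s+(\S+)")
ACL_DEF = re.compile(r"^ip access-list (?:standard|extended)\s+(\S+)")
# References: indented "class NAME" in a policy-map, "match access-group name NAME".
CLASS_REF = re.compile(r"^\s+class\s+(\S+)")
ACL_REF = re.compile(r"^\s*match access-group name\s+(\S+)")


def find_undefined_references(config: str) -> dict:
    """Return class-maps and named ACLs that are referenced but never defined."""
    defined_classes = {"class-default"}  # implicit in every policy-map
    defined_acls = set()
    used_classes = set()
    used_acls = set()
    for line in config.splitlines():
        if m := CLASS_DEF.match(line):
            defined_classes.add(m.group(1))
        elif m := ACL_DEF.match(line):
            defined_acls.add(m.group(1))
        elif m := CLASS_REF.match(line):
            used_classes.add(m.group(1))
        elif m := ACL_REF.match(line):
            used_acls.add(m.group(1))
    return {
        "class-maps": sorted(used_classes - defined_classes),
        "access-lists": sorted(used_acls - defined_acls),
    }
```

Run against a pasted example configuration, a non-empty list in either key points at something the example expects to exist but never defines; it is a prompt to go check the source, not proof that the example is wrong.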
Interesting Design Decisions
It always fascinates and bewilders me to see some of the design decisions that other engineers make when putting together a network. Much of what we do is subjective, and even the most experienced experts disagree on a good many things. With that said, certain things just don’t strike me as particularly useful, and it’s my prerogative to complain about them. My top complaints from recent experience, in no particular order, are:
(1) My predecessor built our main datacenter using 4503 switches exclusively: access, distribution and core (mostly; we do use a collapsed core model). The 4500 series is great, but my general argument is that these are under-powered, or at least under-featured, for the core (Sup II-Plus) and just a bit overpowered for the access layer. We run PoE 1-Gig to every port in the building, and the access layer still barely breaks a sweat (never more than 1 percent utilization, on any metric). I think someone got a deal or something. We’re now replacing the core with a pair of 6506s with Supervisor 720s, 10-Gig, etc.
(2) A main distribution point had a single 3845 with a 100-meg Internet connection and two full DS3 links. Considering the 3845 maxes out at 45 Meg of throughput, this strikes me as a particularly egregious design. We’ve now moved that to a 3945, which under full load is probably still a tad oversubscribed, but it’s much better and the price was right.
(3) Who was it at Cisco that decided the ASA-5510 would have only two Gig links available, and only with the right license? Why only two? Why not three, or all five? This might be a backplane limitation; I don’t know, but it bothers me.
(4) My own stupidity in setting up the aforementioned ASA-5510 failover pair with the inside and outside interfaces on the Gig links, when I should have put the two trunk links, which handle much more traffic, on those interfaces instead. This will be changed soon, but I should have done it right the first time.
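To put numbers on item (2): a full DS3 runs at 44.736 Mbps, so the links attached to that 3845 add up to well over 180 Mbps against the 45 Meg ceiling quoted above. This back-of-the-envelope Python sketch counts one direction only; the function name is made up for illustration, and real forwarding rates on any router depend heavily on which features (NAT, crypto, QoS) are enabled.

```python
DS3_MBPS = 44.736  # DS3 line rate in Mbps


def oversubscription_ratio(link_mbps: list[float], router_max_mbps: float) -> float:
    """Ratio of aggregate attached link bandwidth to router forwarding capacity."""
    return sum(link_mbps) / router_max_mbps


links = [100.0, DS3_MBPS, DS3_MBPS]  # 100-meg Internet plus two full DS3s
aggregate = sum(links)
ratio = oversubscription_ratio(links, 45.0)  # the 45 Meg figure from item (2)
print(f"aggregate: {aggregate:.3f} Mbps, oversubscription: {ratio:.1f}:1")
```

That works out to roughly 189.5 Mbps of attached bandwidth, a bit more than four times what the router was expected to forward.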
2010 has been a good year overall, with a lot of interesting projects, experiences, and solid learning had by all (or at least by me). I’m looking forward to 2011 and all of the continued successes and experiences to come. I’d also like to give a special shout-out to all of my Twitter colleagues, friends, followers, and various clingers-on and lurkers. I have found the Twitter community to be an invaluable source of support, wisdom, and occasionally respite from the rigors of the daily grind. If you’re not on Twitter, I’d highly encourage you to give it a look.
Happy New Year everyone!
Well, it’s over.
Vacations always seem way too short. All the months of planning, dreaming, and working out the myriad little details; then 10 days in Cozumel and Southern California, and it’s done.
C’est la vie.
That does, however, mean that it is right about time for me to finally post something useful here. To that end, look for my new posting on rebuilding access to your ESX Host Server after you bungle a Cisco Nexus 1000v migration. Not that I would know anything about that. *Ahem*
Sometimes the best lessons are the ones we teach ourselves. Often inadvertent and unpleasant, yes, but the longest lasting.
Wow, I’ve been busy lately. I realized this not just from how I feel and from looking at my meeting calendar, but also because I just noticed that a month has gone by since my last post. Time flies when you don’t know what’s going on.
I’ve got some things in the queue, notably around the Nexus 1000v software switch for vSphere: best practices, experiences in configuration, some tips that might save your bacon during and after implementation, and some random bitching about Cisco’s changing recommendations for how best to configure the management and data VLANs. Hopefully that’s enough of a teaser to get you watching this space or, better yet, subscribing to the RSS feed (I fully support laziness). I can tell you, however, that it could be another couple of weeks, as I am off to Shanghai, China for a week or so to set up a new office. Assuming the install date for our MPLS link is correct, in any case.