January 2009 Archives

DS3 Troubleshooting


Recently, I was helping a customer migrate from a traditional frame-relay network to an MPLS cloud.  The first step, obviously, was to bring up the DS3 at the headquarters end.  That went just fine.  IP connectivity was established and BGP came up instantly.  A few days later it came time to cut over the first remote location.  The remote sites were reusing the same T1 frame port, only with a different PVC.  The turnup seemed to go OK; IP connectivity and a BGP session had been established in advance of the cut.  We filtered out the routes to prevent data from shifting across the new PVC until we were ready.  During the testing phase, PCs were able to access the Internet and internal applications were working.  There was, however, one problem: the phones would not register with the call manager.

We started noticing that the default route advertised by the headquarters through the MPLS cloud was flapping.  It turned out that the DS3 at the headquarters was the reason for the route flap: it had started bouncing up and down frequently.  We decided to back out the changes and remain on the original circuit until we could determine the cause.  Looking at the DS3 controller yielded output like this:

   Framing is c-bit, Clock Source is Line
   Bandwidth limit is 44210, DSU mode 0, Cable length is 10
   rx FEBE since last clear counter 33743, since reset 67688179
   Data in current interval (297 seconds elapsed):
     0 Line Code Violations, 0 P-bit Coding Violation
     0 C-bit Coding Violation
     0 P-bit Err Secs, 0 P-bit Sev Err Secs
     0 Sev Err Framing Secs, 45 Unavailable Secs
     0 Line Errored Secs, 0 C-bit Errored Secs, 0 C-bit Sev Err Secs
  Data in Interval 1:
     0 Line Code Violations, 0 P-bit Coding Violation
     2 C-bit Coding Violation
     0 P-bit Err Secs, 0 P-bit Sev Err Secs
     0 Sev Err Framing Secs, 248 Unavailable Secs
     1 Line Errored Secs, 1 C-bit Errored Secs, 0 C-bit Sev Err Secs
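
For anyone who wants to track these counters over time, here is a minimal Python sketch that pulls the numbers out of output shaped like the above.  It assumes the counters always appear as "<number> <name>" pairs separated by commas, as they do in this sample; the function names are my own for illustration and are not part of any router tooling.

```python
import re

# Current-interval lines copied from the controller output above.
SAMPLE = """\
   Data in current interval (297 seconds elapsed):
     0 Line Code Violations, 0 P-bit Coding Violation
     0 C-bit Coding Violation
     0 P-bit Err Secs, 0 P-bit Sev Err Secs
     0 Sev Err Framing Secs, 45 Unavailable Secs
     0 Line Errored Secs, 0 C-bit Errored Secs, 0 C-bit Sev Err Secs
"""

def parse_counters(text):
    """Return {counter name: value} for every '<number> <name>' pair."""
    counters = {}
    for line in text.splitlines():
        if line.strip().startswith("Data in"):
            continue  # interval header, not a counter line
        for value, name in re.findall(r"(\d+) ([A-Za-z][A-Za-z\- ]*[A-Za-z])", line):
            counters[name] = int(value)
    return counters

def elapsed_seconds(text):
    """Return the 'seconds elapsed' figure from the interval header."""
    return int(re.search(r"\((\d+) seconds elapsed\)", text).group(1))

counters = parse_counters(SAMPLE)
elapsed = elapsed_seconds(SAMPLE)
unavailable_pct = 100 * counters["Unavailable Secs"] / elapsed
print(f"Unavailable for {unavailable_pct:.1f}% of the current interval")
```

With the sample above, 45 unavailable seconds out of 297 elapsed works out to roughly 15% of the interval, while the coding violation counters stay at zero.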

The only problem was that the telco couldn't seem to find a problem.  The most obvious thing to do here is validate the line code configurations throughout the path of the circuit.  Unfortunately, if that were the problem, the circuit wouldn't be anywhere near as stable as it was.  We did the check anyway, but it was unhelpful.  The IXC decided to loop the circuit at the customer end and run patterns with a T-Berd tester.  Typically, testing circuits with a T-Berd yields usable results.  In this case everything came back clean.

"It is an old maxim of mine that when you have excluded the impossible, whatever remains, however improbable, must be the truth." (Doyle, 1892)  It is impossible for the circuit to not work, yet have nothing wrong.  Therefore, either there is a bug in the router IOS, a hardware defect/failure, or the T-Berd's results are wrong.  So, to eliminate the first two possibilities, we disconnected the T3 at the demarc and created a hard loop.  A neat trick you can do with virtually any circuit that can be looped is to issue the no keepalive command and ping the interface's own address.  Disabling keepalives turns off the Frame Relay LMI, HDLC keepalives, or equivalent frames, and forces the interface to believe it is up.  I don't have pattern testing abilities on this hardware, so a couple of pings with different patterns should do the trick!  The good news is, the pings all worked against the hard loop.  The bad news is... one of the tests failed when the IXC had the circuit looped!

I suppose I should back up...  When doing ping tests, you should hit certain types of patterns.  Circuits have issues with bit synchronization, which is the reason for the different kinds of line coding in use on T1 and T3 circuits.  If changes in the bit pattern don't happen every so often, the PLL circuit that establishes clock synchronization with the carrier can lose its lock.  Unfortunately, user data can't be relied on to change 0s to 1s often enough.  To test for problems, certain patterns are quite common: 0x0000, 0xffff, 0xaaaa, and 0x5555.  The first two patterns are all zeros and all ones.  The second two are both alternating ones and zeros, just offset by one bit.  These tests help you uncover a ones density problem on many circuits.
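
To make the difference between these patterns concrete, here is a small Python sketch that builds a payload from each 16-bit pattern and reports its ones density and longest run of zeros.  It looks only at the payload bits, not at what actually ends up on the wire after encapsulation and line coding, and the helper names are mine for illustration.

```python
# The four common test patterns mentioned above.
PATTERNS = [0x0000, 0xFFFF, 0xAAAA, 0x5555]

def payload(pattern, length=16):
    """Repeat a 16-bit pattern to build `length` bytes of payload."""
    return pattern.to_bytes(2, "big") * (length // 2)

def bit_string(data):
    """Render bytes as a string of '0' and '1' characters."""
    return "".join(f"{byte:08b}" for byte in data)

def ones_density(data):
    """Fraction of bits in the payload that are 1."""
    bits = bit_string(data)
    return bits.count("1") / len(bits)

def longest_zero_run(data):
    """Length of the longest run of consecutive 0 bits."""
    return max(len(run) for run in bit_string(data).split("1"))

for pattern in PATTERNS:
    data = payload(pattern)
    print(f"0x{pattern:04x}: density={ones_density(data):.2f}, "
          f"longest zero run={longest_zero_run(data)}")
```

The all-zeros payload is the only one of the four with no transitions at all, which is why it is the harshest test of ones density handling.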

In this case, the 0x0000 pattern failed when connected to the telco.  Since this failed quite consistently when I tried it, why did the telco's tests all come back clean?  The answer is simple: the T-Berd doesn't do an all 0s pattern test at the DS3 level!

Eventually, with a near endless series of loops in both the IXC and LEC, we discovered that 0x0001 (15 0s in a row) passed and 0x0000 (16 0s in a row) failed.  The IXC uses Alcatel equipment that by default has a setting that disallows excess 0s in DS3 circuits.  When we sent an all 0s pattern, the carrier received a different pattern, due to the Alcatel changing the bits in transit.  This introduced line coding errors on the DS3, and obviously caused problems with whatever traffic was modified.  After changing this setting in 3 of the Alcatel muxes, the all 0s pattern worked.
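
The effect is easy to picture with a toy model.  The Python sketch below is not how the Alcatel gear is actually implemented (I don't know its internals); it simply assumes a device that forces a 1 whenever more than 15 zeros would appear in a row, which matches what we observed.  The function names and the threshold parameter are hypothetical.

```python
def suppress_zeros(bits, max_zeros=15):
    """Toy zero suppression: flip a 0 to 1 once the run exceeds max_zeros."""
    out = []
    run = 0
    for bit in bits:
        if bit == "0":
            run += 1
            if run > max_zeros:
                bit = "1"  # the device alters the customer's data here
                run = 0
        else:
            run = 0
        out.append(bit)
    return "".join(out)

def pattern_bits(pattern, repeats=4):
    """Repeat a 16-bit pattern as a string of '0' and '1' characters."""
    return f"{pattern:016b}" * repeats

for pattern in (0x0001, 0x0000):
    sent = pattern_bits(pattern)
    received = suppress_zeros(sent)
    status = "unchanged" if sent == received else "modified in transit"
    print(f"0x{pattern:04x}: {status}")
```

Under that assumption, 0x0001 never exceeds 15 zeros in a row and comes back untouched, while 0x0000 gets a bit flipped on the 16th zero, so the looped-back data no longer matches what was sent.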

CCDE Results


On the Saturday immediately following Christmas, Santa Claus, who looks remarkably similar to my mailman, presented me with a shiny white envelope from Vue, among other things (likely bills or some other such nonsense).  Where I live, it is a long way down the driveway to the house - a trip made even longer by the anticipation and anxiety welling up inside me.  After nearly a month of waiting, and near constant clicking on the refresh button over the last several days, I had to fight just to keep from flinging the remainder of the post to the wind.  It didn't help that several highly skilled engineers had already posted less than stellar news on the Cisco Learning Network just the night before.

After finally making it to the house, filled with my wife's guests, I ever so quietly and calmly slid my shaky hands down the length of the envelope's flap and tried to lean against the counter to steady the paper so I could focus on it long enough to read the only word I could see clearly while in this state: CONGRATULATIONS!  After the trembling stopped, I found my way to the second page to find my score report, which was conspicuously missing anything that remotely resembled a score or a report, but did contain a number.  From the beginning of the beta program it was decided that there would be a new numbering scheme for CCDEs, but it was not announced what it would be.

Well, that question was now answered: 20080001.  1?  1?  Really?  It's been nearly two weeks since that day, and I still have trouble with that.  First let me say that Cisco only invited the best to the beta program.  200 people took the beta written.  60 were invited to the beta practical, of which 42 (ish) attended.  And out of those, I got the first number?  All I can say is wow!  So far only 3 people have acknowledged receipt of a CCDE number, and from the rumor mill, that is all that passed.  7%.  Ouch.  I have to feel humbled here, because there are bigger names than mine that attended.  Some of them are doing the things I had dreamed of, but they didn't pass.  Don't get me wrong: I am very excited, and proud of what is truly a once in a lifetime experience!

If you ever get the chance to participate in a beta program from Cisco, no matter what it is - DO IT! 

To all of my peers that took this exam: You all passed in my eyes.  Congratulations for helping to make an important new certification for the Cisco community.  I hope you all attend the next exam on Feb 11, 2009.  I expect to hear about the next CCDEs soon!  A special congratulations goes out to the other beta participants that passed.

Michael Morris    CCDE#20080002

Reinhold Fisher   CCDE#20080003

