DS3 Troubleshooting

Recently, I was helping a customer migrate from a traditional frame-relay network to an MPLS cloud.  The first step, obviously, was to bring up the DS3 at the headquarters end.  That went just fine.  IP connectivity was established and BGP came up instantly.  A few days later it was time to cut over the first remote location.  The remote sites were reusing the same T1 frame port, only with a different PVC.  The turn-up seemed to go OK; IP connectivity and a BGP session had been established in advance of the cut.  We filtered out the routes to prevent data shifting across the new PVC until we were ready.  During the testing phase, PCs were able to access the Internet and internal applications were working.  There was, however, one problem: the phones would not register with the call manager.

We started noticing that the default route advertised by the headquarters through the MPLS cloud was flapping.  It turned out that the DS3 at the headquarters was the reason for the route flap: it had started bouncing up and down frequently.  We decided to back out the changes and remain on the original circuit until we could determine the cause.  Looking at the DS3 controller yielded output like this:

   Framing is c-bit, Clock Source is Line
   Bandwidth limit is 44210, DSU mode 0, Cable length is 10
   rx FEBE since last clear counter 33743, since reset 67688179
   Data in current interval (297 seconds elapsed):
     0 Line Code Violations, 0 P-bit Coding Violation
     0 C-bit Coding Violation
     0 P-bit Err Secs, 0 P-bit Sev Err Secs
     0 Sev Err Framing Secs, 45 Unavailable Secs
     0 Line Errored Secs, 0 C-bit Errored Secs, 0 C-bit Sev Err Secs
  Data in Interval 1:
     0 Line Code Violations, 0 P-bit Coding Violation
     2 C-bit Coding Violation
     0 P-bit Err Secs, 0 P-bit Sev Err Secs
     0 Sev Err Framing Secs, 248 Unavailable Secs
     1 Line Errored Secs, 1 C-bit Errored Secs, 0 C-bit Sev Err Secs
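
Counters like these are easier to compare across intervals once the non-zero ones are pulled out.  A rough Python sketch of that idea follows; the helper name `nonzero_counters` and the parsing approach are purely illustrative (not a Cisco tool), and it assumes the output keeps the "count name, count name" layout shown above.

```python
# Lines copied from the controller output above (trimmed for brevity).
SAMPLE = """\
  Data in current interval (297 seconds elapsed):
     0 Line Code Violations, 0 P-bit Coding Violation
     0 Sev Err Framing Secs, 45 Unavailable Secs
  Data in Interval 1:
     2 C-bit Coding Violation
     0 Sev Err Framing Secs, 248 Unavailable Secs
     1 Line Errored Secs, 1 C-bit Errored Secs, 0 C-bit Sev Err Secs
"""


def nonzero_counters(text):
    """Return {interval heading: {counter name: count}} for non-zero counters."""
    result = {}
    interval = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Data in"):
            # A new interval block starts here; use its heading as the key.
            interval = line.rstrip(":")
            result[interval] = {}
        elif interval is not None:
            # Counter lines look like "<count> <name>, <count> <name>".
            for part in line.split(","):
                count, _, name = part.strip().partition(" ")
                if count.isdigit() and int(count) > 0:
                    result[interval][name.strip()] = int(count)
    return result


for interval, counters in nonzero_counters(SAMPLE).items():
    print(interval, counters)
```

With the sample above, the only counters that stand out are the unavailable seconds in both intervals and a handful of C-bit and line errors in Interval 1.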

The only problem was that the telco couldn't find a problem.  The most obvious thing to do here is validate the line-coding configurations throughout the path of the circuit.  Unfortunately, if this were the problem, the circuit wouldn't be anywhere near as stable as it was.  We did the check anyway, but it was unhelpful.  The IXC decided to loop the circuit at the customer end and run patterns with a T-Berd tester.  Typically, testing circuits with a T-Berd yields usable results.  In this case everything came back clean.

"It is an old maxim of mine that when you have excluded the impossible, whatever remains, however improbable, must be the truth." (Doyle, 1892)  It is impossible for the circuit to not work, yet have nothing wrong.  Therefore, either there is a bug in the router IOS, a hardware defect/failure, or the T-Berd's results are wrong.  So, to eliminate the first two possibilities, we disconnect the T3 at the demarc and create a hard loop.  A neat trick you can do with virtually any circuit that can be looped is to issue the no keepalive command and ping the interface's own address.  Disabling keepalives turns off the Frame Relay LMI, HDLC keepalives, or similar frames, and forces the interface to believe it is up.  I don't have pattern testing abilities on this hardware, so a couple of pings with different patterns should do the trick!  Good news is, the pings all work; bad news is... one of the tests fails when the IXC has the circuit looped!

I suppose I should back up...  When doing ping tests, you should hit certain types of patterns.  Circuits have issues with bit synchronization, which is the reason for the different kinds of line coding in use on T1 and T3 circuits.  If changes in the bit pattern don't happen every so often, the PLL circuit that establishes clock synchronization with the carrier can lose lock.  Unfortunately, user data can't be relied on to change 0s to 1s often enough.  To test for problems, certain patterns are quite common: 0x0000, 0xffff, 0xaaaa, and 0x5555.  The first two patterns are all zeros and all ones.  The second two are the same alternating ones and zeros, offset by one bit.  These tests help you uncover a ones density problem on many circuits.
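
To make the four patterns concrete, here is a small Python sketch that expands each 16-bit ping data pattern into bits and reports its ones density and longest run of zeros.  The function names and the choice of four repeats are assumptions for illustration only; a real ping payload repeats the pattern many more times.

```python
def pattern_bits(pattern, repeats=4):
    """Expand a 16-bit data pattern into a bit string, repeated as in a payload."""
    return format(pattern, "016b") * repeats


def ones_density(bits):
    """Fraction of bits that are 1."""
    return bits.count("1") / len(bits)


def longest_zero_run(bits):
    """Length of the longest run of consecutive 0 bits."""
    return max(len(run) for run in bits.split("1"))


for pattern in (0x0000, 0xFFFF, 0xAAAA, 0x5555):
    bits = pattern_bits(pattern)
    print(
        f"0x{pattern:04x}  first 16 bits={bits[:16]}  "
        f"ones density={ones_density(bits):.2f}  "
        f"longest zero run={longest_zero_run(bits)}"
    )
```

The all-zeros pattern is the only one of the four with no transitions to 1 at all, which is why it is the harshest ones density test.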

In this case, the 0x0000 pattern failed when connected to the telco.  Since this fails quite consistently when I try it, why do the telco's tests all come back clean?  The answer is simple: the T-Berd doesn't do an all 0s pattern test at the DS3 level!

Eventually, with a near endless series of loops in both the IXC and LEC, we discover that 0x0001 (15 0s in a row) passes and 0x0000 (16 0s in a row) fails.  The IXC uses Alcatel equipment that by default has a setting that disallows excess 0s in DS3 circuits.  When we sent an all 0s pattern, the carrier received a different pattern, due to the Alcatel changing the bits in transit.  This introduced line-coding errors on the DS3, and obviously caused problems with whatever traffic was modified.  After changing this setting in 3 of the Alcatel muxes, the all 0s pattern worked.
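
The effect can be illustrated with a toy model in Python.  This is only a sketch under loud assumptions: the exact behaviour of the Alcatel setting is not known here, so `suppress_zeros` is a hypothetical stand-in that forces a 1 whenever more than 15 zeros in a row would pass, a limit chosen simply to match the observed 0x0001 pass and 0x0000 fail.

```python
MAX_ZEROS = 15  # assumed limit, picked to match the observed pass/fail results


def suppress_zeros(bits, max_zeros=MAX_ZEROS):
    """Hypothetical mux behaviour: flip a 0 to 1 once max_zeros 0s have passed."""
    out = []
    run = 0
    for bit in bits:
        if bit == "0":
            run += 1
            if run > max_zeros:
                bit = "1"  # this bit is altered in transit
                run = 0
        else:
            run = 0
        out.append(bit)
    return "".join(out)


def survives(pattern, repeats=4):
    """True if the repeated 16-bit pattern crosses the model unmodified."""
    sent = format(pattern, "016b") * repeats
    return suppress_zeros(sent) == sent


for pattern in (0x0001, 0x0000):
    print(f"0x{pattern:04x} survives: {survives(pattern)}")
```

In this model 0x0001 never exceeds 15 consecutive zeros and arrives intact, while 0x0000 has a bit changed at the 16th zero, so the received data no longer matches what was sent.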

About this Entry

This page contains a single entry by Ryan Hicks published on January 22, 2009 7:51 PM.
