21.4 SURVIVABILITY--WHERE NETWORK MANAGEMENT REALLY PAYS
In Ref. 3, Steven Dauber lists five network-specific characteristics that the
troubleshooter should be familiar with or have data on.
A. Network Utilization. What is the average network utilization? How does it vary
through the work day? Characteristics of congestion, if any, should be known.
Also, where and under what circumstances might it be expected?
B. Network Applications. What are the dominant applications of the network? What
version numbers are they running?
C. Network Protocol Software. What protocols are running on the network? What
are the performance characteristics of the software, and are these characteristics
being achieved?
D. Network Hardware. Who manufactured the network interface controllers, media
attachment units, servers, hubs, and other connection hardware? What versions are
they? What are their expected performance characteristics, and are they being met?
E. Internetworking Equipment. Who manufactured the repeaters, bridges, routers, and
gateways on the network? What versions of software and firmware are they running?
What are the performance characteristics? What are the characteristics of the
interfaced network that are of interest?
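
The utilization questions in item A can be answered from routine traffic samples. The following is a minimal sketch in Python, not a method taken from Ref. 3. The function name, the sample layout (hour of day, bits transferred, interval length), the 10-Mbps link rate, and the sample values are all assumptions made for illustration.

```python
from collections import defaultdict

def hourly_utilization(samples, link_bps):
    """Return average utilization (0.0 to 1.0) for each hour of the day.

    samples  -- iterable of (hour_of_day, bits_transferred, interval_seconds)
    link_bps -- nominal link capacity in bits per second
    """
    bits = defaultdict(float)
    secs = defaultdict(float)
    for hour, nbits, interval in samples:
        bits[hour] += nbits
        secs[hour] += interval
    # Utilization = bits actually carried / bits the link could have carried.
    return {h: bits[h] / (secs[h] * link_bps) for h in sorted(bits) if secs[h] > 0}

# Hypothetical samples: 300-second polling intervals on a 10-Mbps segment.
samples = [
    (9, 600_000_000, 300),
    (9, 900_000_000, 300),
    (14, 1_500_000_000, 300),
]
profile = hourly_utilization(samples, link_bps=10_000_000)
busiest_hour = max(profile, key=profile.get)
```

A profile of this kind, collected while the network is healthy, gives the troubleshooter a baseline against which suspected congestion can later be judged.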
21.4.1.2 Developing a Hypothesis. In this second step, we make a statement as to
the cause of the problem. We might say that T1 or E1 frame alignment is lost because
of deep fades being experienced on the underlying microwave transport network. Or
we might say that the excessive number of frames being dropped on a frame relay
network is due to congestion being experienced at Node B.
Such statements cannot be made without a strong basis to support the opinion.
Here is where the knowledge and experience of the troubleshooter really pay.
Certainly there could be other causes of E1 or T1 frame misalignment, but if underlying
microwave is involved, that would be the most obvious place to look. There could also
be other reasons for dropping frames in a frame relay system. Errored frames could be
one strong reason.
21.4.1.3 Testing the Hypothesis. We made a statement; now we must back it up
with tests. One test I like is correlation. Are the fades on the microwave correlated with
the occurrences of frame alignment loss? That test can be done quite easily. If they are
correlated, we have very strong evidence that the problem is with the microwave. The
frame relay problem may be another matter. First, we could check the FECN and BECN
bits to see if there was a change of state passing Node B. If there is no change, assuming
that flow control is implemented, then congestion may not be the problem. Removing the
frame relay from the system and carrying out a BERT (bit error rate test) over some
period of time would confirm or rule out noise-induced errored frames as the cause.
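
Both checks described above lend themselves to a short script. The sketch below, in Python, is illustrative only. It assumes that fade events and frame alignment loss events have been logged with timestamps in seconds, and that the frame relay frames use the common two-octet address field, in which FECN, BECN, and DE occupy bits 4, 3, and 2 of the second octet. The function names, the 1-second matching window, and all sample values are hypothetical.

```python
import bisect

def fraction_explained(loss_times, fade_times, window_s=1.0):
    """Fraction of alignment-loss events that fall within window_s of a fade."""
    if not loss_times:
        return 0.0
    fades = sorted(fade_times)
    hits = 0
    for t in loss_times:
        # Index of the first fade at or after (t - window_s).
        i = bisect.bisect_left(fades, t - window_s)
        if i < len(fades) and fades[i] <= t + window_s:
            hits += 1
    return hits / len(loss_times)

def congestion_bits(header):
    """Decode (dlci, fecn, becn, de) from a two-octet frame relay address field."""
    if len(header) < 2:
        raise ValueError("need at least two address octets")
    hi, lo = header[0], header[1]
    dlci = ((hi >> 2) << 4) | (lo >> 4)  # 6 high-order bits + 4 low-order bits
    return dlci, bool(lo & 0x08), bool(lo & 0x04), bool(lo & 0x02)

# Hypothetical event logs (seconds since the start of the observation period).
fade_times = [100.0, 250.0, 900.0]
loss_times = [100.4, 250.2, 600.0, 900.9]
explained = fraction_explained(loss_times, fade_times, window_s=1.0)

# Hypothetical headers for the same DLCI captured before and after Node B.
before_node_b = bytes([0x18, 0x41])
after_node_b = bytes([0x18, 0x49])
fecn_changed = congestion_bits(before_node_b)[1] != congestion_bits(after_node_b)[1]
```

A high fraction supports the microwave hypothesis but does not prove it; the window should be chosen to reflect the clock accuracy of the two logs. Likewise, a FECN bit that changes state across Node B only suggests congestion there and should be confirmed over many frames.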
A network analyzer is certainly an excellent tool in assisting in the localization of
faults. Some analyzers have pre-programmed tests that can save the troubleshooter time
and effort. Many networks today have some sort of network monitoring equipment
incorporated. This equipment may be used in lieu of or in conjunction with network analyzers.
Again we stress the importance of separating cause and effect. Many times, network
analyzers or network management monitors and testers will only show effects. The root
cause may not show at all and must be inferred, or separate tests must be carried out to
pinpoint the cause.