T&M: Help-The Network's Down
No single network troubleshooting method works in every situation. Here is a look at which approaches work best for different crises.
Saturday, December 04, 2004

We've all heard this or maybe even said it. There are many tools and testers to help administrators identify when a network is down, and several approaches to reacting to the alarms. Which method is best? The short answer is none of them. No single method works in every situation. There are basically two approaches to troubleshooting: top-down and bottom-up. There is also one rule for both: at some point, you will use one only to realize you should have used the other!

According to a recent survey conducted by Infonetics Research, the top three threats to your network are network products, security, and cabling and connectors, in that order. Gartner also released a new study reporting that roughly 20 percent of all IT investments are for things that don't work.

Top-Down Approach
In a top-down approach, the network manager begins at the upper layers of the OSI protocol stack. The administrator tests the application to make sure it is working, then pings the servers, and continues down the stack until reaching the physical layer at the bottom. This approach is best when help desk calls come in from multiple users. Physical layer problems rarely affect all users at once, unless, of course, the faulty link happens to be the only server connection. This methodology allows the administrator to determine whether the application or server is down, slow, or for some reason not responding to network commands. To be effective, it generally needs the help of a tool or network monitoring application that can provide the network manager with trending and actionable data.
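
The walk down the stack described above can be sketched in Python. This is only an illustration under stated assumptions: the function names (tcp_port_open, host_pings, locate_fault) and the host name in the usage comment are hypothetical, and the ping check simply shells out to the operating system's ping command. The idea is to keep testing lower layers while checks fail; the lowest layer that fails is the place to start looking.

```python
import socket
import subprocess
import sys


def tcp_port_open(host, port, timeout=3.0):
    """Upper-layer check: can a TCP connection to the service be opened?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def host_pings(host, timeout_s=5):
    """Network-layer check using the operating system's ping command."""
    count_flag = "-n" if sys.platform.startswith("win") else "-c"
    try:
        result = subprocess.run(
            ["ping", count_flag, "1", host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


def locate_fault(checks):
    """Run (layer_name, check) pairs from the top of the stack down.

    Stops at the first check that passes and returns the lowest layer that
    failed before it, or None if the top-level check already passes.
    """
    suspect = None
    for layer_name, check in checks:
        if check():
            break
        suspect = layer_name
    return suspect


# Hypothetical usage (host name and port are placeholders):
# locate_fault([
#     ("application", lambda: tcp_port_open("intranet.example.com", 80)),
#     ("network", lambda: host_pings("intranet.example.com")),
# ])
```

If every check fails, the function returns the last layer in the list, which is the cue to continue down to the physical layer with a cable tester.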

Actionable data can range from something as simple as a ping that returns host unreachable to monitored bit errors and other errors delivered via SNMP (Simple Network Management Protocol) traps. The real trick, however, is to determine the cause of the errors. Doing so effectively takes a methodical troubleshooting plan, and that plan should certainly include more than rebooting a server. If a server is going down, something is causing it to do so. It may be a memory leak, overutilized processors, or some other issue, but rebooting should be considered a bandage, not a solution. So, what exactly is actionable data? It is data that provides enough information to be useful and is clear enough to determine a plan of action.
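
As a rough illustration of turning raw error counts into something actionable, the Python sketch below compares two samples of a port's counters and produces a message that points to a next step. The dictionary keys, the function names, and the 1 percent limit are assumptions made for this example; they do not come from any particular management package.

```python
def error_ratio(prev, curr):
    """Share of inbound frames that were errored between two counter samples.

    `prev` and `curr` are dicts with hypothetical keys "in_packets" (frames
    received without error) and "in_errors" (frames dropped because of errors).
    """
    good = curr["in_packets"] - prev["in_packets"]
    bad = curr["in_errors"] - prev["in_errors"]
    total = good + bad
    if good < 0 or bad < 0 or total == 0:
        # Counter reset or idle port: no meaningful ratio for this interval.
        return None
    return bad / total


def describe(port, ratio, limit=0.01):
    """Turn a raw ratio into a message that suggests a next step."""
    if ratio is None:
        return f"{port}: no usable sample, poll again"
    if ratio > limit:
        return f"{port}: {ratio:.1%} of inbound frames errored, check cabling and the port"
    return f"{port}: error ratio {ratio:.1%} is within the limit"
```

A bare error count says little on its own. Relating it to the traffic seen in the same interval is what makes it clear enough to act on.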

Most management packages and monitoring tools allow a network administrator to set thresholds that flag performance outside an acceptable range. Knowing where to set these for specific issues will require a bit of trial and error. Set too low, they will flood a pager or cell phone with messages; set too high, they will result in unemployment. Blindly accepting the defaults can result in underutilization of the tools. Any time you deploy management software, be sure to spend the money and get trained. The best training would ideally be on site, in your environment, by someone certified in the software. That way you can eliminate the modules you don't want or need and tune the ones that will provide you with the best information. Bandwidth-heavy applications and heavily utilized servers will require the most tuning to be of benefit.
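
One common way to keep a threshold from flooding a pager is to alert only after several consecutive samples exceed it. The Python sketch below shows that idea; the class name, the limit of 80, and the requirement of three samples in a row are assumptions chosen for illustration, not the behavior of any specific monitoring product.

```python
class ThresholdAlarm:
    """Raise an alert only after `needed` consecutive samples exceed `limit`."""

    def __init__(self, limit, needed=3):
        self.limit = limit
        self.needed = needed
        self.breaches = 0    # consecutive samples above the limit so far
        self.active = False  # True while an alert is outstanding

    def update(self, value):
        """Feed one sample; return "raise", "clear", or None."""
        if value > self.limit:
            self.breaches += 1
            if self.breaches >= self.needed and not self.active:
                self.active = True
                return "raise"
        else:
            self.breaches = 0
            if self.active:
                self.active = False
                return "clear"
        return None


# A short spike (85, 90) is ignored; a sustained run above 80 pages once.
alarm = ThresholdAlarm(limit=80, needed=3)
events = [alarm.update(v) for v in [85, 90, 70, 85, 90, 95, 99, 60]]
```

Both the limit and the number of samples required are tuning knobs, which is where the trial and error comes in.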

Another benefit of management software is the ability to query disparate equipment and retain statistics and trends in one reporting tool. In the old days, and still in many environments today, the network manager was stuck double-clicking on each switch in a wide variety of interfaces, depending on the server software and active electronics. With a single tool, trending and overall traffic reports can be exported, sorted, and so on. These can be used to justify new equipment and upgrades (just a little side perk). The advantage of trending and utilization models is that they allow you to determine, for instance, which servers could benefit from multiple network cards. They also allow you to segment your switches so that traffic is balanced and one switch is not overutilized while the others are underutilized. Finally, they help you know what types of packets are moving where, so that traffic can be optimized as well.
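
Utilization figures of this kind are typically derived from byte (octet) counters polled at intervals. The Python sketch below assumes two readings of a 32-bit counter that wraps around, a known polling interval, and a known link speed. The function names and the 70 percent and 10 percent cut-offs are hypothetical values picked for the example.

```python
def counter_delta(prev, curr, bits=32):
    """Difference between two readings of a counter that wraps at 2**bits."""
    return (curr - prev) % (2 ** bits)


def utilization_percent(prev_octets, curr_octets, interval_s, link_bps, bits=32):
    """Average link utilization over the polling interval, as a percentage."""
    octets = counter_delta(prev_octets, curr_octets, bits)
    return 100.0 * octets * 8 / (interval_s * link_bps)


def flag_imbalance(utilization_by_switch, high=70.0, low=10.0):
    """Return (busy, idle) lists of switch names outside the chosen band."""
    busy = sorted(n for n, u in utilization_by_switch.items() if u >= high)
    idle = sorted(n for n, u in utilization_by_switch.items() if u <= low)
    return busy, idle


# 375,000,000 octets in 60 seconds on a 100 Mb/s link is 50 percent utilization.
example = utilization_percent(1_000_000, 376_000_000, 60, 100_000_000)
```

On fast links a 32-bit counter can wrap more than once between polls, so the polling interval has to be short enough, or a 64-bit counter used, for the delta to be trustworthy.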

Bottom-Up Approach
In a bottom-up approach, the cabling is checked first and then troubleshooting moves up the protocol stack. When one user goes down, it is far easier to start at the physical layer and move up. Some idiosyncrasies can develop when EMI and/or environmental concerns are causing the problem. Physical layer testers are a bit different from the monitoring tools used in a top-down approach. These can be field testers, smart bit testers, and/or spectrum analyzers for radio frequency information. What you are testing will determine what type of tester you need. The key here is that the tester be calibrated just prior to the test and that the tester be certified by an independent agency. Test and Measurement World (http://www.reed-electronics.com/tmworld/) has a listing of testers, ratings for how well they perform, and certifications for each variety.

When there are errors, either continuous or intermittent, it is a good idea to look at your physical layer. Field-terminated patch cords are a particular culprit, but environmental conditions can also be to blame. When walls are moved, cables that were once placed away from fluorescent light fixtures may no longer be at an acceptable distance, new power panels may be located too close, etc. It is important not to rely on a link light on your switch port to determine whether the cable is good or bad. Just as with your electronics, there are conditions where you may have a link, but the signal is so degraded between sender and receiver that the packet is useless. Remember the expression "lights are on but no one's home." This is true for copper or fiber that is not performing but still causes the link light on the switch to illuminate.

Another thing that can happen is performance degradation when autonegotiation drops to a lower speed or to half duplex to try to maintain the connection. If you have deployed Gigabit Ethernet and your cabling was installed before the new parameters for channel performance were adopted, you will also want to have your cabling recertified for the new parameters. This is recommended by the cabling standards bodies. Note that when equipment is tested for operation with any physical layer media, the testing is done in a lab in a pristine environment. Actual installations may vary for a number of reasons. If you are going with the bottom-up approach, check all of the physical media, and don't skip this step because you can ping a device or see its link light. On the other hand, if you don't have a link light, the problem is obvious.
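
A silent downgrade through autonegotiation can be spotted by comparing the negotiated link settings with what the port is supposed to run at. The Python sketch below assumes a Linux host, where the kernel exposes the negotiated speed (in Mb/s) and duplex under /sys/class/net/<interface>/. The function names and the 1000 Mb/s expectation are assumptions for illustration. Reading these files can fail when the link is down, and some drivers report a speed of -1 when it is unknown.

```python
from pathlib import Path


def read_link_settings(interface, sysfs_root="/sys/class/net"):
    """Read negotiated speed (Mb/s) and duplex for a Linux interface from sysfs."""
    base = Path(sysfs_root) / interface
    speed = int((base / "speed").read_text().strip())
    duplex = (base / "duplex").read_text().strip()
    return speed, duplex


def link_problems(speed_mbps, duplex, expected_mbps=1000):
    """List the ways a negotiated link falls short of what was expected."""
    problems = []
    if speed_mbps <= 0:
        problems.append("speed unknown (link may be down)")
    elif speed_mbps < expected_mbps:
        problems.append(f"negotiated {speed_mbps} Mb/s, expected {expected_mbps} Mb/s")
    if duplex != "full":
        problems.append(f"duplex is {duplex}, expected full")
    return problems


# Hypothetical usage on a Linux machine with an interface named eth0:
# print(link_problems(*read_link_settings("eth0")))
```

A clean result here does not certify the cable. It only rules out one visible symptom of a marginal channel.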

Then you work your way up, checking the network card diagnostics and switch port statistics, until you reach the application. If only one application is not working, start at the top. If several or all applications are not working for one workstation, work your way up from the bottom. And remember, once in a while the problem will be in the middle, or this rule will be backwards.
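
The rule of thumb for picking a starting point can be written down as a small Python function. This is only a sketch of the guidance in this article; the function and parameter names are hypothetical, and as noted above, real problems will sometimes break the rule.

```python
def choose_approach(users_affected, apps_failing, apps_total):
    """Suggest where to start troubleshooting, following the rule of thumb.

    Multiple users affected, or a single failing application among several,
    points to the top of the stack. One workstation with several or all
    applications failing points to the physical layer.
    """
    if users_affected > 1:
        return "top-down"
    if apps_failing == 1 and apps_total > 1:
        return "top-down"
    return "bottom-up"
```

For example, choose_approach(1, 5, 5) suggests starting at the cabling, while choose_approach(12, 1, 5) suggests starting with the application and its server.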

Pre-Installation Testing
This step can be one of the best tools you can use to eliminate problems before applications and networks go live. It should include thorough testing of all components under load. All components means the physical layer, the network layer, and, where possible, applications. It is unwise to assume that installing standards-based components guarantees there will be no problems. This can be particularly true in the physical layer. Anomalies in installation, poor installation practices, EMI or RF interference, and marginally compliant components can all cause errors, especially when several occur together. The higher the frequency, the worse the problems can become.

Many manufacturers make cables and connectivity with margin above the minimum the standards require. This provides a bit of a forgiveness factor for installation issues, but proper installation is still the main key to error-free performance of any system, active or passive.

Many larger companies maintain test networks for just this reason. When you are troubleshooting a problem, you can move the components to a test lab where the physical layer is certified and known to be trouble free. Network electronics can then be tested on the test bed in a controlled environment, either before implementation or afterward if problems are found.

Carrie Higbie of The Siemon Company
