Now that we live in a world of real-time collection and streaming of data, any latency or jitter on our network can quickly escalate into a serious problem. Network administrators are tasked with troubleshooting, but tracking down the root cause of network latency is easier said than done. In some ways, it's a cat-and-mouse game that requires multiple tools to diagnose the origin of the problem. Identifying and fixing network latency culprits is a three-step process:
- Initial confirmation of latency
- Locating the area where latency is occurring
- Identifying and eliminating the true source of the latency
I'll examine six network analysis tools that can be used in one or more steps of the troubleshooting process.
Our trusty ping utility can verify the reachability of devices on a network, and also provide data on how long it takes an ICMP echo request to reach its destination and for the reply to come back. Pinging multiple devices on different network segments and comparing the round-trip times can provide valuable information on network latency. Ping can help confirm latency as well as identify which part of the network is experiencing the slowdown.
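As a rough illustration of comparing round-trip times across segments, here is a minimal Python sketch. It assumes the summary line format printed by Linux and macOS ping (Windows prints a different format, so the parser returns None there). The device names, the averages and the 3x-median threshold are hypothetical values chosen for the example, not recommendations.

```python
import re
import statistics

# Matches the summary line of Linux/macOS ping, for example:
# "rtt min/avg/max/mdev = 0.045/0.052/0.061/0.006 ms"
_SUMMARY = re.compile(r"=\s*([\d.]+)/([\d.]+)/([\d.]+)")


def parse_avg_rtt_ms(ping_output):
    """Return the average round-trip time in ms, or None if no summary line is found."""
    match = _SUMMARY.search(ping_output)
    return float(match.group(2)) if match else None


def slow_targets(avg_rtts, factor=3.0):
    """Return the targets whose average RTT exceeds `factor` times the median RTT."""
    median = statistics.median(avg_rtts.values())
    return sorted(name for name, rtt in avg_rtts.items() if rtt > factor * median)


# Hypothetical averages (ms) collected by pinging one device per segment.
averages = {
    "core-switch": 0.6,
    "floor2-switch": 0.8,
    "file-server": 0.7,
    "branch-router": 41.5,
}
print(slow_targets(averages))  # ['branch-router']
```

Comparing against the median rather than a fixed number keeps the check relative to what is normal for this particular network.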
Traceroute is similar to ping in that it provides round-trip information between a source and destination IP address on a network. Unlike ping, however, traceroute provides these times on a hop-by-hop basis. In enterprise networks, there are likely to be multiple hops along the path between source and destination devices. The traceroute utility works by sending a UDP packet (the default on Linux-based systems) or an ICMP echo request (on Windows systems) with a TTL value of 1.
When the packet reaches the first hop on the path to the destination, that device decrements the TTL to zero, discards the packet and sends an ICMP Time Exceeded message back to the source device, which records the round-trip time. The next packet is sent with a TTL of 2, then 3 and so on until the final destination IP is reached. You end up with a list of the routed hops between the source and destination and how responsive those hops are at that point in time. This helps to further narrow down the source of network performance problems.
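The TTL mechanism can be sketched as a small simulation. No real packets are sent here: the path, hop names and per-hop delays are hypothetical, and the code only models which hop answers each probe and what round-trip time the source would record.

```python
def simulate_traceroute(path, max_hops=30):
    """Simulate TTL-limited probes along `path`.

    `path` is a list of (hop_name, one_way_delay_ms) tuples that ends with
    the destination. Returns a list of (ttl, responder, rtt_ms) tuples.
    """
    results = []
    for ttl in range(1, max_hops + 1):
        # The probe travels until its TTL reaches zero or it hits the destination.
        hops_travelled = min(ttl, len(path))
        responder = path[hops_travelled - 1][0]
        one_way_ms = sum(delay for _, delay in path[:hops_travelled])
        results.append((ttl, responder, 2 * one_way_ms))
        if hops_travelled == len(path):  # destination answered, stop probing
            break
    return results


def largest_jump(results):
    """Return (hop_name, increase_ms) for the biggest RTT increase between hops."""
    previous_rtt = 0.0
    worst = (None, 0.0)
    for _, name, rtt in results:
        if rtt - previous_rtt > worst[1]:
            worst = (name, rtt - previous_rtt)
        previous_rtt = rtt
    return worst


# Hypothetical four-hop path with a slow WAN link in the middle.
path = [("access-switch", 0.2), ("core-router", 0.3), ("wan-edge", 12.0), ("app-server", 0.5)]
for ttl, responder, rtt in simulate_traceroute(path):
    print(ttl, responder, rtt)
print(largest_jump(simulate_traceroute(path))[0])  # wan-edge
```

On a live network, keep in mind that some routers give low priority to generating ICMP replies, so one slow hop followed by fast hops does not necessarily indicate a problem on that link.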
SNMP monitoring tools aren’t simply for determining whether a networking interface is up or down or for baselining link utilization. They also can be used to track down network bottlenecks that could be the source of latency problems. For example, if a server backup is clogging a critical uplink that carries voice traffic, an SNMP monitor can identify the interface with the bottleneck. Additionally, SNMP utilities can track down device-level problems such as a CPU spike on a switch.
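To make the bottleneck check concrete, the sketch below computes link utilization from two samples of an interface octet counter, the kind of value an SNMP monitor polls (for example ifHCInOctets and ifSpeed from the IF-MIB). The function name and the sample numbers are illustrative, and the SNMP polling itself is left out.

```python
def link_utilization_pct(octets_start, octets_end, interval_s, if_speed_bps, counter_bits=64):
    """Return percent utilization from two octet-counter samples.

    The samples are taken `interval_s` seconds apart on a link whose speed is
    `if_speed_bps` bits per second. `counter_bits` is 64 for high-capacity
    counters and 32 for the older ones; the modulo handles a single wrap.
    """
    if interval_s <= 0 or if_speed_bps <= 0:
        raise ValueError("interval and interface speed must be positive")
    delta_octets = (octets_end - octets_start) % (2 ** counter_bits)
    return delta_octets * 8 * 100 / (interval_s * if_speed_bps)


# Hypothetical 1 Gbps uplink polled 60 seconds apart.
print(link_utilization_pct(0, 6_000_000_000, 60, 1_000_000_000))  # 80.0
```

A 32-bit counter on a fast link can wrap more than once between polls, in which case the result is too low, so shorter polling intervals or 64-bit counters are the safer choice.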
It’s great if your SNMP monitor can identify congested uplinks on the network, but that alone doesn’t point to the true cause of the congestion. Is this legitimate data being sent? If so, it’s time to upgrade your bandwidth. However, congestion is often due to non-business-related data such as YouTube or Netflix traffic, or to malicious behavior such as a DDoS attack. To find out, you need to see further into the data and categorize it into groups. NetFlow analyzer tools can determine the nature of the data flows being sent across a network, which helps you judge whether the traffic is business critical.
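The grouping step can be sketched in a few lines of Python. This is only an illustration of the idea: the flow records, their field names and the port-to-category table are hypothetical, and a real NetFlow analyzer classifies traffic with far more than destination ports.

```python
from collections import Counter

# Hypothetical mapping from destination port to a traffic category.
PORT_CATEGORIES = {443: "web", 5060: "voice-signaling", 10000: "backup"}


def bytes_by_category(flows, port_categories=PORT_CATEGORIES):
    """Sum the bytes of each flow under the category of its destination port."""
    totals = Counter()
    for flow in flows:
        totals[port_categories.get(flow["dst_port"], "other")] += flow["bytes"]
    return totals


def top_talkers(flows, n=3):
    """Return the n source addresses that sent the most bytes."""
    totals = Counter()
    for flow in flows:
        totals[flow["src"]] += flow["bytes"]
    return totals.most_common(n)


# Hypothetical flow records exported for one congested uplink.
flows = [
    {"src": "10.0.1.15", "dst": "192.0.2.10", "dst_port": 443, "bytes": 1_200_000},
    {"src": "10.0.1.15", "dst": "192.0.2.11", "dst_port": 443, "bytes": 800_000},
    {"src": "10.0.2.40", "dst": "10.0.9.5", "dst_port": 5060, "bytes": 50_000},
    {"src": "10.0.3.7", "dst": "10.0.9.20", "dst_port": 10000, "bytes": 9_000_000},
]
print(bytes_by_category(flows).most_common(1))  # [('backup', 9000000)]
print(top_talkers(flows, n=1))                  # [('10.0.3.7', 9000000)]
```

In this made-up sample the backup job dominates the link, which matches the server backup scenario described earlier.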
For those tricky situations where all other troubleshooting tools have failed to find the root cause of network latency, it’s time to boot up your trusty protocol analyzer to dig into each packet. A protocol analyzer such as Wireshark can perform deep-packet inspection to determine exactly which packets are slow, where they are going and what they are doing.
The protocol analyzer is becoming increasingly useful for tracking down latency since most modern applications now operate in distributed architectures. If a user is complaining about an application being slow, that application may be distributed across multiple servers. The protocol analyzer can determine whether the problem is limited to a specific part of the overall distributed architecture.
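One common way to use a capture for this is to separate network delay from server delay for each tier. The sketch below assumes you have already pulled four timestamps per conversation out of a client-side capture (SYN, SYN-ACK, request, first response packet). The tier names, the field names and the numbers are hypothetical, and treating the handshake time as the network round trip is a simplification.

```python
def split_latency(events):
    """Split one conversation into (network_rtt_s, server_time_s).

    `events` maps "syn", "syn_ack", "request" and "first_response" to
    capture timestamps in seconds, as seen from the client side.
    """
    network_rtt = events["syn_ack"] - events["syn"]
    response_wait = events["first_response"] - events["request"]
    # Whatever part of the wait is not explained by the network is attributed to the server.
    server_time = max(response_wait - network_rtt, 0.0)
    return network_rtt, server_time


def slowest_tier(conversations):
    """Return the tier name with the largest estimated server time."""
    return max(conversations, key=lambda tier: split_latency(conversations[tier])[1])


# Hypothetical timestamps for two tiers of a distributed application.
conversations = {
    "web-frontend": {"syn": 0.000, "syn_ack": 0.002, "request": 0.003, "first_response": 0.010},
    "database": {"syn": 0.020, "syn_ack": 0.021, "request": 0.022, "first_response": 0.422},
}
print(slowest_tier(conversations))  # database
```

If the handshake times are small for every tier but one tier shows a long wait for its first response, the delay is more likely inside that server than on the network.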
Application performance monitoring
Finally, if you feel that you’ve exhausted all available resources and still haven’t found the root source of the problem, then perhaps the latency isn’t occurring on the network. Instead, the problem may very well be within the application itself. For example, if application transactions are collected and sent to a distributed database, it could be that the database is not fast enough to keep up during peak times. Application performance monitoring tools enable you to drill into the application to uncover latency issues.
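Commercial APM tools do this instrumentation for you, but the underlying idea can be shown with a small hand-rolled timer. The class below is a hypothetical sketch, not the API of any APM product: it records how long each named stage of a transaction takes so the slowest stage stands out.

```python
import time
from contextlib import contextmanager


class TransactionTimer:
    """Accumulate wall-clock time per named stage of a transaction."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock  # injectable so the timer can be tested with a fake clock
        self.durations = {}

    @contextmanager
    def stage(self, name):
        start = self._clock()
        try:
            yield
        finally:
            elapsed = self._clock() - start
            self.durations[name] = self.durations.get(name, 0.0) + elapsed

    def slowest(self):
        """Return the name of the stage with the largest total time."""
        return max(self.durations, key=self.durations.get)


timer = TransactionTimer()
with timer.stage("collect"):
    sum(range(1000))   # stand-in for collecting the transaction
with timer.stage("db_write"):
    time.sleep(0.05)   # stand-in for a slow database write
print(timer.slowest())
```

If a stage such as the database write dominates the total only during peak hours, that points at the application tier rather than the network.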