As networks expand and new segments are added, switching and routing grow more complex. These infrastructure changes often affect network performance, so packets can take longer to traverse the network. Growth can also introduce visibility and performance challenges such as packet loss or high latency, especially where links of different speeds meet at aggregation points. Different networking devices also provide varying levels of telemetry. For example, some may not export flow data at all, while others may provide high-level conversation data but lack performance metrics or deep packet inspection (DPI) application visibility. These gaps make it harder for NetOps teams to find problems and mitigate them quickly.
In fact, according to a recent survey, 19% of teams listed “improving network monitoring across the entire end-to-end network” as a key priority for 2022. But this growth brings good news too: every new switch or router is an opportunity to see into flows and packets, which allows NetOps teams to identify network performance issues hop-by-hop. In this article, I’d like to explore how that’s done and why it’s important.
Let’s start with a brief history of network devices. A few years ago, the roles of switches, routers, and firewalls were well defined, and each device was configured locally for specific needs. Since then, these devices have come a long way, and more and more of them are software-defined. With software-defined networking (SDN), network administrators define policies in SDN controllers that decide how, when, why, and where packets are routed. As a result, today’s network can look very different from a traditional environment. But as mentioned earlier, this change also presents an opportunity.
A hop is one step along a packet’s path, made each time the packet passes through a router, switch, or other networking device; the hop count is the number of such devices the packet traverses on the way to its destination. Every router or switch on the network can therefore help identify network issues hop-by-hop. For example, a team can isolate poor application performance, validate SLAs, track QoS, optimize SD-WAN load balancing, identify on-premises-to-cloud traffic, and much more. Routers and switches can send packets over different links depending on QoS settings or routing configuration, which can increase or decrease network performance for particular applications or groups of users. How do you see these hops?
The most basic way to see hop-by-hop behavior is to run traceroute (tracert on Windows) from a command line against an IP address or hostname. It sends probes with an increasing IP time-to-live (TTL) value; each router that decrements a probe’s TTL to zero replies with an ICMP Time Exceeded message, which reveals the Layer 3 hops along the path and the round-trip time to each one. This is a useful troubleshooting tool for understanding the hop-by-hop behavior of the network. However, it has serious limitations. First, it doesn’t measure real traffic: the probes are generated for testing, so they might not follow the same path as real application packets or receive the same prioritization, and the results can differ. Second, Layer 2 switches don’t decrement the TTL, so they never appear in the output. Finally, for security reasons, switches and routers are often configured not to answer traceroute probes, and firewalls may block them.
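To make the TTL mechanism and its blind spots concrete, here is a minimal offline simulation rather than a real traceroute (a real one needs raw sockets and elevated privileges). The device names and the `Device` model are hypothetical; the sketch only shows which device would answer each probe, assuming Layer 2 switches never consume TTL and that a device configured not to reply produces a `*` timeout.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Device:
    name: str                  # hypothetical device name
    layer3: bool               # True for routers, Layer 3 firewalls, and the destination host
    answers_icmp: bool = True  # False models a device that drops or filters its replies


def simulate_traceroute(path: List[Device], max_ttl: int = 30) -> List[Optional[str]]:
    """Return the responder for each probe TTL (None stands for a '*' timeout).

    `path` lists the devices after the source host; the last entry is the destination.
    """
    # Only Layer 3 devices decrement TTL, so Layer 2 switches never appear.
    visible = [d for d in path if d.layer3]
    results: List[Optional[str]] = []
    for ttl in range(1, min(max_ttl, len(visible)) + 1):
        # A probe sent with this TTL expires at the ttl-th Layer 3 device,
        # or reaches the destination, which replies instead.
        responder = visible[ttl - 1]
        results.append(responder.name if responder.answers_icmp else None)
    return results


if __name__ == "__main__":
    path = [
        Device("access-switch", layer3=False),
        Device("core-router", layer3=True),
        Device("firewall", layer3=True, answers_icmp=False),
        Device("edge-router", layer3=True),
        Device("app-server", layer3=True),
    ]
    for ttl, name in enumerate(simulate_traceroute(path), start=1):
        print(f"{ttl:2d}  {name or '*'}")
```

In this example the access switch is invisible and the firewall shows up only as `*`, which mirrors two of the limitations described above.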
A visibility alternative: hop-by-hop analysis
Another way to identify network issues hop-by-hop is to use NetFlow or IPFIX, which most routers and switches support today. NetFlow is a Cisco-defined protocol that provides details about the flows of packets through a router or switch. IPFIX is the IETF standard protocol based on NetFlow version 9. Many network monitoring solutions collect NetFlow and IPFIX from routers and switches and visually display the hop-by-hop paths that packets take across the network. For example, a flow path might originate at a remote site, where a network performance monitoring and diagnostic (NPMD) platform gathers telemetry from a wireless LAN controller, multiple switches, a firewall, and a router. The traffic may then be routed across one or more service provider transports into another site or the public cloud, each with its own hops. The network monitoring solution gathers telemetry across all of these hops, end to end.
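One way a collector could turn per-device flow records into a hop-by-hop path is to match records on the flow’s 5-tuple and chain them by next hop. The sketch below assumes a simplified, hypothetical record format in which each exporter’s next hop has already been resolved to a device name; real IPFIX records carry an address (for example, `ipNextHopIPv4Address`), and mapping it to a device is left out here. It illustrates the idea, not how any particular product stitches paths.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# (source IP, destination IP, source port, destination port, IP protocol number)
FiveTuple = Tuple[str, str, int, int, int]


@dataclass(frozen=True)
class FlowRecord:
    exporter: str   # device that exported the record (hypothetical name)
    key: FiveTuple
    next_hop: str   # next device toward the destination, already resolved to a name (assumption)
    octets: int     # bytes this exporter counted for the flow


def stitch_path(records: List[FlowRecord], key: FiveTuple) -> List[FlowRecord]:
    """Order the records that different exporters reported for one flow into a path."""
    by_exporter: Dict[str, FlowRecord] = {r.exporter: r for r in records if r.key == key}
    if not by_exporter:
        return []
    forwarded_to = {r.next_hop for r in by_exporter.values()}
    # The first hop is the only exporter that no other exporter forwards this flow to.
    starts = [e for e in by_exporter if e not in forwarded_to]
    if len(starts) != 1:
        raise ValueError("cannot find a unique first hop (loop or inconsistent records)")
    path = [by_exporter[starts[0]]]
    # Follow next hops until the flow leaves the set of exporting devices.
    while path[-1].next_hop in by_exporter:
        nxt = by_exporter[path[-1].next_hop]
        if nxt in path:
            raise ValueError("routing loop detected")
        path.append(nxt)
    return path


if __name__ == "__main__":
    key = ("10.1.1.10", "203.0.113.5", 51514, 443, 6)
    records = [
        FlowRecord("branch-router", key, "isp-pe", 9_800),
        FlowRecord("wlc", key, "access-switch", 10_000),
        FlowRecord("firewall", key, "branch-router", 9_900),
        FlowRecord("access-switch", key, "firewall", 10_000),
    ]
    for hop in stitch_path(records, key):
        print(f"{hop.exporter:<14} {hop.octets:>6} bytes -> {hop.next_hop}")
```

Comparing byte counts hop by hop can hint at where traffic is dropped, although sampled flow export makes such comparisons approximate.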
For network segments without NetFlow or IPFIX capabilities, organizations can tap into the network, feeding packets from a SPAN port or an inline tap to a device that generates IPFIX and sends it to a collector. These advanced packet capture and IPFIX analysis solutions can also extend IPFIX with quality metrics that most routers and switches don’t provide. For example, when troubleshooting a voice issue, network engineers can gather performance metrics such as packet loss, jitter, and MOS (mean opinion score), along with call details such as the codec and even phone numbers. This information adds valuable context to basic flow data like IP addresses and port numbers.
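As an illustration of how a packet-based probe can derive voice metrics that plain flow records lack, the sketch below computes RTP packet loss from sequence numbers and interarrival jitter with the running estimator defined in RFC 3550. The input format is an assumption made for this example, the 8,000 Hz clock rate corresponds to narrowband codecs such as G.711, and sequence-number wraparound is deliberately not handled. MOS estimation is a separate model and is not shown.

```python
from typing import List, Tuple


def rtp_quality(packets: List[Tuple[int, int, float]], clock_rate: int = 8000) -> Tuple[int, float]:
    """Return (packets lost, interarrival jitter in ms) for one RTP stream.

    `packets` holds (sequence number, RTP timestamp, arrival time in seconds)
    in arrival order. Sequence-number wraparound is ignored for simplicity.
    """
    seqs = [p[0] for p in packets]
    expected = max(seqs) - min(seqs) + 1
    lost = expected - len(set(seqs))  # duplicates are not counted as received twice

    jitter = 0.0  # running estimate in RTP timestamp units
    for (_, ts_prev, arr_prev), (_, ts, arr) in zip(packets, packets[1:]):
        # Difference in transit time between consecutive packets, in timestamp units.
        d = (arr - arr_prev) * clock_rate - (ts - ts_prev)
        # RFC 3550 smoothing: J = J + (|D| - J) / 16
        jitter += (abs(d) - jitter) / 16
    return lost, jitter / clock_rate * 1000


if __name__ == "__main__":
    # 20 ms packetization (160 samples at 8 kHz), packet 10 missing, packet 5 delayed by 5 ms.
    stream = [(s, s * 160, s * 0.020 + (0.005 if s == 5 else 0.0)) for s in range(1, 51) if s != 10]
    lost, jitter_ms = rtp_quality(stream)
    print(f"lost={lost} jitter={jitter_ms:.3f} ms")
```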
When digging into a cloud or web application issue, advanced packet capture and IPFIX analysis solutions can provide network performance data such as network delay, application delay, retransmission counts, HTTP hostnames like “www.salesforce.com”, and web page response times. These metrics help end the finger-pointing that often occurs between application server teams and network teams, reducing mean time to resolution (MTTR).
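A common passive heuristic for separating network delay from application delay uses packet timestamps from a tap and the TCP handshake. The sketch assumes the tap sits close to the client, that the exchange is a single request followed by a single response, and that there are no retransmissions. The field and function names are hypothetical, and this is not necessarily how any specific product computes these metrics.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class TcpTiming:
    """Times (seconds) at which a tap near the client saw each packet."""
    syn: float             # client SYN
    syn_ack: float         # server SYN-ACK
    ack: float             # client ACK completing the handshake
    request_last: float    # last packet of the client's request
    response_first: float  # first data packet of the server's response


def delays_ms(t: TcpTiming) -> Tuple[float, float]:
    """Split one request/response exchange into (network delay, application delay) in ms."""
    server_side_rtt = t.syn_ack - t.syn  # tap -> server -> tap
    client_side_rtt = t.ack - t.syn_ack  # tap -> client -> tap
    network = server_side_rtt + client_side_rtt
    # Time the server took to answer, minus the time the request and response spent in flight.
    application = max(0.0, (t.response_first - t.request_last) - server_side_rtt)
    return network * 1000, application * 1000


if __name__ == "__main__":
    timing = TcpTiming(syn=0.000, syn_ack=0.040, ack=0.042, request_last=0.050, response_first=0.290)
    net_ms, app_ms = delays_ms(timing)
    print(f"network delay ~{net_ms:.1f} ms, application delay ~{app_ms:.1f} ms")
```

Here most of the 240 ms wait comes from the server rather than the network, which is exactly the kind of evidence that shortens an argument between teams.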
What should NetOps teams consider when evaluating a solution for hop-by-hop network analysis? Here are some key questions:
- Can the solution scale to large enterprises with hundreds or thousands of sites and devices?
- What flow rates are supported? For example, some data center routers can generate over 100,000 flows per second per device, which is massive. Can the solution both collect that data and report on it in a meaningful and timely way?
- Is the solution multivendor, multidomain, and multi-telemetry in nature to report on a wide variety of network deployments across the LAN, WAN, data center, and cloud?
- Does the solution support standards-based IPFIX and vendor-specific flow records to provide detailed monitoring and analysis across the vendor landscape?
- When network devices are not able to export IPFIX or lack performance metrics, does the monitoring platform support packet capture and flow generation to fill the gaps?
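Support for vendor-specific records comes down to how IPFIX templates describe each field. Under RFC 7011, a template field specifier starts with a 1-bit enterprise flag and a 15-bit Information Element ID, followed by a 16-bit field length; when the flag is set, a 32-bit Private Enterprise Number identifies the vendor that defined the element. The sketch below parses a run of field specifiers. Element 100 under enterprise number 32473 (reserved for documentation by RFC 5612) stands in for a hypothetical vendor metric, and the surrounding template and set headers are omitted.

```python
import struct
from typing import List, NamedTuple, Optional

# A few IANA-registered (standard) Information Elements.
IANA_NAMES = {
    4: "protocolIdentifier",
    7: "sourceTransportPort",
    8: "sourceIPv4Address",
    11: "destinationTransportPort",
    12: "destinationIPv4Address",
}


class FieldSpecifier(NamedTuple):
    element_id: int
    length: int
    enterprise: Optional[int]  # None for standard (IANA-registered) elements


def parse_field_specifiers(data: bytes, count: int) -> List[FieldSpecifier]:
    """Parse `count` consecutive template field specifiers (network byte order)."""
    fields: List[FieldSpecifier] = []
    offset = 0
    for _ in range(count):
        raw_id, length = struct.unpack_from("!HH", data, offset)
        offset += 4
        enterprise = None
        if raw_id & 0x8000:  # enterprise bit set: a 4-byte Private Enterprise Number follows
            (enterprise,) = struct.unpack_from("!I", data, offset)
            offset += 4
        fields.append(FieldSpecifier(raw_id & 0x7FFF, length, enterprise))
    return fields


if __name__ == "__main__":
    data = (
        struct.pack("!HH", 8, 4)
        + struct.pack("!HH", 12, 4)
        + struct.pack("!HHI", 0x8000 | 100, 2, 32473)  # hypothetical vendor metric
    )
    for f in parse_field_specifiers(data, 3):
        if f.enterprise is None:
            print(f"standard {IANA_NAMES.get(f.element_id, f.element_id)} ({f.length} bytes)")
        else:
            print(f"enterprise {f.enterprise}, element {f.element_id} ({f.length} bytes)")
```

A collector that ignores enterprise-specific fields can still decode the standard ones, but only a collector that knows the vendor’s definitions can report on the extra metrics.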
As networks continue to evolve, NetOps teams are tasked with tracking and solving performance issues across the network, and hop-by-hop analysis plays a critical role in that work. When NetOps teams treat network growth as an opportunity rather than a roadblock, they can increase visibility and deliver the high-performance network users expect.
David Izumo is principal engineer at LiveAction.