"Evolution is inevitable." It shapes the present and will keep shaping the future, so let us talk a bit about how network telemetry is evolving alongside the fast-changing world of SDN!
With increasing demands, enterprises and data centers are turning to new technologies such as SDN and NFV to make their existing networks more reliable, redundant, effective, and profitable. Adopting them calls for more sophisticated and flexible telemetry mechanisms for monitoring, maintaining, and troubleshooting networks, particularly in data centers and large enterprise networks. Legacy monitoring approaches still serve an organization well for gathering useful data, but some essential monitoring needs of growing networks cannot be met by those traditional approaches.
The requirements of growing networks are changing rapidly. They include:
Gathering real-time data: Traditional methods usually poll a networking device for data every few seconds, whereas traffic passes through that device at rates of megabytes or gigabytes per second. With such a gap, the collected data can hardly be considered real-time.
Having 360° visibility of the network: Every network device communicates individually with a monitoring manager, so identifying the path a packet takes through the network is difficult. This is also restrictive, because it limits the insight an organization can gather into actual network behavior.
Looking toward machine learning: ML is also making its way into the networking world. However, its real benefits can be realized only if exceptions are identified the moment they occur and acted upon quickly to keep the network reliable and stable. Legacy methods offer little scope in this direction.
Delivering high flexibility: In the past, networking resources were fairly rigid and offered little opportunity to enhance traditional methods. Current trends, in contrast, focus on programmability and redundancy in both hardware and software, which naturally opens the way to better monitoring strategies.
How packet-level network telemetry can help
As the name indicates, packet-level network telemetry is about carrying telemetry information in the data packets traversing the network and gathering it from them. This technique is used in in-situ OAM (IOAM) and In-band Network Telemetry (INT), as well as in alternate marking performance measurement (AM-PM). INT has become popular because the telemetry data is piggybacked on the network's regular traffic at line rate.
As illustrated in the diagram above, a source endpoint embeds instructions (the INT header) in packets, listing the types of network state to be collected from the network elements. Each network element then inserts the requested network state (the INT payload) into the packet as it traverses the data path of the network. When the packet reaches the last node, the accumulated network state is decoded and analyzed to gain insight into the packet's entire path. End users can configure the source and sink endpoints, the flows to monitor, and more, so that more useful data can be gathered from the network. Let us look at the main possibilities and benefits this approach brings to packet monitoring.
Inflated latencies and congestion analysis: As a packet travels from source to sink, each node in the network appends timestamps marking when the packet ingressed and egressed. Decoding this data makes it easy to determine the latency within a node and between two nodes.
Once the latencies along a flow's path have been baselined under low traffic, congestion is easy to identify, because latencies are noticeably higher when traffic is heavy.
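The latency decoding described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the HopRecord type, its field names, the 2x congestion factor, and all sample values are hypothetical and do not come from any INT specification. It also assumes the nodes' clocks are synchronized, since latency between two nodes is only meaningful in that case.

```python
from dataclasses import dataclass


@dataclass
class HopRecord:
    """Hypothetical per-hop metadata that a node appends to the packet."""
    node_id: str
    ingress_ns: int  # time the packet entered the node
    egress_ns: int   # time the packet left the node


def node_latencies(hops):
    """Time spent inside each node (egress minus ingress)."""
    return {h.node_id: h.egress_ns - h.ingress_ns for h in hops}


def link_latencies(hops):
    """Time between consecutive nodes (next ingress minus this egress)."""
    return {
        (a.node_id, b.node_id): b.ingress_ns - a.egress_ns
        for a, b in zip(hops, hops[1:])
    }


def congested_nodes(hops, baseline_ns, factor=2.0):
    """Nodes whose latency exceeds `factor` times the low-traffic baseline."""
    current = node_latencies(hops)
    return [n for n, lat in current.items() if lat > factor * baseline_ns[n]]


# Metadata stack as decoded at the sink (all values are made up).
hops = [
    HopRecord("s1", 1_000, 1_400),
    HopRecord("s2", 2_000, 5_000),
    HopRecord("s3", 5_600, 6_000),
]
# Per-node latencies measured earlier under low traffic (also made up).
baseline_ns = {"s1": 400, "s2": 500, "s3": 400}

print(node_latencies(hops))
print(link_latencies(hops))
print(congested_nodes(hops, baseline_ns))
```

In this sample, node s2 holds the packet for 3,000 ns against a 500 ns baseline, so it is the only node flagged. A real deployment would tune the factor per flow rather than use a fixed multiplier.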
Network topology and packet traversals: If every node is instructed to append its identity (i.e., the port on which the packet ingressed and the port on which it egressed), the topology can be derived easily, showing the path taken by the packet. Configuring multiple source and sink nodes and observing multiple packet traversals can capture 360° visibility of the network topology.
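As a rough sketch of how a collector might stitch such per-hop identities into a topology, the Python below merges several decoded paths into a map of directed links. The tuple layout (node ID, ingress port, egress port), the build_topology helper, and the sample paths are assumptions made for illustration only.

```python
from collections import defaultdict


def build_topology(paths):
    """Merge per-packet hop lists into a map of directed links.

    Each path is a list of (node_id, ingress_port, egress_port) tuples in
    the order the packet visited them. The result maps
    (node, egress_port) -> set of (next_node, ingress_port).
    """
    links = defaultdict(set)
    for path in paths:
        # Pair each hop with the hop that follows it on the same path.
        for (node, _, out_port), (nxt, in_port, _) in zip(path, path[1:]):
            links[(node, out_port)].add((nxt, in_port))
    return dict(links)


# Two hypothetical traversals between the same source and sink.
paths = [
    [("s1", 1, 2), ("s2", 1, 3), ("s4", 2, 1)],
    [("s1", 1, 3), ("s3", 1, 2), ("s4", 3, 1)],
]
topology = build_topology(paths)
for (node, port), peers in sorted(topology.items()):
    for peer, peer_port in sorted(peers):
        print(f"{node}:{port} -> {peer}:{peer_port}")
```

Each additional traversal can only add links to the map, which is why more source and sink pairs lead toward fuller coverage of the topology.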
Timeliness and flexibility for exceptions: The data is captured from the network elements at line rate along with the traffic, which is arguably the fastest way to identify critical exceptions.
Network processor ASICs are also beginning to support generating and mirroring a packet that carries insightful data whenever an exception occurs. This is a different flavor of packet-based telemetry, but it offers excellent flexibility and timeliness.
Doorway to machine learning: The first requirement for ML is to learn about network behavior as early as possible, and this approach meets it by providing near-real-time notification of events.
As mentioned in the first benefit, latencies can help identify the network's congestion points. Similarly, ML algorithms can be built to predict congestion and help resolve it. This is just one use case, in which ML can help protect important or elephant flows.
Given the way the networking industry is evolving, ML will soon be used to open new doors for security, redundancy, and reliability within the network!
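To make the congestion-prediction idea a little more concrete, here is a deliberately simple Python sketch. It fits a least-squares line to recent latency samples for one node and extrapolates when the latency would reach a threshold. This is a stand-in for whatever model an organization would actually train, not a recommended algorithm; the function names, the threshold, and the sample values are all hypothetical.

```python
def fit_line(ys):
    """Least-squares slope and intercept for samples taken at x = 0, 1, 2, ...

    Requires at least two samples.
    """
    n = len(ys)
    mean_x = (n - 1) / 2
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def intervals_until(ys, threshold):
    """Predicted sampling intervals until latency reaches the threshold.

    Returns 0 if the latest sample is already at or above the threshold,
    and None if the fitted trend is flat or falling.
    """
    if ys[-1] >= threshold:
        return 0
    slope, intercept = fit_line(ys)
    if slope <= 0:
        return None
    x_cross = (threshold - intercept) / slope
    # Count from the most recent sample, never reporting a negative value.
    return max(0.0, x_cross - (len(ys) - 1))


# Latency samples (ns) for one node, oldest first; values are made up.
samples = [400, 500, 600, 700, 800]
print(intervals_until(samples, threshold=1200))
```

With these steadily rising samples, the fitted trend reaches the 1,200 ns threshold four sampling intervals after the latest sample, which would give an operator or controller time to reroute an important flow before congestion sets in.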
Thus, packet-level network telemetry enables an organization to gather data along with the actual traffic. This gives organizations the ability to observe real-time, end-to-end behavior across the network infrastructure. It helps network administrators diagnose transient issues that arise from performance bottlenecks, network failures, or configuration errors.
About the author: Aalok Shah is associated with VOLANSYS Technologies as a Sr. Engineer. He has served in multiple industry verticals and worked on many tools and technologies during his career. Passionate about his work, he always looks forward to opportunities to bring better solutions to the table.