Troubleshooting High CPU Usage On Cisco Nexus 7000

Peter Welcher describes toubleshooting a recent issue on core Nexus 7K data center switches.

November 15, 2016

2 Min Read

Something a bit different came up in troubleshooting recently, something I’d like to share briefly in case it bites you.

Symptom: High CPU on core Nexus 7K in datacenter, leading to UDLD and VPC failure and some short-lived STP oddness. Nothing really showed up as hitting against the COPP (Control Plane Policing) policy. The site was running Strict level COPP, albeit from a prior NX-OS version. The newer Strict COPP template policy hadn’t been applied, but isn’t much different. Intermittent CPU spikes visible in “show process CPU history” command, which appear to have been normal “platform” process idle time processing.

More data: Syslog showed some interesting messages. It showed a lot of Glean syslog messages like the following (carefully edited to hide identity):

%ADJMGR-3-IP_GLEAN_EXCESS_TRAFFIC:  adjmgr [6965]  Excess traffic (10048 packets / 14709540 octets) were dropped in 00:04:48 seconds for ungleaned prefix .../32 on interface ... in vrf default

Cause: After tracking down the destination host (/32 in the syslog message), looking at the sources from NetFlow data, Googling, etc., someone realized that the destination was a NetFlow collector that had been recently decommissioned.

Resolution: Taking out/updating the NetFlow export commands on about 30 routers stopped the above syslog messages, and appears to have lightened the CPU load somewhat. Looked at in a different way, the Glean handling may indeed have been doing its job, protecting the CPU.

Explanation: When a packet has to be forwarded to a destination, the last hop router is responsible for ARP lookup of the destination MAC address. When hardware forwarding doesn’t find an adjacency for a known ARP entry, a Glean occurs so the supervisor can send an ARP to get the missing info. The supervisor creates a negative ARP entry until an ARP reply is received or a timeout period has passed. That saves pointless further ARP queries when one is already in progress, protecting the supervisor from a high load.

Note that Gleans occur in response to a normal packet being received with normal destination on a local subnet and requiring ARP, thereby triggering CPU activity. COPP only rate limits traffic going to the router control plane. The ARP table misses may not be handled by COPP, although apparently the Nexus 7K has hardware Glean rate limited.

Gratuitous Advice: When decommissioning a NetFlow collector, particularly one with lots of source routers feeding it, remove the relevant NetFlow (or IPFIX) export commands first. That way, you avoid what might be described as a “self-inflicted DDoS attack.”

I do have to note that the transient VPC/STP issue root cause is still unclear. It was a one-time transient condition. Some spike in NetFlow traffic might possibly have bumped up the Gleans to where they were a problem. Since the transient, we haven’t seen anything peg the CPU nor cause other problems. So there might well have been another cause for the VPC problem, or another cause compounded by the Glean situation.

This article originally appeared on the NetCraftsmen blog.

About the Author

Peter Welcher

Architect & Infrastructure Practice Lead, NetCraftsmen

Peter Welcher is an architect and the infrastructure practice lead at NetCraftsmen. A principal consultant with broad knowledge and experience in high-end routing and network design, as well as data centers, Pete has provided design advice and done assessments of a wide variety of networks. Pete's certifications include CCIE #1773, CCDP, and CCSI #94014.

See more from Peter Welcher

Related Topics

Recent in Infrastructure

Related Topics

Recent in Network Mgmt

Related Topics

Recent in Security

Related Topics

Recent in Enterprise Connectivity

Related Topics

Recent in Wireless

Related Topics

Recent in Careers

Related Topics

Troubleshooting High CPU Usage On Cisco Nexus 7000

About the Author

Editor's Choice

Related Topics

Recent in Infrastructure

Related Topics

Recent in Network Mgmt

Related Topics

Recent in Security

Related Topics

Recent in Enterprise Connectivity

Related Topics

Recent in Wireless

Related Topics

Recent in Careers

Related Topics

<span class="ArticleBase-LargeTitle">Troubleshooting High CPU Usage On Cisco Nexus 7000</span>Troubleshooting High CPU Usage On Cisco Nexus 7000

About the Author

Editor's Choice

Troubleshooting High CPU Usage On Cisco Nexus 7000