Why Spanning Tree Is Evil

The Spanning Tree Protocol (STP) is widely used for network redundancy and resilience in Ethernet networks, in spite of its well-known flaws and limitations. Most engineers who want to reduce or avoid Spanning Tree in their networks use Multichassis Link Aggregation (MLAG) or Transparent Interconnection of Lots of Links (TRILL), but here I will explain why the common avoidance tactics are risky, as well.

This article is based on a session that I presented at the Interop conference this year. I will present the information again in Building the Physical Network for the Software-Defined Data Center on Sept. 29, 2014, at Interop New York. Register now for Interop, Sept. 29 to Oct. 3, in New York City.

Brittle failure mode
The Spanning Tree Protocol actually works quite well. But when it doesn't, the entire failure domain collapses. The way to reduce the failure domain is to use routing, but this causes application problems. This brittle failure mode, in which even the smallest failure condition can take down the whole domain, is the major problem with STP. It fails disgracefully, and this makes us perceive STP as unreliable.

Wasted bandwidth
In a Spanning Tree network, half the network bandwidth is shut down in the blocking state. So 40% of network cost (unused ports and larger switches) is simply wasted. The Multiple Spanning Tree Protocol (MSTP) was developed to waste less bandwidth. But still, for any given VLAN, less than half the bandwidth is available. The image below shows that half of all links are shut down by the STP loop-detection process.
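The same effect can be reproduced on a toy topology. The sketch below is a simplification, not real STP: it assumes two core switches and four dual-homed access switches (all names are hypothetical), treats every link as equal cost, and uses a breadth-first search from the root instead of BPDU exchange and bridge-ID tie-breaking. Any link left off the resulting tree is counted as blocked.

```python
from collections import deque

def build_two_tier(num_access):
    """Return the links for two core switches and dual-homed access switches."""
    links = [("core1", "core2")]
    for i in range(1, num_access + 1):
        access = f"access{i}"
        links.append((access, "core1"))
        links.append((access, "core2"))
    return links

def spanning_tree(links, root):
    """Breadth-first search from the root; returns the set of forwarding links."""
    neighbors = {}
    for a, b in links:
        neighbors.setdefault(a, []).append(b)
        neighbors.setdefault(b, []).append(a)
    visited = {root}
    forwarding = set()
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for peer in sorted(neighbors[node]):
            if peer not in visited:
                visited.add(peer)
                forwarding.add(frozenset((node, peer)))
                queue.append(peer)
    return forwarding

links = build_two_tier(4)                       # assumed topology
forwarding = spanning_tree(links, "core1")      # core1 is assumed to be the root
uplinks = [link for link in links if link[0].startswith("access")]
blocked_uplinks = [link for link in uplinks if frozenset(link) not in forwarding]

print(f"uplinks: {len(uplinks)}, blocked: {len(blocked_uplinks)}")
```

With core1 as the root, every access switch forwards on its core1 uplink and blocks its core2 uplink, so four of the eight uplinks carry no traffic until a failure occurs.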

Oversized switches
In a tree-based network design, the core switches must be oversized and scaled vertically. Large switches are expensive to manufacture and complex to develop, and they incur high production costs. A second cost factor is derived from the critical nature of core switches on which the entire "tree" depends. The tree network architecture means that customers are expending large amounts of money for zero business value.

Core switches are required to support the access switches that connect servers, which generate business value. The basic fact that network engineers use hierarchical tree designs that force critical hardware choices on that pair of core devices is all kinds of stupid. Doubling down on highly complex devices that must be highly reliable means that vendors can charge enormous prices and make 70% profit margins yet deliver little value back to the business, except to scale the access switch layer.

This alone is the strongest business case for replacing Spanning Tree in your network. Equal-cost multipath (ECMP) network designs do not force the use of chassis-based switches to scale port density, and this represents a large cost savings on hardware purchases.
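To see why ECMP keeps every uplink busy instead of blocking half of them, here is a minimal sketch of per-flow path selection. It is an illustration only: real switches use vendor-specific hardware hash functions, and the switch names, addresses, and the choice of SHA-256 are all assumptions made for the example.

```python
import hashlib

def pick_next_hop(flow, next_hops):
    """Hash the flow's 5-tuple and use the result to index the equal-cost next hops."""
    key = "|".join(str(field) for field in flow).encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:4], "big") % len(next_hops)
    return next_hops[index]

# Four hypothetical upstream switches, all reachable at equal cost.
spines = ["spine1", "spine2", "spine3", "spine4"]

# Flows as (src_ip, dst_ip, protocol, src_port, dst_port); the values are made up.
flows = [("10.0.0.1", "10.0.1.1", "tcp", 40000 + n, 443) for n in range(1000)]

usage = {spine: 0 for spine in spines}
for flow in flows:
    usage[pick_next_hop(flow, spines)] += 1

for spine in spines:
    print(spine, usage[spine])
```

Because the hash is computed per flow, packets of the same flow always take the same path and stay in order, while different flows spread across all of the available links. Capacity is then added by adding more small switches rather than by buying a larger pair of core devices.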

Protocol hacks
There have been dozens of attempts to improve STP, but they have resulted in outsized technical debt and massive operational costs. Here are a few.

  • Loop guard
  • Root guard
  • BPDU guard
  • UDLD
  • Portfast
  • Root placement
  • Odd/even VLAN weighting on uplinks
  • Etc., etc., etc., sigh

The dollar cost to the business in terms of network design time, constant operation review and audit, operational compliance, and training is enormous.

MLAG hacks
The industry tried to replace STP with a better protocol in TRILL, but vendors made it prohibitively expensive to buy and deploy. Most customers instead chose to reduce their reliance on STP through widespread use of MLAG designs, which also incur a significant amount of technical debt. These are more often known as "fat tree" network designs, like the following.

The technical debts incurred with MLAG are:

  • Bonding the control planes of two or more switches is a highly complex software function that is vulnerable to bugs, poor coding, and design deficiencies. Cisco vPC, VSS, and HP IRF have a particularly bad reputation for weird bugs and operational problems, in my experience and according to emails I have received.
  • Bonding control planes is naturally proprietary, because of hardware and software dependencies.
  • Configuration and implementation of MLAG on a day-to-day basis is more complex than STP itself (note issues like peer detection, orphan ports, and MLAG primary/secondary).
  • Scalability is poor; typically only two switches can be bonded, though eight is possible with a number of limitations.
  • Code dependencies prevent migration and updates. Because of the tight software and hardware integration, upgrades are almost impossible and nearly always require outages, because both operating systems must be upgraded at the same time for risk-free operations.

The future of Spanning Tree
Spanning Tree is not inherently bad or wrong, but it does have many limitations in its design and operation. The most serious shortcoming is that STP has a brittle failure mode that can bring down entire data center or campus networks when something goes wrong. Though modifications and enhancements have addressed some of these risks, this has happened at the cost of technical debt in design and maintenance.

It remains risky to create VLANs and modify the STP configuration. ITIL change rules mean that this everyday activity is regarded as high risk. Sadly, Spanning Tree will be around for decades to come while customers move to MLAG or even TRILL to reduce STP, instead of replacing it completely with ECMP networking architectures.

Join me at my technical workshop at Interop New York 2014, where I will dive into how ECMP works best.