Do you need multi-path Ethernet in the data center using TRILL, Shortest Path Bridging (SPB), Cisco's FabricPath or Brocade's VCS in order to maximize network efficiency and reduce congestion? Probably not, unless you manage a very big data center with servers and access ports running into the tens of thousands. If you have fewer than 5,000 access ports in your data center, you can probably avoid routing Ethernet frames and use your vendor's multi-chassis link aggregation (MLAG) product set to interconnect the access layer in an active/active design. MLAG works well at that scale.
Nearly every story, blog, or article about FCoE says that you have to be running TRILL, SPB, or something like it (yes, I have made that claim as well) so that you can have an efficient mesh network where traffic flows (unidirectional streams of packets) follow the same path through the network and arrive at the destination in the correct order. Unlike many other protocols, Fibre Channel (aka SCSI in a frame) doesn't tolerate data arriving out of order.
Using shortest-path forwarding reduces latency, since fewer hops mean less delay, and if the network topology changes, the protocol simply computes the next best path. That kind of design assumes a mesh or partial-mesh network where there are multiple paths through the network and no way to manage the flows. Lower-level protocols, the data center bridging (DCB) protocols, handle the lossless networking.
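To make the "fewer hops, next best path" idea concrete, here is a toy sketch. The topology and node names are made up for illustration, and real fabrics (TRILL, SPB) compute paths with IS-IS, not this bare-bones Dijkstra over a hop-count metric:

```python
# Toy shortest-path computation over a hop-count graph. Illustrative only:
# TRILL/SPB use IS-IS link-state routing, not this simplified Dijkstra.
import heapq

def shortest_path(graph, src, dst):
    """Return (hops, path) for the shortest path, or None if unreachable.
    graph is {node: [neighbor, ...]}; every link costs one hop."""
    queue = [(0, src, [src])]
    seen = set()
    while queue:
        hops, node, path = heapq.heappop(queue)
        if node == dst:
            return hops, path
        if node in seen:
            continue
        seen.add(node)
        for nbr in graph.get(node, []):
            heapq.heappush(queue, (hops + 1, nbr, path + [nbr]))
    return None

# A small mesh with two equal-cost 2-hop paths from A to D.
mesh = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
print(shortest_path(mesh, "A", "D"))  # → (2, ['A', 'B', 'D'])

# If the A-B link fails, recomputing on the new topology finds
# the next best path through C -- no Spanning Tree blocking needed.
mesh_after_failure = {"A": ["C"], "C": ["D"], "D": []}
print(shortest_path(mesh_after_failure, "A", "D"))  # → (2, ['A', 'C', 'D'])
```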
All network vendors today support multi-chassis link aggregation, which is based on the IEEE 802.1AX-2008 Link Aggregation (LAG) standard (the original LAG standard was 802.3ad). LAG allows you to bond two or more physical links into a single logical link between two switches or between a server and a switch. Since LAG introduces a loop in the network, Spanning Tree has to be disabled on the LAG ports. LAG doesn't double the capacity, as Ethan Banks points out in The Scaling Limitations of Etherchannel -Or- Why 1+1 Does Not Equal 2, because LAG implementations place traffic on links based on flows, not packets. This is done to preserve packet order end to end and to remove the possibility of packet duplication--two design requirements of 802.1AX-2008. The algorithm that maps flows to a specific link should distribute the load evenly across all available links, but there is no way to know in advance how long a flow will last or how big its frames will be.
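A rough sketch of flow-based link selection shows why 1+1 doesn't equal 2. This is not any vendor's actual hashing algorithm (real implementations hash various combinations of MAC, IP, and port fields in silicon); it just illustrates the property that matters, assuming a two-member LAG with made-up interface names:

```python
# Illustrative sketch of flow-based LAG hashing -- not a real vendor algorithm.
# Each flow hashes deterministically to one member link, which preserves
# frame order but caps any single flow at one link's bandwidth.
import hashlib

LAG_MEMBERS = ["eth1", "eth2"]  # hypothetical two-link bundle

def pick_link(src_mac: str, dst_mac: str) -> str:
    """Map a flow (keyed here on the MAC pair) to one member link."""
    key = f"{src_mac}-{dst_mac}".encode()
    index = int(hashlib.md5(key).hexdigest(), 16) % len(LAG_MEMBERS)
    return LAG_MEMBERS[index]

# Every frame of a given flow lands on the same link, so ordering holds...
assert pick_link("00:aa", "00:bb") == pick_link("00:aa", "00:bb")
# ...but one elephant flow can never use more than one link, and an
# unlucky hash can pile several heavy flows onto the same member.
```

The deterministic hash is the whole point: per-packet spraying would use both links fully but could reorder or duplicate frames, which 802.1AX-2008 forbids.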
Standard LAG runs only between two peers, and multi-chassis LAG is proprietary. Ivan Pepelnjak has a short rundown of MLAG and fabric features from Brocade, Cisco, HP, and Juniper. Suffice it to say that the MLAG features look nearly the same, but switches from different vendors won't interoperate over MLAG. What is interesting is that the MLAG systems all share a common trait: no more than two core switches.
I recall having a discussion with Juniper several months ago about MLAG and why there are only two core switches. They indicated they were working on a set of features that would support more than two core switches. 20/20 hindsight being what it is, it seems they were floating what is now known as QFabric, which appears to be dual-core only, though that might change by the time the interconnect ships in Q3. As an aside, I was talking to a network equipment engineer shortly thereafter (I honestly forget who) about MLAG and why only dual cores, and mentioned another vendor was claiming more than two core switches. His response was something like "Good luck with that. Maintaining a coherent state in sub-microsecond time between more than two nodes is extremely hard to do well."
MLAG inherits many of the same traits as LAG: all frames in a flow are sent over the same physical link, an algorithm determines how flows map to particular links, frame order is maintained, and duplication is avoided. The number of switches (I am avoiding the politically hot term "hops") traversed between two devices, like a server and a NAS, remains the same, so delay should be equivalent regardless of the path taken.
The point at which you have to look at TRILL or SPB or some other method to route Ethernet (sorry, I don't know how else to describe it) is:
- When the number of access ports you need exceeds your vendor's MLAG capacity and you have to add a core switch that can't participate in the MLAG
- When you want to run a multi-vendor network with one vendor's switches at the access layer and another's in the core
- When you want to run different switch product lines from a single vendor, where one product like Brocade's 8000, Cisco's Catalyst, or Juniper's EX switches can't participate in the fabric, or
- When you want a mesh network.