How-To: Plan an iSCSI SAN
February 10, 2006
Even with the right switches, be aware that a server with a single Gigabit Ethernet connection to the switch in front of your disk arrays is vulnerable to any failure on that link. All it takes is someone slamming a rack door on the cable, zapping the switch or NIC port with static, or gremlins (aka unknown causes) killing the link. When that happens, the server loses access to its data and the application crashes, which may well corrupt your data.
Since the days of Kalpana 10-Mbps Ethernet switches, server administrators have used NIC teaming to eliminate Ethernet links as a single point of failure, as well as to boost bandwidth from the server to the network. Most server NIC drivers do this by load-balancing traffic across two or more links to the same switch. That protects your systems from port and link failures, but it still leaves them vulnerable to a switch failure.
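To make the limitation concrete, here is a minimal sketch of a team whose links all land on one switch. It is a toy model, not any vendor's driver: the `NicTeam` class, the link names and the CRC32 flow hash are assumptions chosen for illustration, and real teaming drivers offer a range of balancing policies.

```python
import zlib


class NicTeam:
    """Toy model of NIC teaming: several links, all cabled to the SAME switch."""

    def __init__(self, link_names):
        self.links = {name: True for name in link_names}  # link name -> link is up
        self.switch_up = True  # every link depends on this one switch

    def active_links(self):
        # If the shared switch is down, no link can carry traffic.
        if not self.switch_up:
            return []
        return [name for name, up in self.links.items() if up]

    def link_for_flow(self, flow_id):
        """Pick a link for a flow by hashing its ID over the surviving links."""
        active = self.active_links()
        if not active:
            return None  # the server is cut off from its storage
        return active[zlib.crc32(flow_id.encode()) % len(active)]


team = NicTeam(["eth0", "eth1"])
team.links["eth0"] = False           # a cable or port fails
survivor = team.link_for_flow("server1->array1")   # traffic moves to eth1
team.switch_up = False               # the switch itself fails
after_switch_failure = team.link_for_flow("server1->array1")  # None: no path left
```

Losing one link just shrinks the pool the hash chooses from. Losing the switch empties the pool, no matter how many links are in the team.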
When a SAN switch that is a single point of failure dies, it is the data center equivalent of a neutron bomb going off. Your equipment may still be there, but your servers are all offline, your data is unavailable and possibly corrupt, and your job is on the line. Much as we'd like to believe Ethernet switch vendors when they claim 50,000-hour MTBFs (mean time between failures), we all know switches fail, usually at the worst possible time.
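It helps to put that MTBF figure into yearly terms. The sketch below assumes a constant failure rate (the usual exponential model), which is a simplification that vendors' numbers may not follow; the function names and the 10-switch fleet are hypothetical, for illustration only.

```python
import math

HOURS_PER_YEAR = 24 * 365  # 8,760 hours


def annual_failure_probability(mtbf_hours):
    """Chance that one switch fails at least once in a year (exponential model)."""
    return 1.0 - math.exp(-HOURS_PER_YEAR / mtbf_hours)


def expected_failures_per_year(mtbf_hours, switch_count):
    """Average number of failures per year across a fleet of identical switches."""
    return switch_count * HOURS_PER_YEAR / mtbf_hours


single = annual_failure_probability(50_000)      # roughly 0.16
fleet = expected_failures_per_year(50_000, 10)   # 1.752 failures per year
print(f"One switch: {single:.1%} chance of failing in a year")
print(f"Ten switches: {fleet:.2f} expected failures per year")
```

Under those assumptions, a 50,000-hour MTBF still works out to roughly a one-in-six chance that any given switch fails in a year, so a lone switch in the storage path is not a safe bet.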
The best solution is to use Multipath I/O (MPIO), which works in the storage stack above the IP layer rather than at Layer 2. It lets you create multiple connections from each server's iSCSI initiator to your disk array and specify the Ethernet connection and path the data should take. Unlike Layer 2 NIC teaming, which requires that all your connections reside in the same broadcast domain, MPIO gives each Ethernet card in your server its own IP address, so you can place paths in different subnets and run them through different switches.
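The idea can be sketched in a few lines. This is a toy model rather than any initiator's real MPIO implementation: the `Path` and `MultipathDevice` names, the addresses, the switch labels and the round-robin policy are all assumptions made for illustration.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class Path:
    initiator: str       # server NIC address with prefix, e.g. "10.0.1.10/24"
    target: str          # array port address on the same subnet
    switch: str          # switch this path runs through
    healthy: bool = True

    @property
    def subnet(self):
        # Network the initiator NIC belongs to, e.g. 10.0.1.0/24
        return ipaddress.ip_interface(self.initiator).network


class MultipathDevice:
    """Round-robin over healthy paths, skipping any that have failed."""

    def __init__(self, paths):
        self.paths = list(paths)
        self._next = 0

    def fail_switch(self, switch):
        # A dead switch takes down every path that runs through it.
        for path in self.paths:
            if path.switch == switch:
                path.healthy = False

    def pick_path(self):
        # Try each path at most once, starting where the last call left off.
        for _ in range(len(self.paths)):
            path = self.paths[self._next % len(self.paths)]
            self._next += 1
            if path.healthy:
                return path
        raise IOError("all paths to the array are down")


device = MultipathDevice([
    Path("10.0.1.10/24", "10.0.1.200", "switch-a"),
    Path("10.0.2.10/24", "10.0.2.200", "switch-b"),
])
device.fail_switch("switch-a")
surviving = device.pick_path()   # I/O continues over switch-b
```

Because each path has its own IP address, subnet and switch, the failure of one switch only removes one path from the rotation. The server keeps talking to the array over the other.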
MPIO is supported by most iSCSI initiators at no cost, but some enterprise disk vendors, such as EMC and Network Appliance, make you pay for an additional MPIO driver for each server using MPIO on their arrays.