You finally got funding for your iSCSI SAN project. You convinced management that buying a new SCSI storage array for each new server cluster, or each time a server outgrew its internal storage, cluttered up the data center and ate up the budget. Then you picked out the iSCSI arrays of your dreams and promised the powers that be that Ethernet is all you'd need to hook it all together.
How To Plan an iSCSI SAN
Now comes the rude awakening. Just because you can run iSCSI across your existing network with your other traffic doesn't mean you should. Typical network applications are designed with the possibility of a network failure in mind, but operating systems expect their disk drives to be available all the time. An infected laptop that overloads your network for a few minutes could make servers that can't access their disks unhappy. So rule No. 1 in planning your iSCSI SAN is to place iSCSI traffic on its own VLAN and preferably on a completely separate, gigabit-speed network.
I've used a consumer-grade Gigabit Ethernet switch in demonstrations of iSCSI technology and run iSCSI on a 10-Mbps Ethernet hub just for grins. But don't try this at the office or at home. Consumer switches typically don't support wire-speed connections between multiple ports, so they may drop packets without warning.
We've seen a low-end, 24-port switch built from two 12-port switch engines with a single gigabit link between them. Put your servers on ports 1 through 16 and your disk arrays on ports 18 through 24, and that single gigabit link will be overloaded, resulting in packet loss and a huge performance hit (or worse).
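The back-of-the-envelope arithmetic is worth doing for any switch you're considering. A minimal sketch, using the hypothetical port counts from the example above:

```python
# Back-of-the-envelope oversubscription check for a switch whose
# internal interconnect is narrower than the sum of its ports.

def oversubscription_ratio(server_ports: int, link_gbps: float,
                           port_gbps: float = 1.0) -> float:
    """Worst-case demand crossing the interconnect vs. its capacity."""
    return (server_ports * port_gbps) / link_gbps

# 16 servers at 1 Gbps each, all crossing a single 1-Gbps interconnect:
ratio = oversubscription_ratio(server_ports=16, link_gbps=1.0)
print(f"{ratio:.0f}:1 oversubscribed")  # 16:1 oversubscribed
```

Anything much worse than 1:1 on the path between servers and disk arrays means dropped iSCSI packets under load.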
Your iSCSI SAN should use an enterprise-class, nonblocking, Gigabit Ethernet switch, such as one from Extreme Networks or Foundry Networks. And given the homicidal thoughts that run through my head when my BlackBerry buzzes at 4 a.m., spring for the dual, redundant power supplies, too.
Even with the right switches, be aware that if your servers have single Gigabit Ethernet connections to a switch for access to your disk arrays, they are vulnerable to a failure of that link. All it takes is someone slamming a rack door on the cable, zapping the switch or NIC port with static, or gremlins (aka unknown causes) killing the link. Your server will lose access to its data and crash your application, which will probably corrupt your data.
Since the days of Kalpana 10-Mbps Ethernet switches, server administrators have used NIC teaming to eliminate Ethernet links as a single point of failure, as well as to boost bandwidth from the server to the network. Most server NIC drivers do this by load-balancing the traffic across two or more links to the same switch. This may protect your systems from port and link failures, but it still leaves them vulnerable to a switch failure.
When a SAN switch that's a single point of failure dies, it's the data center equivalent of a neutron bomb going off. Your equipment may still be there, but your servers are all offline, your data is unavailable and possibly corrupt, and your job is on the line. Much as we'd like to believe Ethernet switch vendors when they claim 50,000-hour MTBFs (mean time between failures), we all know switches fail, usually at the worst possible time.
The best solution is Multipath I/O (MPIO), a Layer 3 technique that lets you create multiple connections from each server's iSCSI initiator to your disk array and specify the Ethernet connection and path the data should take. Unlike Layer 2 NIC teaming, which requires that all your connections reside in the same broadcast domain, MPIO gives each Ethernet card in your server its own IP address, and you can set paths in different subnets.
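The idea behind MPIO can be sketched as a simple path table with failover. This is a toy model for illustration, not any vendor's actual driver; the IP addresses are hypothetical:

```python
# Toy model of MPIO path selection: each path is an (initiator IP,
# target IP) pair, typically on a different subnet. The driver spreads
# I/O across active paths and fails over when a path dies.

class MultipathSession:
    def __init__(self, paths):
        # paths: list of (initiator_ip, target_ip) tuples
        self.paths = list(paths)
        self.failed = set()

    def mark_failed(self, path):
        self.failed.add(path)

    def pick_path(self, io_number: int):
        """Round-robin across surviving paths; raise if none remain."""
        alive = [p for p in self.paths if p not in self.failed]
        if not alive:
            raise RuntimeError("all paths down -- server loses its disks")
        return alive[io_number % len(alive)]

session = MultipathSession([
    ("10.1.1.10", "10.1.1.100"),   # NIC 1, switch A subnet
    ("10.2.1.10", "10.2.1.100"),   # NIC 2, switch B subnet
])
session.mark_failed(("10.1.1.10", "10.1.1.100"))  # switch A dies
print(session.pick_path(0))  # I/O continues on the switch B path
```

With each path on its own subnet through its own switch, no single NIC, cable or switch failure can take the server's disks away.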
MPIO is supported by most iSCSI initiators at no cost, but some enterprise disk vendors, such as EMC and Network Appliance, make you pay for an additional MPIO driver for each server using MPIO on their arrays.
You need multiple switches for truly critical applications, and you must connect each server and disk array to at least two switches. (While you're at it, make sure your interswitch links have enough bandwidth to carry all the traffic from your servers to disk arrays in the event of a disk array-to-switch link failure, too).
iSCSI vendors and various storage pundits make a big deal over the need for jumbo frames in an iSCSI SAN. Back in the dark ages when Ethernet was a half-duplex shared-media network, the maximum frame size of 1,500 bytes ensured no one station could monopolize the network. Since most host operating systems read and write data from their disks in clusters of 4 KB (the NTFS default) or larger, most iSCSI data transfers require multiple frames if your system is using standard 1,500-byte frames. Multiple frames mean TCP/IP stack overhead in the CPU: the data is divided into multiple packets, checksums are calculated for each, and the packets must be reassembled at the far end. Small packets also soak up network bandwidth, because more time is spent on interframe gaps, frame headers and checksums relative to real data.
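The arithmetic behind that overhead is easy to check. A sketch, assuming standard 40-byte TCP/IP headers and Ethernet's 38 bytes of per-frame header, checksum, preamble and interframe gap:

```python
import math

IP_TCP_HEADERS = 40        # 20-byte IP header + 20-byte TCP header
FRAMING = 14 + 4 + 8 + 12  # Ethernet header + FCS + preamble + interframe gap

def frames_for(cluster_bytes: int, mtu: int) -> int:
    """Frames needed to move one file-system cluster."""
    payload = mtu - IP_TCP_HEADERS
    return math.ceil(cluster_bytes / payload)

def wire_efficiency(mtu: int) -> float:
    """Share of on-the-wire bits that carry real data in a full frame."""
    return (mtu - IP_TCP_HEADERS) / (mtu + FRAMING)

print(frames_for(4096, 1500))          # 3 frames per 4-KB NTFS cluster
print(frames_for(4096, 9000))          # 1 frame with 9,000-byte jumbos
print(f"{wire_efficiency(1500):.1%}")  # 94.9%
print(f"{wire_efficiency(9000):.1%}")  # 99.1%
```

One frame per cluster instead of three means one checksum to compute and nothing to reassemble, which is where the CPU savings come from.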
The good news is most enterprise Gigabit Ethernet equipment supports jumbo frames to some extent. We've found that enabling jumbo frames can speed up iSCSI performance by about 5 percent, while reducing server CPU utilization by 2 percent to 3 percent with standard or smarter NICs. Because TOE (TCP off-load engine) cards or HBAs (host bus adapters) already do off-loading, the CPU savings from jumbo frames is a wash when the frames are used with a TOE or HBA, though the jumbo frames should still speed up performance.
If you use jumbo frames, ensure all the devices on your iSCSI network--including switches, initiators and targets--are configured to use the same maximum frame size. There is no standard maximum jumbo-frame size; we've seen equipment supporting frame sizes from 9,000 bytes to 16 KB.
If your servers or disk arrays are set to a larger maximum frame size than your switches, your iSCSI system will appear to be working perfectly until you start doing large data transfers that exceed the switch's maximum frame--then disk I/O errors will start cropping up.
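That failure mode makes a frame-size audit worth scripting before you go live. A minimal sketch, with a hypothetical device inventory; the safe end-to-end frame size is whatever the smallest device supports:

```python
# Verify every device on the iSCSI VLAN agrees on a maximum frame size.
# Device names and sizes here are hypothetical; pull yours from your configs.

def check_frame_sizes(devices: dict) -> int:
    """Return the safe end-to-end frame size; warn about mismatches."""
    safe = min(devices.values())
    for name, size in devices.items():
        if size > safe:
            print(f"WARNING: {name} is set to {size}, but path limit is {safe}")
    return safe

fabric = {
    "server-nic-1": 9000,
    "core-switch": 9000,
    "edge-switch": 4088,   # the silent bottleneck
    "disk-array": 9000,
}
print("safe MTU:", check_frame_sizes(fabric))  # safe MTU: 4088
```

A device set higher than the path limit is exactly the configuration that works fine in testing and throws disk I/O errors the first time a large transfer fills a frame.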
Stick Your TOE in the Water
Although most current operating systems include iSCSI software initiators that let you use any Ethernet card to connect your servers to an iSCSI disk array, don't just use any old Gigabit Ethernet card for your iSCSI connection. Ethernet cards designed for workstations, for example, use a 32-bit PCI bus with just over a gigabit per second of bandwidth, which is shared with other devices on the bus. A server Ethernet card uses the much faster PCI-X or PCI Express bus and performs onboard TCP/IP checksum off-load, which reduces the CPU's iSCSI traffic processing. Broadcom's NetXtreme controller chips, which come on most server motherboards, also do checksum off-load.
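The bus math shows why the workstation card is the bottleneck. A quick calculation using nominal peak figures for each bus type:

```python
# Nominal peak bandwidth for the buses mentioned above.
# A parallel bus moves (width in bits) x (clock rate) bits per second.

def bus_gbps(width_bits: int, clock_mhz: float) -> float:
    return width_bits * clock_mhz / 1000.0

print(f"32-bit/33-MHz PCI:    {bus_gbps(32, 33):.2f} Gbps (shared by all cards)")
print(f"64-bit/133-MHz PCI-X: {bus_gbps(64, 133):.2f} Gbps")
```

A single Gigabit Ethernet port running flat out nearly saturates plain PCI by itself; on PCI-X or PCI Express there's headroom for the NIC, the RAID controller and everything else.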
Make sure to use the latest manufacturer-specific drivers for whichever NIC you choose. The generic drivers that come with Windows usually don't support advanced features like jumbo frames and TCP checksum off-load.
A TOE card from Alacritech, Chelsio and others goes a step further. Its on-card processors perform TCP segmentation and reassembly, as well as checksum calculations. Plus, TOEs can accelerate any type of TCP traffic and work with the same software initiators as other Ethernet cards. An iSCSI HBA, such as those from QLogic and Adaptec, off-loads not only the TCP management but also the higher-level iSCSI protocol. It looks like a disk controller to the host operating system rather than an Ethernet card.
Although TOEs and iSCSI HBAs can save your server a few CPU cycles--up to 10 percent or 15 percent running common applications like SQL Server--our experience is that they don't live up to their vendors' promises of faster disk I/O. And most midrange servers aren't CPU-bound, so we only recommend TOEs and HBAs for those rare servers that are.
The big advantage of an iSCSI HBA is it makes booting from the iSCSI SAN easy. Because HBAs act like disk controllers (complete with INT13 BIOS support), you can put your system drive on an iSCSI target. Booting from the SAN makes creating multiple similar servers easy: Just copy the boot volume to create a tenth Web or terminal server, for example. And it's easy to replace a failed server by attaching its volumes to a spare blade or 1U server of the same model.
EmBoot's netboot/i uses PXE (the Preboot Execution Environment) and a TFTP server so servers with standard Ethernet cards can boot from an iSCSI SAN. But it requires a bit more finagling: you create a system volume on a local drive, then copy it to the SAN for booting. We're expecting server vendors to build this kind of functionality into the next generation of servers, so stay tuned.
Each server attached to the iSCSI SAN manages file systems on the logical drives it accesses. That means you have to control access to iSCSI volumes, so multiple servers don't think they "own" the same logical drive and overwrite one another's data. With the exception of server clusters (which arbitrate access to the disk among themselves) and specialized SAN file systems, this means one server per logical drive.
iSCSI targets typically let you control access by IP address, which is fine in a closed environment but opens the door to a rogue administrator connecting an unauthorized server and accessing your company's crown jewels. Alternatively, you can use your initiator's IQN (iSCSI qualified name). IQNs make replacing failed servers in larger environments easier, because an IQN is more easily changed on a running replacement server than an IP address.
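The access-control idea, often called LUN masking, is easy to model: each logical drive maps to exactly one authorized initiator. A toy sketch with hypothetical IQNs and volume names:

```python
# Toy LUN-masking table: one initiator IQN per logical drive, so two
# servers can never mount (and overwrite) the same volume.

masking = {
    "lun0-exchange-db": "iqn.1991-05.com.microsoft:mail01.example.com",
    "lun1-sql-data":    "iqn.1991-05.com.microsoft:sql01.example.com",
}

def may_attach(lun: str, initiator_iqn: str) -> bool:
    """A login succeeds only if the IQN matches the LUN's owner."""
    return masking.get(lun) == initiator_iqn

print(may_attach("lun1-sql-data",
                 "iqn.1991-05.com.microsoft:sql01.example.com"))  # True
print(may_attach("lun1-sql-data",
                 "iqn.1991-05.com.microsoft:rogue.example.com"))  # False
```

Swapping in a replacement server then means changing one IQN on the new box, not re-addressing it.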
If the admins managing your servers won't stick to the resources they've been allocated, you can configure your targets to use CHAP (Challenge Handshake Authentication Protocol) to password-protect your sensitive volumes. The iSCSI spec also defines how iSCSI devices can use IPsec both to control access to resources and to encrypt server-to-disk target traffic across the network, but this feature is rarely used because of the CPU overhead and latency it would create. In fact, the only iSCSI disk targets we've seen that support IPsec are software targets, such as Stringbean Software's WinTarget, that run under Windows and use the native Windows IPsec implementation. SAN encryption just isn't ready for prime time.
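CHAP itself is simple: the target sends a random challenge, and the initiator proves it knows the shared secret by returning an MD5 hash over the CHAP identifier, the secret and the challenge (per RFC 1994), so the secret itself never crosses the wire. A minimal sketch:

```python
import hashlib
import os

def chap_response(identifier: int, secret: bytes, challenge: bytes) -> bytes:
    """RFC 1994 CHAP response: MD5(identifier || secret || challenge)."""
    return hashlib.md5(bytes([identifier]) + secret + challenge).digest()

# Target side: issue a fresh random challenge for each login attempt.
secret = b"not-the-real-password"   # shared with the initiator out of band
challenge = os.urandom(16)
ident = 1

answer = chap_response(ident, secret, challenge)          # initiator computes
assert answer == chap_response(ident, secret, challenge)  # target verifies
print("CHAP authentication succeeded")
```

Because the challenge is random every time, a snooped response can't be replayed, though CHAP authenticates the login only; it doesn't encrypt the data that follows, which is what IPsec would add.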
Although putting together a production-ready iSCSI SAN isn't as easy as you'd think, it's not rocket science. With proper planning and a pair of good switches, you can build an enterprise-class iSCSI SAN.
Howard Marks is founder and chief scientist at Networks Are Our Lives, a network design and consulting firm in Hoboken, N.J. Write to him at firstname.lastname@example.org.
» Direct-Attached Storage. You could keep adding more drives to each server that needs more storage, but you'll waste a lot of disk space allocating whole RAID arrays to individual servers. Plus, you won't be able to build clusters with more than two members, and they'll run slower, too.
» Fibre Channel. Conventional Fibre Channel SAN technology has a few advantages over iSCSI, including speed (current Fibre Channel gear runs at 2 Gbps), low latency and a deterministic access protocol that essentially eliminates packet loss from bandwidth overruns. But Fibre Channel is expensive--switches and HBAs typically cost up to five times more than iSCSI alternatives. Add in Fibre Channel's steep learning curve and historical lack of full interoperability, and it's probably best left to the old hands.