Network Computing is part of the Informa Tech Division of Informa PLC

Distributed Storage: Extending the SAN Over the WAN

The three basic SAN-over-WAN protocols in widespread use on IP networks are based on the SCSI and Fibre Channel SAN protocols and depend on TCP:

• iSCSI (Internet SCSI) encapsulates the standard SCSI-3 device interface commands in a TCP/IP data flow for transport over the WAN.

• FCIP (Fibre Channel over TCP/IP) encapsulates standard Fibre Channel packets in a TCP/IP data flow. Alternatives can encapsulate Fibre Channel over ATM or other link transports.

• iFCP (Internet Fibre Channel Protocol) intercepts Fibre Channel traffic at a gateway and replaces the Fibre Channel sessions with TCP sessions for transport across IP.

Unfortunately, SCSI and FC were designed for short interconnections between processors and storage devices that are usually in the same room, if not the same cabinet. They expect virtually zero errors and latency. FCIP, iFCP and iSCSI all have problems on WANs.

WAN Performance Problems

WAN connections introduce high error rates and latency. A WAN link's error rate, for example, may be up to 0.5 percent, and the latency over an optical fiber is 5 microseconds per kilometer. A direct link from New York to Los Angeles introduces a 20-millisecond delay; a hop through a geosynchronous satellite introduces 260 ms.

These difficulties arise because the TCP and SAN-over-WAN protocols force the sender to pause if it has transmitted its permitted quota, or window, of data packets without receiving a positive acknowledgment. A system of explicit acknowledgments is necessary to ensure error-free communications and flow control, but it can restrict WAN bandwidth. "Waiting for Acknowledgment," below, illustrates such a situation, with a file transmission through a 45-Mbps satellite link. The one-way latency for a geosynchronous satellite is 0.26 seconds, and the receiving TCP software uses a default receive window of 17,520 bytes.

It takes 0.26 seconds for the brief data burst containing 17,520 bytes to travel from the transmitter to the receiver in our example, and another 0.26 seconds for the acknowledgments to travel from the receiver back to the transmitter so the transmission cycle can resume. In just over half a second, only 17,520 bytes have been transferred, for an effective throughput of roughly 34 KB per second on the 45-Mbps link!
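The arithmetic above generalizes: with a stop-and-wait window, throughput is bounded by window size divided by round-trip time, and the window needed to keep the link busy is the bandwidth-delay product. A quick sketch using the figures from the example:

```python
def window_limited_throughput(window_bytes, rtt_s):
    """Best-case throughput when the sender must stop after one window per round trip."""
    return window_bytes / rtt_s

def bandwidth_delay_product(bandwidth_bps, rtt_s):
    """Window (in bytes) needed to keep the link busy for a whole round trip."""
    return bandwidth_bps / 8 * rtt_s

rtt = 2 * 0.26  # geosynchronous satellite round trip, seconds
print(f"throughput: {window_limited_throughput(17520, rtt) / 1000:.1f} KB/s")
print(f"window needed to fill 45 Mbps: {bandwidth_delay_product(45e6, rtt) / 1e6:.1f} MB")
```

The second figure shows why the default 17,520-byte window is hopeless here: filling the satellite link for a full round trip takes a window more than 160 times larger.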

This problem exists with any long, fat network with high bandwidth and high latency. The one-way latency on a direct terrestrial link between New York and Melbourne, Australia, is 0.1 seconds, for example.

The situation degrades further if a packet is lost or garbled. TCP assumes that packet loss always indicates congestion. It therefore cuts the transmission rate drastically (at least by half) and slowly builds that rate back to what it was before the error occurred. The rate of buildup is proportional to the round-trip latency on the link: The longer the latency, the more time it takes to return to the original transmission rate. Multiple in-flight losses can have almost catastrophic effects; transmission may stall temporarily. Data isn't lost, but throughput suffers, and for remote synchronous storage backup, the app halts while waiting for the acknowledgment from the target.
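The slow rebuild can be illustrated with a simplified model of TCP's additive increase, which regains roughly one maximum-size segment per round trip after the window is halved (this ignores slow start and fast recovery, so it is an approximation, not a full TCP simulation):

```python
def recovery_time_s(window_bytes, mss_bytes=1460, rtt_s=0.52):
    """Seconds for a halved congestion window to climb back to its old size,
    growing one MSS per round trip (simplified additive-increase model)."""
    segments_to_regain = (window_bytes / 2) / mss_bytes
    return segments_to_regain * rtt_s

# A 2.9-MB window (enough to fill the 45-Mbps satellite link), halved by one lost packet:
print(f"{recovery_time_s(2.9e6) / 60:.1f} minutes to recover")
```

A single loss on the satellite link costs minutes of degraded throughput, which is why the long-latency paths suffer so badly.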

Tweaking SAN-over-WAN implementations can make the difference between a failed installation and a resounding success. Tuning techniques fall into three key areas: compressing WAN traffic; tuning the WAN protocol; and, if database replication is involved, choosing the appropriate replication technique.

• Compressing WAN Traffic

Many links carry repetitive traffic in the form of repeated strings. By squeezing out repeated sequences, compression shortens the data flow and reduces the number of packets traversing the WAN, thereby speeding up the return of acknowledgments.
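The effect is easy to demonstrate with a general-purpose compressor (zlib here, standing in for the stronger dictionary-based schemes the appliances use; the payload is an invented repetitive record, not real SAN traffic):

```python
import zlib

# Repetitive block traffic: the same record pattern repeated many times
payload = b"LUN=07;BLK=0004;STATUS=OK;" * 1000
compressed = zlib.compress(payload, level=9)
print(f"{len(payload)} -> {len(compressed)} bytes "
      f"({100 * len(compressed) / len(payload):.1f}% of original)")
```

Highly repetitive flows shrink to a small fraction of their original size, so far fewer packets cross the WAN and acknowledgments come back sooner.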

Most SAN-over-WAN systems include basic compression and try to avoid transmitting data blocks unnecessarily. However, the systems often can't detect duplicate sequences in data blocks with different addresses, and the compression techniques usually can't match those of specialty appliances.

The most advanced products have massive compression dictionaries that replace megabytes of data with a single token, saving bandwidth. Relying on performance-optimization appliances at both ends of the WAN's communications path, these advanced techniques require no changes to the SAN-over-WAN devices. Vendors such as Expand Networks say SAN over WAN is well-suited to advanced compression. The company says when its WAN Accelerators are installed, bandwidth needs typically drop by 90 percent.

• Tuning the WAN Protocol

The first place to tune the WAN protocol is within the SAN-over-WAN devices and their networks. By increasing the TCP window as the buffers allow and enabling TCP's selective-acknowledgment option at both ends of the path, you can reduce flow-control problems and delays from packet loss.
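On most operating systems the advertised TCP window is governed by the socket buffer size, which an application can request before connecting. A sketch in Python (note that selective acknowledgment is typically a kernel-wide setting, such as the net.ipv4.tcp_sack sysctl on Linux, rather than a per-socket option; the 4-MB figure is illustrative):

```python
import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# Ask for larger send/receive buffers before connecting, so window scaling
# can be negotiated. The kernel may cap the request (on Linux, at
# net.core.wmem_max / rmem_max), so read back what was actually granted.
s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, 4 * 1024 * 1024)
s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 4 * 1024 * 1024)
print("send buffer granted:", s.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF))
print("recv buffer granted:", s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF))
s.close()
```

Reading the value back matters: a request silently capped by a system limit leaves the window far smaller than intended.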

Many networks let you increase the basic frame and packet sizes to reduce CPU-processing overhead and the number of acknowledgments needed. However, overly increasing the TCP packet size will force intermediate network devices to fragment packets that travel across a link with a restricted frame size. Lost fragments waste time and raise reassembly overhead.
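The fragmentation cost is easy to quantify. A hypothetical example: a 9,000-byte jumbo datagram forced across a standard 1,500-byte-MTU link (figures assume a 20-byte IP header and the 8-byte alignment rule for fragment offsets):

```python
import math

def fragments(packet_bytes, path_mtu=1500, ip_header=20):
    """IP fragments produced when a large datagram crosses a smaller-MTU link."""
    payload_per_frag = path_mtu - ip_header
    # Fragment offsets count in 8-byte units, so payload rounds down to a multiple of 8.
    payload_per_frag -= payload_per_frag % 8
    return math.ceil((packet_bytes - ip_header) / payload_per_frag)

print(fragments(9000))  # 7 fragments
```

Losing any one of those seven fragments discards the entire original datagram, so oversizing packets can multiply the effective loss rate rather than reduce overhead.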

Finally, moving the TCP processes, along with any encryption processes, to an external card (an off-load engine) can improve both protocol handling and the performance of the SAN system.

If optimizing the SAN-over-WAN devices and the network doesn't improve performance sufficiently, WAN performance-optimization appliances can help. These devices let you use the SAN-over-WAN connections' full bandwidth by requiring fewer round-trips and fixing problems involving TCP window sizes and error recovery.

Expand Networks, Orbital Data Corp., Peribit Networks and others sell bandwidth-optimization appliances that provide high data flows even in the presence of many errors on high-latency, high-bandwidth paths. Working in pairs, with a unit at each end of the path, the appliances usually perform advanced compression. They use standard TCP to communicate with the SAN-over-WAN devices and highly optimized protocols to communicate with each other over the WAN. (For a review of four such products, see "WAN Accelerators: Breaking the WAN Bottleneck.")

Brocade Communications Systems, Cisco Systems and McData offer switches and routers that optimize both the SAN and the TCP protocols, though their compression and TCP-optimization capabilities may not be as sophisticated as those of the bandwidth-optimization appliances.

• Synchronous vs. Asynchronous

Database replication brings its own challenges to SAN-over-WAN installations. Choosing the right technique--synchronous or asynchronous--is critical.

Synchronous replication in a SAN requires that both the local and remote storage locations hold the data before a write is considered complete. Data travels to the local and remote storage locations, and the replication app returns an acknowledgment to the requesting program only when the data has been saved at both locations.

As with most data-transfer mechanisms, increasing WAN latency adversely affects synchronous SAN replication. Even without packet loss and the accompanying need for retransmissions, flow-control problems resulting from the growing latency between sites will limit total WAN throughput. If you don't compensate for latency, any synchronous replication of SAN data beyond 100 kilometers will experience enough delays in data acknowledgments to affect the application's performance. Data compression and WAN protocol tuning can permit distances of hundreds of kilometers before delays become problematic.
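Using the 5-microseconds-per-kilometer fiber figure from earlier, the per-write penalty grows linearly with distance. A quick check:

```python
def sync_write_penalty_ms(distance_km, us_per_km=5.0):
    """Round-trip fiber delay added to every synchronous write, in milliseconds."""
    return 2 * distance_km * us_per_km / 1000

for km in (10, 100, 500):
    print(f"{km:4d} km: +{sync_write_penalty_ms(km):.1f} ms per write")
```

At 100 km every synchronous write pays at least a full millisecond of round-trip delay before any protocol overhead, queuing or retransmission is counted, which is why the 100-km rule of thumb appears above.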

Asynchronous data replication, a widely available FC and iSCSI option, greatly reduces latency. The SAN returns an acknowledgment to the app when the local device has finished storing the data; it doesn't wait until the data has been successfully stored at the remote location. That removes the delay caused by latency. However, because the data at the primary and secondary sites isn't always synchronized, there's no guarantee the remote site will provide accurate data for disaster recovery.
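The acknowledge-locally, ship-later behavior can be sketched as a toy model (the class and method names are illustrative, not any vendor's API):

```python
from collections import deque

class AsyncReplicator:
    """Toy model of asynchronous replication: acknowledge a write as soon as
    local storage has it, and ship it to the remote site later, so the
    remote copy can lag behind the primary."""

    def __init__(self):
        self.local = []           # primary-site copy
        self.remote = []          # secondary-site copy
        self.pending = deque()    # writes not yet shipped over the WAN

    def write(self, block):
        self.local.append(block)   # local write completes...
        self.pending.append(block) # ...remote copy is deferred
        return "ack"               # app unblocks immediately

    def drain(self, n=1):
        """Ship up to n queued writes to the remote site (WAN transfer)."""
        for _ in range(min(n, len(self.pending))):
            self.remote.append(self.pending.popleft())
```

The gap between `local` and `remote` at any instant is exactly the data that would be lost if the primary site failed before the queue drained.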

Finally, some SAN vendors let you send snapshots of data at frequent intervals to the remote facility, thereby providing synchronization checkpoints from which an application can recover if the local storage facility fails. The snapshot updates the remote storage facility in a complete, consistent way. To recover from failure, the app simply connects to the remote facility and uses the remote database. Some transactions won't be there, but the application knows when the last snapshot was taken and that subsequent transactions must be re-entered.

Eric Siegel and Bill Terrill are senior analysts at the Burton Group, a technical research and advisory service that supports global enterprises.