In the era before solid-state drives, system performance was limited by the slow speed of hard disk drives. Consequently, the typical network wasn't saturated, and even 1-gigabit Ethernet was adequate in many environments.
Things have changed. A box with 24 SSDs can push the limits of networks, and that's a relatively small configuration. All-flash arrays hit 1 million IOPS or more, and the numbers are climbing quickly. Clearly, we need faster networking with 10GbE or even 40GbE. But just as clearly, we are only a couple of years from the network once again becoming the bottleneck.
The answer to this might be even faster connections, and the Ethernet community is driving hard for 100 gigabits and beyond. This will take time, and there are pressures on the network today beyond bandwidth. Latency of response is an issue. A request that takes hundreds of microseconds can be a problem in many environments, such as financial trading systems.
We've done several things to speed Ethernet up. RDMA reduces computing at the endpoints to nearly nothing, with data moving directly from memory to memory without CPU involvement. Converged Ethernet uses priority flow control to prevent packet drops, making delivery more efficient. Both of these impose significant cost in the form of specialized NICs and switches.
However, there are other solutions to increase transactions: Shrink the payload in each transaction, or even avoid the transaction altogether. We have tools to do this. Moreover, they fit the storage model really well.
With the notable exception of already-compressed formats such as video and images, most files can be compressed tremendously. This works by finding byte sequences that repeat frequently and replacing each occurrence with a short reference. The large runs of zeros in a Word document are a classic example.
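A quick sketch of that effect, using Python's standard zlib library on a buffer dominated by runs of zeros (the sample data here is illustrative):

```python
# Repeated byte sequences compress extremely well. A buffer that is
# mostly zeros -- like the empty regions of a Word document -- shrinks
# to a tiny fraction of its original size.
import zlib

raw = b"\x00" * 100_000 + b"actual document text" * 10
packed = zlib.compress(raw, level=6)

ratio = 1 - len(packed) / len(raw)
print(f"{len(raw)} bytes -> {len(packed)} bytes ({ratio:.1%} reduction)")

# The transformation is lossless: decompression restores every byte.
assert zlib.decompress(packed) == raw
```

Only the small `packed` buffer needs to cross the wire; the receiver reverses the process with `zlib.decompress`.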
The compressed file is then stored on, and retrieved from, the networked storage. Very often the compression achieved is 75 percent or more, sometimes as much as 95 percent. That's a dramatic reduction in the payload moving to and from storage: a 75 percent reduction alone is equivalent to quadrupling the network's effective speed.
We also have tools for avoiding the transfer in the first place. Some object storage systems use deduplication extensively: writing a second copy of any object becomes merely adding a short pointer to a metadata file. These pointers can address objects as small as 1 KB or larger than 100 GB, saving both stored capacity and the network cost of moving the data.
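The pointer mechanism can be sketched with a content hash as the object key; the store and function names below are illustrative, not any product's actual API:

```python
# Sketch of object deduplication: each unique object is stored once,
# keyed by its content hash; duplicate writes become short pointer
# entries in metadata and move no payload at all.
import hashlib

object_store = {}   # content hash -> object bytes (stored once)
metadata = {}       # object name -> content hash (the short pointer)

def put(name: str, data: bytes) -> bool:
    """Store an object; return True only if the payload had to move."""
    digest = hashlib.sha256(data).hexdigest()
    metadata[name] = digest
    if digest in object_store:
        return False          # duplicate: only the pointer is written
    object_store[digest] = data
    return True

def get(name: str) -> bytes:
    return object_store[metadata[name]]

put("vm-image-1", b"A" * 1024)
sent = put("vm-image-2", b"A" * 1024)   # identical second copy
print(sent)  # -> False: no payload crossed the network
```

The second write costs one hash and one metadata entry instead of a full object transfer, which is where both the capacity and the bandwidth savings come from.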
Block I/O arrays can also deduplicate and compress data, but the mechanisms are less efficient than at the file/object level. In both object and block storage, the trick is to do all the deduplication and compression on the host end of the cable. This host-based processing is still not perfected, though Windows Server 2012 offers a process akin to the compression approach above, and Linux file systems such as Btrfs and ZFS support transparent compression.
Compression and deduplication take resources and time. Because the goal of primary storage is the fastest possible I/O, this is an issue, and many systems limit compression and deduplication to secondary storage operations. Even so, secondary operations can generate a lot of traffic.
It's worth looking at some specific use cases to see how compression and deduplication could help speed storage traffic. First, we are moving into an era of primary SSD storage and secondary bulk HDDs. Data will be cached in the SSD, or there will be a tiering process that moves whole objects as needed. This is a background job, so only compressed and deduplicated data needs to move. Purists might note that data will often be read directly from the secondary tier, but decompression is much faster than compression, so this is acceptable, given that data loads will be quicker.
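The tiering flow above can be sketched as follows; the tier names and functions are hypothetical, and zlib stands in for whatever codec a real system would use:

```python
# Sketch of SSD-to-HDD tiering: a background job demotes cold objects
# to the bulk tier in compressed form, so only the reduced payload
# moves. A direct read from the secondary tier pays only the (fast)
# decompression cost.
import zlib

ssd_tier = {"report.doc": b"\x00" * 50_000 + b"quarterly numbers"}
hdd_tier = {}

def demote(name: str) -> None:
    """Background job: move a cold object to the HDD tier, compressed."""
    hdd_tier[name] = zlib.compress(ssd_tier.pop(name))

def read(name: str) -> bytes:
    if name in ssd_tier:
        return ssd_tier[name]             # hot path: no codec work
    return zlib.decompress(hdd_tier[name])  # decompress on demand

demote("report.doc")
data = read("report.doc")   # served from the secondary tier
```

Because the demotion runs in the background, the compression cost never sits on the application's write path.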
Next, in object stores, replicas of the original data are made (usually two or three), and these are sent to other nodes in the storage pool to provide data integrity. With the ingesting node journaling data until replication is complete, sending only compressed data for the replicas is an option, and in terms of total bandwidth used, the savings could be huge. Backup and archiving have potential savings similar to replication.
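A back-of-envelope calculation shows the scale of the replication savings; the 1 GB object size and 75 percent compression ratio are assumptions for illustration:

```python
# Bandwidth for replicating one object across an object-store pool:
# with three replicas and 75% compression, the pool moves 0.75 GB
# instead of 3 GB over the network.
object_gb = 1.0
replicas = 3
reduction = 0.75          # assumed compression ratio

raw_traffic = object_gb * replicas
compressed_traffic = object_gb * (1 - reduction) * replicas
print(raw_traffic, compressed_traffic)   # 3.0 vs 0.75 GB
```

The same arithmetic applies to backup and archive streams, which is why their savings track replication's so closely.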
Another use case is virtual machines with local instance stores. Instance storage was conceived as a way to speed up I/O for in-memory operations, and the much faster I/O it delivers is likely to drive broader use. Moving data from networked storage to the instance store in compressed form saves time, while writeback can be an asynchronous background process. Decompression on demand is an option that can save space on the virtualized server.
Can we get deduplication and compression into links to primary network storage? It's a matter of processing speed, and the answer is probably "Not soon." Adapters in the pipeline can compute deduplication hashes in-stream, which will facilitate source-based deduplication, but even these will add to storage delay. Even so, the reduction in total traffic from deduplication and compression is worth the learning curve of deploying it.

Jim O'Reilly was Vice President of Engineering at Germane Systems, where he created ruggedized servers and storage for the US submarine fleet. He has also held senior management positions at SGI/Rackable and Verari; was CEO at startups Scalant and CDS; and headed operations at PC ...