Pure Storage argues that the idea of HDDs being less expensive than flash is no longer valid.
Pure Storage is an all-flash, controller-based, scale-up enterprise storage vendor that competes directly with traditional and hybrid storage arrays, as well as with other all-flash arrays. It claims that its flash storage is less expensive than high-performance HDDs, and that the gap favoring flash will only widen over the next few years. The company has also addressed perceived reliability issues with flash, so that from both a cost and a reliability perspective, flash storage in a controller-based, all-flash array measures up.
Without going into too much detail, Pure Storage builds deduplication and compression into its flash storage arrays as a standard feature that the customer cannot turn off (and, in fact, for architectural reasons having to do with managing writes, would not want to turn off). Data reduction rates vary by workload, but Pure Storage claims roughly a 6-to-1 ratio on average; individual results can be higher or lower. Still, that means each 1 Tbyte of flash purchased yields about 6 Tbytes of usable storage, a ratio that makes investing in flash highly attractive.
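The economics of that claim are simple arithmetic. The sketch below uses hypothetical per-terabyte prices (the dollar figures are illustrative assumptions, not Pure Storage's actual pricing) to show how the claimed 6-to-1 data reduction changes the effective cost per usable terabyte:

```python
# Hypothetical prices for illustration only -- not Pure Storage's figures.
raw_flash_per_tb = 3000.0   # assumed $ per raw Tbyte of flash
raw_hdd_per_tb = 500.0      # assumed $ per raw Tbyte of high-performance HDD

reduction_ratio = 6.0       # Pure Storage's claimed average data reduction

# Effective cost per *usable* Tbyte once dedupe/compression is factored in.
effective_flash_per_tb = raw_flash_per_tb / reduction_ratio

print(f"Flash, after 6:1 reduction: ${effective_flash_per_tb:.0f}/usable Tbyte")
print(f"HDD, no reduction:          ${raw_hdd_per_tb:.0f}/usable Tbyte")
```

Under these assumed prices, a 6x raw-price premium for flash disappears entirely once the reduction ratio is applied; any higher ratio, or lower raw flash price, tips the balance toward flash.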
How valid is the company's claim? Well, VCs have just advanced Pure Storage $150M in additional funding. Although large vendors are obviously investing even larger sums in flash storage, smaller vendors are often able to take a focused approach and don't have to worry about protecting existing legacy investments.
Deduplication and Compression
The obvious question is: Can’t hard disks perform those same deduplication and compression functions just as efficiently? Compression can be applied successfully to HDDs, but the practice has not seen broad acceptance. Deduplication in the form of single-instance file deduplication is fairly common, but block-level deduplication on HDDs generally is not. (And there is no reason single instancing could not be applied before block-level deduplication on flash, anyway.)
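To make the distinction concrete, here is a minimal sketch of block-level deduplication (my illustration, not Pure Storage's implementation): data is split into fixed-size blocks, each block is hashed, only unique blocks are stored, and the logical file becomes a list of pointers to those stored blocks.

```python
import hashlib

def dedupe(data: bytes, block_size: int = 512):
    """Minimal block-level dedup sketch: split data into fixed-size blocks,
    keep one copy of each unique block, and record per-block pointers."""
    store = {}      # block hash -> block contents (each stored only once)
    pointers = []   # logical block order, expressed as hashes
    for i in range(0, len(data), block_size):
        block = data[i:i + block_size]
        digest = hashlib.sha256(block).hexdigest()
        store.setdefault(digest, block)   # store only if not seen before
        pointers.append(digest)
    return store, pointers

# Four identical 512-byte blocks followed by one distinct block.
data = b"A" * 2048 + b"B" * 512
store, pointers = dedupe(data)
print(len(pointers), "logical blocks,", len(store), "unique blocks stored")

# Reading the file back means chasing the pointers -- the key point
# in the bullets that follow.
assert b"".join(store[h] for h in pointers) == data
```

Single-instance file deduplication works the same way but hashes whole files, so it misses duplicate blocks inside files that differ anywhere at all.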
Pure Storage has identified four specific reasons why similar deduplication and compression technology is highly unlikely to be applied to HDDs. All take into account the architectural difference between flash storage, which (as a solid-state technology) has no moving parts, and HDDs, which (as electromagnetic devices) do.
• Flash storage is more efficient than HDDs in dealing with random read I/Os. Deduplication removes duplicate data and replaces it with pointers, so any given dataset may be spread over much of the storage array. When reading a file whose pieces were originally stored sequentially, those pieces may now be anywhere. The result is random I/O, the bane of an HDD's existence. Think of all the head movement required to reassemble a file scattered across a large number of disks, versus the “virtual” hop-and-skip (so to speak) of flash storage -- that is why the different pooling architectures matter for deduplicated reads.
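A toy simulation (my illustration, with made-up block counts) shows why scattering hurts HDDs in particular: once dedup pointers place a file's blocks at arbitrary physical locations, almost every consecutive read lands somewhere non-adjacent, and on an HDD each such jump is a head seek.

```python
import random

# Pretend a 100-block file has been deduplicated: its blocks now sit at
# arbitrary physical locations across a 10,000-slot array.
random.seed(0)
physical_locations = random.sample(range(10_000), 100)

# Count "seeks": consecutive reads whose blocks are not physically adjacent.
# On flash, a non-adjacent read costs roughly the same as an adjacent one;
# on an HDD, each one is a mechanical head movement.
seeks = sum(
    1
    for prev, cur in zip(physical_locations, physical_locations[1:])
    if cur != prev + 1
)
print(f"{seeks} of 99 consecutive reads would require an HDD seek")
```

With scattered placement, essentially every read is a seek; with the original sequential layout, essentially none would be.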
• Flash is better at dealing with writes. Deduplication and compression add operational complexity (such as verifying the validity of the data being written) that takes time and CPU cycles; flash storage can absorb these steps without the significant additional overhead that HDDs incur to do the same job. Consequently, flash storage is more efficient on the write side.
[Read how EMC makes a case for flash in traditional storage systems in "EMC VNX: Flash Storage Takes Center Stage."]
• Flash is more effective at handling storage virtualization. The data reduction produced by combining deduplication and compression requires complete virtualization of the array (since the process necessarily separates the logical position of data from its physical placement). This virtualization is very fine-grained (such as 512-byte chunks), which leads to a metadata structure that must handle pointers to billions, and potentially trillions, of objects. Pure Storage argues that retrofitting the controllers of standard HDD arrays simply isn’t possible; even doing it up front in a flash array is very difficult.
• Flash is better at handling modified, compressed data. Compression is non-deterministic in size, which means that when a file is overwritten or modified, it may no longer fit in the same space. That can trigger a read-modify-write cycle in which, among other things, decompression and recompression must take place. Although StorageTek’s Iceberg addressed this problem on HDDs as far back as 1994, the latencies involved with mechanical devices are far larger and more cumbersome than with flash.
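The read-modify-write problem is easy to demonstrate with any stream compressor. The sketch below (my illustration, using Python's standard zlib rather than any array vendor's compressor) patches a compressed record and shows that the recompressed result no longer fits the original slot:

```python
import zlib

# A highly compressible record, stored compressed in a fixed-size slot.
original = b"ABAB" * 256
stored = zlib.compress(original)

# Modifying the record forces a read-modify-write cycle:
# decompress, apply the change, then recompress.
record = zlib.decompress(stored)
patched = bytes(range(64)) + record[64:]   # overwrite 64 bytes with new data
restored = zlib.compress(patched)

print(len(stored), "->", len(restored), "compressed bytes")

# The less-compressible patch makes the record larger, so it no longer
# fits in the slot the old version occupied.
assert len(restored) > len(stored)
```

An HDD array pays for this cycle with extra mechanical I/O on every such update; a flash array pays only the (much smaller) cost of writing to a new location.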
The bottom line is that flash storage can make data reduction work in its entirety, while HDD arrays face significant challenges in doing the same.
NEXT: Pure's Scale-Up Focus