Pure Storage Aims To Close The Flash Price Gap

Startup says its flash storage is cheaper than high-performance HDDs by using deduplication and compression as standard features.

David Hill

September 19, 2013

Network Computing

A common perception is that flash storage is more expensive than high-performance (that is, 10K/15K RPM) hard disks. On a raw, per-byte basis, that may be the case, but that is not the right measure; instead, the comparison should be between bytes of usable storage, not raw storage. That is to say, the measure should be applied to the storage that is actually put to work. Plus, since using deduplication and compression magnifies the actual raw capacity of flash storage into a much larger quantity of usable storage, the price difference may very well vanish in, well, a flash.

Pure Storage argues that the idea of HDDs being less expensive than flash is no longer valid.

Pure Storage is an all-flash, controller-based, scale-up enterprise storage vendor that competes directly with traditional and hybrid storage arrays, as well as with all-flash storage arrays. It claims that its flash storage is less expensive than high-performance HDDs, and that the gap favoring flash storage will only increase over the next few years. The company has also addressed the perceived reliability issues of flash storage so that, from both a cost and a reliability perspective, flash storage in a controller-based, all-flash array measures up.

Without going into too much detail, Pure Storage builds deduplication and compression into its flash storage arrays as a standard feature that the customer cannot turn off (and, in fact, for architectural reasons having to do with managing writes, would not want to turn off). Although workloads do not deduplicate and compress uniformly, Pure Storage claims a data reduction ratio of approximately 6-to-1 on average; individual results can be higher or lower. At that average, each 1 Tbyte of flash purchased yields 6 Tbytes of usable storage, a ratio that makes investing in flash highly attractive.
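The arithmetic behind this argument can be sketched in a few lines of Python. Only the 6-to-1 ratio comes from Pure Storage's claim; the prices are placeholders in arbitrary units chosen purely for illustration, not figures from any vendor, and the function names are hypothetical.

```python
def usable_capacity_tb(raw_tb: float, reduction_ratio: float) -> float:
    """Usable capacity after data reduction (deduplication plus compression)."""
    return raw_tb * reduction_ratio


def cost_per_usable_tb(price_per_raw_tb: float, reduction_ratio: float) -> float:
    """Effective price per usable terabyte."""
    return price_per_raw_tb / reduction_ratio


def break_even_ratio(flash_price_per_raw_tb: float, hdd_price_per_raw_tb: float) -> float:
    """Reduction ratio at which flash matches an HDD array with no data reduction."""
    return flash_price_per_raw_tb / hdd_price_per_raw_tb


# Hypothetical prices in arbitrary currency units (assumptions, not quotes).
FLASH_PRICE_PER_RAW_TB = 12.0
HDD_PRICE_PER_RAW_TB = 3.0

flash_usable = usable_capacity_tb(1.0, 6.0)                 # claimed 6-to-1 average
flash_cost = cost_per_usable_tb(FLASH_PRICE_PER_RAW_TB, 6.0)
hdd_cost = cost_per_usable_tb(HDD_PRICE_PER_RAW_TB, 1.0)    # raw equals usable
```

With these made-up prices, flash costs four times as much per raw terabyte but less per usable terabyte. The comparison flips whenever the achieved reduction ratio falls below the break-even value, which is why the "higher or lower" caveat matters.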

How valid is the company's claim? Well, VCs have just advanced Pure Storage $150M in additional funding. Although large vendors are obviously investing even larger sums in flash storage, smaller vendors are often able to take a focused approach and don't have to worry about protecting existing legacy investments.

Deduplication and Compression

The obvious question is: Can’t hard disks perform those same deduplication and compression functions just as efficiently? Compression can be applied successfully to HDDs, but that practice has not seen broad acceptance. Deduplication in the form of single-instance file deduplication is fairly common, but block-level deduplication generally is not. (And there is no reason that single instancing could not be applied before block-level deduplication on flash, anyway.)

Pure Storage has identified four specific reasons why similar deduplication and compression technology is highly unlikely to be applied to HDDs. All take into account the difference in architectures between flash storage, which (as a solid-state technology) has no moving parts, and HDDs, which (as electromechanical devices) do.

•Flash storage is more efficient than HDDs in dealing with random I/Os on reads. The process of deduplication removes duplicate data and replaces it with pointers; the result is that any given dataset may be spread over much of the storage array. Thus, when reading a file whose pieces were originally stored sequentially, the pieces may be anywhere. This creates random I/Os, which are the bane of HDDs. Think of all the moving heads when trying to reassemble a file from a large number of individual disks versus the “virtual” hop and skip approach (so to speak) of flash storage -- that is why the different pooling architectures matter with deduplication on reads.

•Flash is better at dealing with writes. Deduplication and compression add operational complexity (such as verifying the validity of the data that is being written) that takes time and CPU cycles; flash storage can deal with these issues without the significant additional overhead that HDDs incur to do the job. Consequently, flash storage is more efficient on the write side.

•Flash is more effective in handling storage virtualization. The data reduction produced by the combination of deduplication and compression requires the complete virtualization of the array (as the process of necessity separates the logical position of the data from its physical placement). This virtualization is very fine-grained (such as 512-byte chunks), which requires a metadata structure that can handle pointers to billions and potentially trillions of objects. Pure Storage argues that retrofitting the controllers of standard HDD arrays simply isn’t possible; even doing it upfront with a flash array is very difficult.

•Flash storage makes it easier to deal with modified data when using compression. Compression is non-deterministic in size; that means that when a file is overwritten or modified, it may not fit in the same space. That can lead to a read-modify-write cycle in which, among other things, decompression and recompression have to take place. Although StorageTek’s Iceberg addressed this problem as far back as 1994 on HDDs, the latencies involved with mechanical devices are far larger and more cumbersome than with flash.
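The pointer mechanism behind the first and third reasons can be illustrated with a toy content-addressed block store. This is a minimal sketch under stated assumptions, not Pure Storage's implementation: the `DedupStore` class and its method names are hypothetical, SHA-256 fingerprints are trusted without a byte-for-byte comparison, and everything lives in memory.

```python
import hashlib

CHUNK_SIZE = 512  # bytes; the 512-byte granularity cited above as an example


class DedupStore:
    """Toy block store that keeps each unique chunk once (illustrative only)."""

    def __init__(self):
        self.chunks = {}   # fingerprint -> chunk bytes (physical storage)
        self.volumes = {}  # name -> ordered list of fingerprints (metadata)

    def write(self, name, data):
        pointers = []
        for offset in range(0, len(data), CHUNK_SIZE):
            chunk = data[offset:offset + CHUNK_SIZE]
            fingerprint = hashlib.sha256(chunk).hexdigest()
            # Store the chunk only if this content has not been seen before.
            self.chunks.setdefault(fingerprint, chunk)
            pointers.append(fingerprint)
        self.volumes[name] = pointers

    def read(self, name):
        # Every pointer may resolve to a chunk stored anywhere in the array,
        # so a logically sequential read becomes a series of random lookups.
        return b"".join(self.chunks[fp] for fp in self.volumes[name])

    def reduction_ratio(self):
        logical = sum(len(self.chunks[fp])
                      for pointers in self.volumes.values() for fp in pointers)
        physical = sum(len(chunk) for chunk in self.chunks.values())
        return logical / physical


store = DedupStore()
block_a = b"A" * CHUNK_SIZE
block_b = b"B" * CHUNK_SIZE
store.write("file1", block_a + block_b + block_a)
store.write("file2", block_a * 3)
```

Here six logical chunks are backed by two physical ones. The `volumes` table plays the role of the metadata structure: at 512-byte granularity it needs one pointer per chunk, which is how the object count climbs into the billions on a real array.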

The bottom line is that flash storage can make data reduction in its entirety work, and that there are significant challenges in doing the same with HDD arrays.
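One of those challenges, compressed data that changes size when it is modified, is easy to demonstrate with Python's standard `zlib` module. The 4,096-byte block size and the `fits_in_place` helper are assumptions made for this sketch; they do not describe any vendor's on-disk layout.

```python
import os
import zlib

BLOCK_SIZE = 4096  # hypothetical logical block size for this demonstration


def compressed_size(block: bytes) -> int:
    """Number of bytes the block occupies after zlib compression."""
    return len(zlib.compress(block))


def fits_in_place(old_block: bytes, new_block: bytes) -> bool:
    """True if the recompressed block fits in the space the old one occupied."""
    return compressed_size(new_block) <= compressed_size(old_block)


repetitive = b"\x00" * BLOCK_SIZE     # highly compressible content
random_like = os.urandom(BLOCK_SIZE)  # effectively incompressible content

small = compressed_size(repetitive)
large = compressed_size(random_like)

# Overwriting compressible data with incompressible data no longer fits in the
# original slot, so the data has to be relocated and the pointers updated.
overwrite_fits = fits_in_place(repetitive, random_like)
```

Both blocks have the same logical size, yet their compressed footprints differ widely, so an in-place overwrite cannot be assumed. Relocating the data is cheap when there is no seek penalty and costly when a disk head has to move.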

Pure's Scale-Up Focus

Pure Storage focuses on the scale-up midrange and enterprise market. That means that it does not focus on the service provider market, where quality of service (QoS) software is important for handling multitenancy issues such as the “noisy neighbor” problem, in which one user may try to hog resources. However, the company does need to focus on storage management software basics, such as replication technology.

Scale-up versus scale-out approaches to storage have been much debated, but practically speaking, scale-up meets the needs of many customers, and trying to be all things to all men may not be in the best interests of Pure Storage. One challenge for Pure Storage is the scale-up hybrid storage array that mixes flash and HDDs. Such products are available from many vendors, including large ones that have marketing and sales muscle, as well as an installed base of customers for whom switching costs would be an issue.

A second challenge is from competing all-flash array vendors. These may also be able to support deduplication and compression so that, in time, the data reduction claims that are Pure Storage’s bread and butter will become check-box items. However, Pure Storage argues that doing this with the necessary latency and reliability requirements is very difficult, and that its competitors may not be able to duplicate what it's already demonstrated.

Still, Pure Storage is not just about price; ease of use and other qualities have also attracted customers to the company. Moreover, the company feels it has an 18-month window to deploy its $150M effectively on development that can give its products a somewhat sustainable competitive advantage. In addition, the company plans to tighten its marketing and sales focus to build a stronger customer base from the growing number of new prospects that market forecasts suggest are receptive to the idea of deploying an all-flash array for primary storage.

Mesabi Musings

The topic of flash -- when, where, how much -- is probably the hottest storage topic today. Many flash configurations and architectures have proven to be quite attractive for many use cases. Still, there is a lot of untapped potential, and one means of capturing that potential is to overcome perceptions that all-flash storage arrays are too expensive relative to high-performance HDDs.

Pure Storage argues that the combination of deduplication and compression as standard functions on its all-flash storage arrays yields 6 TB of usable storage for each 1 TB of raw flash storage, enabling it to compete economically with HDDs, where raw and usable capacity are the same (6 TB of raw capacity provides 6 TB of usable capacity). On a level playing field, Pure Storage believes that it can make the case that buying all-flash storage arrays is better than buying hybrid or traditional storage arrays. That should be an interesting conversation, but one in which price difference is not the only or the determining factor.

Pure Storage is not a client of David Hill and the Mesabi Group.
