Pure Storage Aims To Close The Flash Price Gap

Startup says its flash storage is cheaper than high-performance HDDs by using deduplication and compression as standard features.

A common perception is that flash storage is more expensive than high-performance (that is, 10K/15K RPM) hard disks. On a raw, per-byte basis, that may be true, but raw capacity is not the right measure; the comparison should be between bytes of usable storage -- the storage actually put to work. And since deduplication and compression magnify the raw capacity of flash storage into a much larger quantity of usable storage, the price difference may very well vanish in, well, a flash.

Pure Storage argues that the idea of HDDs being less expensive than flash is no longer valid.


Pure Storage is an all-flash, controller-based, scale-up enterprise storage vendor that competes directly with traditional and hybrid storage arrays, as well as with other all-flash arrays. It claims that its flash storage is less expensive than high-performance HDDs, and that the gap favoring flash will only widen over the next few years. The company has also addressed the reliability concerns often raised about flash, so that, from both a cost and a reliability perspective, flash storage in a controller-based, all-flash array measures up to HDD alternatives.

Without going into too much detail, Pure Storage builds deduplication and compression into its flash storage arrays as a standard feature that the customer cannot turn off (and, in fact, for architectural reasons having to do with managing writes, would not want to turn off). Data reduction varies by workload, but Pure Storage claims roughly a 6-to-1 reduction ratio on average; individual results can be higher or lower. That means each 1 Tbyte of flash purchased yields about 6 Tbytes of usable storage -- a ratio that makes investing in flash highly attractive.
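A quick back-of-the-envelope calculation shows how that ratio changes the comparison. The dollar figures below are hypothetical placeholders, not quotes from Pure Storage or any drive vendor; only the 6-to-1 reduction ratio comes from the company's claim, and, as noted, actual reduction varies by workload.

def cost_per_usable_tb(price_per_raw_tb, data_reduction_ratio):
    """Effective price once data reduction turns raw capacity into usable capacity."""
    return price_per_raw_tb / data_reduction_ratio

# Hypothetical list prices per raw Tbyte -- placeholders for illustration only.
flash = cost_per_usable_tb(price_per_raw_tb=3000.0, data_reduction_ratio=6.0)
hdd = cost_per_usable_tb(price_per_raw_tb=700.0, data_reduction_ratio=1.0)

print(f"Flash:   ${flash:.0f} per usable Tbyte")   # $500
print(f"15K HDD: ${hdd:.0f} per usable Tbyte")     # $700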

How valid is the company's claim? Well, VCs have just advanced Pure Storage $150M in additional funding. Although large vendors are obviously investing even larger sums in flash storage, smaller vendors are often able to take a focused approach and don't have to worry about protecting existing legacy investments.

Deduplication and Compression

The obvious question is: Can't hard disks perform those same deduplication and compression functions just as efficiently? Compression can be applied successfully to HDDs, but the practice has not seen broad acceptance. Deduplication in the form of single-instance file deduplication is fairly common on HDDs, but block-level deduplication generally is not. (And there is no reason that single instancing could not be applied before block-level deduplication on flash, anyway.)
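For readers who have not worked with it, the minimal sketch below shows the idea behind block-level deduplication: split incoming data into fixed-size blocks, store one copy of each unique block, and record pointers (here, content hashes) in place of the duplicates. The 4-Kbyte block size and the dedupe/rehydrate helpers are illustrative assumptions; production arrays, Pure Storage's included, use far more sophisticated fingerprinting and metadata than this.

import hashlib

BLOCK_SIZE = 4096  # bytes; an illustrative granularity, not any vendor's actual value

def dedupe(data: bytes):
    store = {}       # block hash -> the single stored copy of that block
    pointers = []    # ordered hashes that reconstruct the original data
    for i in range(0, len(data), BLOCK_SIZE):
        block = data[i:i + BLOCK_SIZE]
        digest = hashlib.sha256(block).hexdigest()
        store.setdefault(digest, block)  # keep only the first copy of each block
        pointers.append(digest)
    return store, pointers

def rehydrate(store, pointers) -> bytes:
    # Reads follow pointers back to blocks that may sit anywhere in the
    # array -- the source of the random read I/O discussed below.
    return b"".join(store[h] for h in pointers)

data = b"A" * 16384 + b"B" * 8192        # 24 Kbytes with heavy duplication
store, pointers = dedupe(data)
print(len(pointers), "logical blocks,", len(store), "unique blocks stored")
assert rehydrate(store, pointers) == data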

Pure Storage has identified four specific reasons why similar deduplication and compression technology is highly unlikely to be applied to HDDs. All stem from the architectural difference between flash storage, which (as a solid-state technology) has no moving parts, and HDDs, which (as electromechanical devices) do.

• Flash storage is more efficient than HDDs at handling random read I/Os. Deduplication removes duplicate data and replaces it with pointers, so any given dataset may end up spread across much of the storage array. When reading a file whose pieces were originally stored sequentially, those pieces may now be anywhere. That creates random I/Os, which are the bane of an HDD's existence. Think of all the head movement required to reassemble a file scattered across a large number of individual disks versus the “virtual” hop-and-skip approach (so to speak) of flash storage -- that is why the difference in architectures matters for deduplicated reads.

• Flash is better at dealing with writes. Deduplication and compression add operational complexity (such as verifying the validity of the data being written) that takes time and CPU cycles; flash storage can absorb that work without the significant additional overhead HDDs incur to do the same job. Consequently, flash storage is more efficient on the write side.

[Read how EMC makes a case for flash in traditional storage systems in "EMC VNX: Flash Storage Takes Center Stage."]

• Flash is more effective at handling storage virtualization. The data reduction produced by the combination of deduplication and compression requires complete virtualization of the array, since the process of necessity separates the logical position of the data from its physical placement. This virtualization is very fine-grained (such as 512-byte chunks), which leads to a metadata structure that must handle pointers to billions, and potentially trillions, of objects (the first sketch after this list works through the arithmetic). Pure Storage argues that retrofitting the controllers of standard HDD arrays for this level of virtualization simply isn't possible; even doing it up front with a flash array is very difficult.

• Flash storage makes it easier to deal with modified data under compression. Compression is non-deterministic in size, which means that when a file is overwritten or modified, the new version may not fit in the same space. That can lead to a read-modify-write cycle in which, among other things, decompression and recompression have to take place (the second sketch after this list demonstrates the effect). Although StorageTek's Iceberg addressed this problem on HDDs as far back as 1994, the latencies involved with mechanical devices make that cycle far more costly than it is with flash.
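To put the metadata point in perspective, the rough arithmetic below works through a hypothetical array. The 100-Tbyte capacity and 8-byte pointer size are assumptions chosen purely for illustration; only the 512-byte chunk granularity comes from the discussion above.

RAW_CAPACITY_BYTES = 100 * 10**12   # a hypothetical 100-Tbyte array (assumption)
CHUNK_SIZE = 512                    # bytes -- the fine-grained unit cited above
POINTER_SIZE = 8                    # bytes per mapping entry (assumption)

chunks = RAW_CAPACITY_BYTES // CHUNK_SIZE
metadata_bytes = chunks * POINTER_SIZE
print(f"{chunks:,} chunks to track")                                    # 195,312,500,000
print(f"~{metadata_bytes / 10**12:.1f} Tbytes of pointer metadata alone")  # ~1.6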
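And to illustrate the last point, the small experiment below shows why compression is non-deterministic in size: overwrite part of a highly compressible block with less compressible data and the recompressed result no longer fits in its old footprint, which is exactly what forces the read-modify-write cycle. The sample data and the use of zlib are purely illustrative.

import os
import zlib

original = b"the quick brown fox jumps over the lazy dog " * 100   # ~4.4 Kbytes, highly compressible
# Overwrite 400 bytes in the middle with incompressible (random) data.
modified = original[:2000] + os.urandom(400) + original[2400:]

old_size = len(zlib.compress(original))
new_size = len(zlib.compress(modified))
print(old_size, "->", new_size, "compressed bytes")
# new_size is substantially larger, so the block cannot simply be
# rewritten in place; it must be relocated or its neighbors shuffled.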

The bottom line is that flash storage can make data reduction work in its entirety, while there are significant challenges in doing the same with HDD arrays.

NEXT: Pure's Scale-Up Focus



