Data Reduction Technologies: The Basics

Data deduplication, data compression, and thin provisioning can help organizations tackle massive data growth. Here's an overview of how they work.

Vaughn Stewart

September 19, 2014

Network Computing

Globally, CTOs face the daunting task of managing rapid data growth with limited budgets and oversubscribed data center resources. With capacity requirements growing by 30% to 70% every year, IT budgets are strained.

A rising trend in storage systems is the inclusion of data reduction technologies such as deduplication, compression, and thin provisioning. They can shrink data sets to between one-quarter and one-tenth of their original capacity and are designed to offset growth by storing more data per storage device. These technologies can provide substantial benefits. However, they differ significantly from one another in how they operate and in the data sets best suited to them. Here is an overview.
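To make those fractions concrete, the arithmetic below converts a stored fraction into the ratio and percentage forms in which savings are commonly quoted. It is only a sketch; the function name `reduction_summary` is an illustrative assumption, not part of any storage product.

```python
from fractions import Fraction
from typing import Tuple


def reduction_summary(stored_fraction: Fraction) -> Tuple[Fraction, float]:
    """Return (reduction ratio, percent saved) for data that occupies
    the given fraction of its original capacity."""
    ratio = 1 / stored_fraction
    percent_saved = float((1 - stored_fraction) * 100)
    return ratio, percent_saved


for stored in (Fraction(1, 4), Fraction(1, 10)):
    ratio, saved = reduction_summary(stored)
    print(f"stored at {stored} of original: {ratio}:1 reduction, {saved:.0f}% saved")
```

In other words, a data set stored at 1/4 of its original capacity is a 4:1 reduction (75% saved), and 1/10 is a 10:1 reduction (90% saved).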

Data deduplication reduces the storage capacity consumed by removing redundant data, both within individual files and across multiple files. Benefits are most often associated with unstructured data sets (like home directories and department shares), virtual machines and application services, virtual desktops, or test and development environments.

These IT services keep multiple identical copies of data in order to operate, which gives deduplication plenty of redundancy to remove. Most deduplication services operate inline as data is written to the storage platform. A key factor for success with deduplication is the granularity of the block size. A smaller block size provides greater data reduction and will continue to produce savings as data sets age.
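The effect of block size can be illustrated with a toy model. The sketch below assumes fixed-size blocks fingerprinted with SHA-256, which is one common way to describe deduplication but is not necessarily how any particular array implements it. The helper names and the sample data are hypothetical.

```python
import hashlib
import random


def stored_bytes_after_dedup(data: bytes, block_size: int) -> int:
    """Return the bytes kept when each distinct fixed-size block is stored once."""
    unique_blocks = {}  # fingerprint -> block length
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        fingerprint = hashlib.sha256(block).digest()
        unique_blocks.setdefault(fingerprint, len(block))
    return sum(unique_blocks.values())


def make_chunk(rng: random.Random, size: int = 4096) -> bytes:
    """Build one pseudo-random chunk so the demo data is reproducible."""
    return bytes(rng.getrandbits(8) for _ in range(size))


rng = random.Random(0)
a, b, c, d, e = (make_chunk(rng) for _ in range(5))

# Two hypothetical 16 KiB files that share three 4 KiB chunks (b, c, d),
# but at different offsets within each file.
volume = (a + b + c + d) + (b + c + d + e)

for block_size in (4096, 16384):
    kept = stored_bytes_after_dedup(volume, block_size)
    print(f"{block_size:>5}-byte blocks: {kept} of {len(volume)} bytes kept")
```

With 4 KiB blocks the shared chunks line up, so only five of the eight blocks need to be stored. With 16 KiB blocks the two files look entirely different and nothing is saved.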

Data compression provides storage savings by applying algorithms that reduce the capacity required to store a block of data. Benefits are most often observed with relational databases, including online transaction processing (OLTP), decision support systems (DSS), and data warehouses. Savings are obtainable but diminish with unstructured and encrypted data sets. A key factor for success is the number of compression algorithms provided by the storage platform.
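The difference between data types is easy to see with a general-purpose compressor. The sketch below uses Python's `zlib` purely as a stand-in (storage platforms use their own algorithms), made-up database-style rows as the structured sample, and pseudo-random bytes as a rough proxy for encrypted data, which is an assumption made for illustration.

```python
import random
import zlib


def percent_saved(data: bytes) -> float:
    """Percentage of capacity saved by compressing the data with zlib."""
    compressed = zlib.compress(data)
    return (1 - len(compressed) / len(data)) * 100


# Hypothetical structured rows, loosely resembling a database table export.
rows = "".join(
    f"{order_id},customer_{order_id % 50},{order_id * 3 % 97},ACTIVE\n"
    for order_id in range(5000)
).encode("ascii")

# Pseudo-random bytes of the same length, standing in for encrypted data.
rng = random.Random(0)
random_bytes = bytes(rng.getrandbits(8) for _ in range(len(rows)))

print(f"structured rows: {percent_saved(rows):.1f}% saved")
print(f"random bytes:    {percent_saved(random_bytes):.1f}% saved")
```

The structured rows shrink substantially, while the random bytes do not shrink at all and pick up a small amount of overhead.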

Most forms of compression operate inline as data is written to the system. Inline processes tend to prioritize system performance over data reduction and thus produce moderate returns. Some storage systems include additional, more aggressive post-process compression algorithms that can double the initial inline storage savings.
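The trade-off between a fast inline pass and a more aggressive later pass can be sketched with two `zlib` effort levels. This is an analogy only: level 1 stands in for an inline algorithm tuned for speed and level 9 for a heavier post-process algorithm. The sample rows are made up, and the sketch does not attempt to reproduce the doubling of savings that some systems achieve.

```python
import zlib

# Hypothetical structured rows used as sample data.
rows = "".join(
    f"{order_id},customer_{order_id % 50},{order_id * 3 % 97},ACTIVE\n"
    for order_id in range(5000)
).encode("ascii")

# Level 1 favors speed (stand-in for inline compression).
inline_pass = zlib.compress(rows, 1)

# Level 9 spends more CPU searching for matches (stand-in for post-process).
post_process_pass = zlib.compress(rows, 9)

# Both passes are lossless: the original data comes back unchanged.
assert zlib.decompress(inline_pass) == rows
assert zlib.decompress(post_process_pass) == rows

for label, blob in (("inline", inline_pass), ("post-process", post_process_pass)):
    saved = (1 - len(blob) / len(rows)) * 100
    print(f"{label:>12}: {len(blob)} bytes, {saved:.1f}% saved")
```

Because the heavier pass runs after the write has been acknowledged, it can afford the extra CPU time without adding latency to the application.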

Thin provisioning complements the previous two technologies, providing a dynamic form of storage allocation. It allows for more data to be stored on the storage platform by eliminating the pre-allocation of unused capacity that is lost with traditional provisioning. Thin provisioning provides an efficient, on-demand storage consumption model.

A word of caution: Including thin provisioning when calculating one's storage savings is foolhardy, because the result is based on the virtual capacity provisioned. That value can be changed at will, with no correlation to the data actually stored or to the maximum capacity of the platform.
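A short calculation shows why. The capacities below are hypothetical and chosen only for illustration: the data written and the physical capacity consumed stay fixed, yet the headline ratio changes as soon as the provisioned size does.

```python
def reduction_ratio(logical_tib: float, physical_tib: float) -> float:
    """Ratio of logical capacity to the physical capacity consumed."""
    return logical_tib / physical_tib


written_tib = 20.0   # hypothetical: data the hosts actually wrote
physical_tib = 5.0   # hypothetical: capacity used after dedup and compression

# Savings from deduplication and compression alone.
print(f"data reduction only: {reduction_ratio(written_tib, physical_tib):.0f}:1")

# The same system, "measured" against two different provisioned sizes.
for provisioned_tib in (50.0, 500.0):
    inflated = reduction_ratio(provisioned_tib, physical_tib)
    print(f"provisioned {provisioned_tib:.0f} TiB: {inflated:.0f}:1 including thin provisioning")
```

Nothing about the stored data changed between the last two lines. Only the provisioned number did, which is why ratios that include thin provisioning say little about how well a system reduces data.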

Best together
Used in conjunction, these three technologies produce peak results. When they are used individually or only in pairs, the data reduction benefits quickly diminish.

When evaluating storage systems, savvy buyers should seek granular deduplication and multiple forms of compression, and they should ignore data reduction calculations that result from thin provisioning. These data reduction technologies should now be viewed as standard features in modern storage platforms. They have long been accepted in backup-to-disk solutions and have evolved for use with high-performance, low-latency production data sets.

Storage systems designed with data reduction technologies as a part of their foundation provide maximum performance with capacity reduction. These types of systems should be prioritized when considering one's next array purchase. Their use will allow organizations to regain, and thus repurpose, data center resources that, in turn, can add decades of new life to resource-constrained data centers.

About the Author(s)

Vaughn Stewart

Chief Technical Evangelist, Pure Storage
