Network Computing is part of the Informa Tech Division of Informa PLC

Data Reduction for Primary Storage

In the past few years, data reduction technologies such as compression and, more recently, data de-duplication have become quite popular, especially for backup and archiving. Can this trend continue into primary storage?

Backup, where there is a great deal of redundant data, has seen mass adoption of data reduction technologies. In just a few short years, data de-duplication has gone from an obscure term to a well-known one in the data center. Its ability to eliminate redundant segments of data has greatly benefited backup storage and some types of archive storage. For backup data, assuming a weekly full backup, a 20X storage efficiency gain is not uncommon.
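To make the arithmetic behind a figure like 20X concrete, here is a minimal Python sketch of segment-level de-duplication applied to a series of weekly full backups. Everything in it is an illustrative assumption rather than any vendor's implementation: the fixed 4 KB segment size, the SHA-256 fingerprint, the helper names `dedup_ratio` and `make_weekly_fulls`, and the synthetic data set of 100 segments with a small number of segments changing each week.

```python
import hashlib

SEGMENT_SIZE = 4096  # assumed fixed segment size in bytes (illustrative)


def dedup_ratio(backups, segment_size=SEGMENT_SIZE):
    """Return logical bytes written divided by unique bytes actually stored."""
    seen = set()   # fingerprints of segments already stored
    logical = 0    # bytes the backup application sent
    stored = 0     # bytes that had to be kept after de-duplication
    for data in backups:
        for off in range(0, len(data), segment_size):
            seg = data[off:off + segment_size]
            logical += len(seg)
            digest = hashlib.sha256(seg).digest()
            if digest not in seen:
                seen.add(digest)
                stored += len(seg)
    return logical / stored


def make_weekly_fulls(weeks, segments=100, changed_per_week=2):
    """Build synthetic weekly full backups (hypothetical data set).

    The first full holds `segments` unique segments; each later week
    overwrites `changed_per_week` of them with never-before-seen content.
    """
    def unique_segment(n):
        return n.to_bytes(4, "big") * (SEGMENT_SIZE // 4)

    current = [unique_segment(i) for i in range(segments)]
    next_id = segments  # ids at or above `segments` are new content
    backups = [b"".join(current)]
    for week in range(1, weeks):
        for j in range(changed_per_week):
            idx = (week * changed_per_week + j) % segments
            current[idx] = unique_segment(next_id)
            next_id += 1
        backups.append(b"".join(current))
    return backups


if __name__ == "__main__":
    fulls = make_weekly_fulls(weeks=26, segments=100, changed_per_week=2)
    print(f"reduction ratio: {dedup_ratio(fulls):.1f}X")
```

In this toy model, twenty identical fulls reduce to exactly 20X, and every changed segment lowers the ratio. That is why the achievable figure depends on how many fulls are retained and how much data changes between them.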

Primary storage is different
Unfortunately, moving de-duplication into primary storage isn't as simple as shifting its location. The following outlines the particular requirements of primary storage that need to be considered when planning de-duplication:

1. Primary storage is performance sensitive. Primary storage is active, and a de-duplication implementation that slows the production environment will not be acceptable. Either the de-duplication technology must be efficient and fast enough that it does not affect performance, or it must run out of band on files that are not immediately active.

The ideal is a near-production data set that is de-duplicated as a background process, removing the possibility of any performance impact. It would also make sense for this technology to de-duplicate and compress at different levels of efficiency, because the greater the data reduction level, the greater the chance of a performance impact when the data is read back in. While it would be great to have an inline system fast enough to reduce the data set without impacting performance, that technology does not exist today.
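The out-of-band approach described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not a description of any shipping product: files are modeled as an in-memory dictionary, "not immediately active" is approximated by a hypothetical `idle_seconds` threshold on last access time, segments are fixed at 4 KB, and the `level` parameter (passed to zlib) stands in for the adjustable reduction level. The function names `background_reduce` and `read_back` are invented for this example.

```python
import hashlib
import zlib


def background_reduce(files, now, idle_seconds=86400, level=6, chunk_size=4096):
    """De-duplicate and compress only files that have been idle long enough.

    files: dict mapping name -> (last_access_time, data bytes)  [assumed model]
    Returns (store, recipes): store maps fingerprint -> compressed chunk,
    recipes maps file name -> ordered list of fingerprints.
    """
    store = {}
    recipes = {}
    for name, (last_access, data) in files.items():
        if now - last_access < idle_seconds:
            continue  # still active: leave it alone so production I/O is unaffected
        digests = []
        for off in range(0, len(data), chunk_size):
            chunk = data[off:off + chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in store:
                # Higher levels spend more CPU for a smaller stored chunk.
                store[digest] = zlib.compress(chunk, level)
            digests.append(digest)
        recipes[name] = digests
    return store, recipes


def read_back(name, store, recipes):
    """Reassemble a reduced file; this is the extra work paid on every read."""
    return b"".join(zlib.decompress(store[d]) for d in recipes[name])


if __name__ == "__main__":
    now = 100_000
    files = {
        "hot.db": (99_000, b"x" * 10_000),     # accessed recently, skipped
        "cold-a.doc": (0, b"abc" * 5_000),     # idle, reduced
        "cold-b.doc": (0, b"abc" * 5_000),     # idle duplicate, shares chunks
    }
    store, recipes = background_reduce(files, now)
    stored = sum(len(c) for c in store.values())
    print(f"reduced {len(recipes)} idle files into {stored} stored bytes")
```

Because the pass skips anything accessed within the idle window, active data is never touched; the cost shows up only in `read_back`, when a reduced file becomes active again and has to be reassembled.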
