The Primary Storage Deduplication Problem

Vendors struggle to develop a deduplication engine that won't impact performance and still maintain data integrity.

George Crump

April 30, 2013

Network Computing

Deduplication was successful in backup for two reasons. First, it dramatically improved the efficiency of storing backup data; second, vendors needed that efficiency to be price competitive against the then-prevalent tape technology. Primary storage deduplication has not yet matched that success, for reasons that have dampened enthusiasm among both vendors and potential users.

The Primary Dedupe Payoff Problem

One of the key challenges primary storage deduplication faces, and one reason for the lack of enthusiasm, is that its efficiency will not be as high, in most cases, as it was in backup, because primary data does not contain the same level of redundancy. That said, the results of primary storage deduplication are still impressive, and the efficiency gains should be enough to stir the interest of any data center manager.

For example, in our primary storage deduplication lab test we are getting an 80% reduction on a production NFS share that stores our home directories. Like most home directories, it contains mostly office productivity files. A future step in our testing is a VMware datastore, where we expect even more impressive results.
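To make those numbers concrete, here is a rough sketch of the arithmetic behind dedupe figures (the function names are mine, not from the article): an X% capacity reduction corresponds to a deduplication ratio of 1 / (1 - X/100).

```python
def dedupe_ratio(reduction_pct: float) -> float:
    """Convert a capacity-reduction percentage to a dedupe ratio (N:1)."""
    return 1.0 / (1.0 - reduction_pct / 100.0)

def effective_capacity(raw_tb: float, reduction_pct: float) -> float:
    """Logical data that fits on raw_tb of physical storage at that reduction."""
    return raw_tb * dedupe_ratio(reduction_pct)

# The 80% reduction seen on the home-directory share is a 5:1 ratio,
# so 10 TB of physical capacity holds about 50 TB of logical data.
print(round(dedupe_ratio(80), 2))            # 5.0
print(round(effective_capacity(10, 80), 2))  # 50.0
```

This is also why backup dedupe headlines bigger numbers: repeated full backups push the reduction toward 95% and beyond, which the same formula turns into 20:1 or better.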

Primary Dedupe Is Hard

Almost every vendor I speak with has, or had, some form of deduplication technology in house or under development. Larger vendors have bought their way in, while smaller vendors have leveraged the open source code that is available. But the reality is that most of these products have not made it to primary storage. It is hard to develop a deduplication engine that won't impact performance while still maintaining data integrity, especially as the storage system scales to meet capacity requirements. As a result, some vendors have either shelved or downplayed their dedupe capabilities. Only a few have managed to strike the balance between performance and data integrity.
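To see where that tension comes from, consider a toy sketch of the content-hashing approach most dedupe engines build on (this is a hypothetical illustration, not any vendor's implementation): every write is fingerprinted, which costs CPU on the data path, and integrity depends entirely on the fingerprint never matching two different blocks.

```python
import hashlib


class DedupeStore:
    """Toy inline block-level dedupe: each unique block is stored once,
    keyed by the SHA-256 hash of its contents."""

    def __init__(self, block_size: int = 4096):
        self.block_size = block_size
        self.blocks = {}  # fingerprint -> unique block contents
        self.files = {}   # name -> ordered list of fingerprints

    def write(self, name: str, data: bytes) -> None:
        fingerprints = []
        for i in range(0, len(data), self.block_size):
            block = data[i:i + self.block_size]
            # Hashing every block on the write path is the performance cost;
            # production engines mitigate it with hardware offload, caching,
            # or post-process dedupe. Trusting the hash alone is the integrity
            # risk; many engines byte-compare on a match to be safe.
            fp = hashlib.sha256(block).hexdigest()
            self.blocks.setdefault(fp, block)
            fingerprints.append(fp)
        self.files[name] = fingerprints

    def read(self, name: str) -> bytes:
        return b"".join(self.blocks[fp] for fp in self.files[name])

    def physical_bytes(self) -> int:
        return sum(len(b) for b in self.blocks.values())
```

Writing two files that share blocks stores the shared blocks only once, so physical capacity consumed is less than the logical data written; the scaling problem the article alludes to is that the fingerprint index itself must stay fast and consistent as capacity grows.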

Lack of Cheaper Competitor

As I mentioned in the opening, another driver of backup deduplication's success was that it was competing against tape for the hearts and minds of the backup administrator. The price delta between standard disk and tape made the investment in deduplication necessary so that gap could be closed. Until recently, this was not the case in primary storage: the price gap between fast enterprise disks and slower hard drives was comparatively small. The difference between a 15K RPM hard disk and a 10K RPM hard disk was not great enough to make an investment in deduplication worth the effort.

The appearance of flash-based storage, however, is re-creating the tape-vs.-disk dynamic in the form of hard disk vs. solid state disk (SSD). There is a significant price delta between SSD and HDD. As we discussed in our article "Overcoming The All Flash Storage Challenge ... Cost," deduplication is a go-to option when trying to drive down the cost of flash-based storage systems. Just as deduplication drove disk into the backup process, deduplication could be the key to driving flash-based storage into the mainstream of primary storage.

My opinion is that deduplication will eventually be integrated into all primary storage systems, operating systems and hypervisors. But most of the innovation will come from vendors that don't have a vested interest in continuing to sell more hard disks. Look for all-flash and hybrid storage vendors to take the lead on primary storage deduplication, either by developing their own deduplication technology or leveraging an existing API set.
