Network Computing is part of the Informa Tech Division of Informa PLC


Does De-Duplication Have Drawbacks?

Data de-duplication is certainly a hot topic, and in my opinion deservedly so. The technology offers an exciting solution for users who are drowning in duplicate data. Many environments today store more backup copies of relatively static data than ever before. Regulatory compliance requirements and application availability service levels drive the need for these multiple copies, so they can't be arbitrarily reduced.

Data de-duplication technology lets the storage manager keep multiple copies of static data while consuming far less capacity, by recognizing files that are exact duplicates. Many vendors advertise space savings ratios of 15:1 or higher. Savings on that scale can turn into dollar savings for environments that typically hold as much as 10 times more backup data than production data on VTL or physical tape media. The cost and capacity benefit in this scenario is clear: the same level of data protection and storage functionality at a fraction of the cost.
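To make the arithmetic concrete, here is a minimal sketch of how the two ratios above combine. The function name and the 10 TB production figure are hypothetical, chosen only for illustration; real savings depend on how much of the backup data is actually duplicated.

```python
def physical_capacity_tb(production_tb, backup_to_production_ratio, dedup_ratio):
    """Estimate the physical capacity needed for backups after de-duplication.

    All three inputs are illustrative assumptions, not vendor guarantees.
    """
    # Logical backup data: what would be stored without de-duplication.
    logical_backup_tb = production_tb * backup_to_production_ratio
    # Physical footprint once the advertised savings ratio is applied.
    return logical_backup_tb / dedup_ratio


# Hypothetical environment: 10 TB of production data, 10:1 backup-to-production
# ratio, and a 15:1 de-duplication ratio.
needed = physical_capacity_tb(10, 10, 15)
print(f"{needed:.2f} TB")
```

Under these assumptions, 100 TB of logical backup data would occupy a little under 7 TB of physical capacity.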

So what's the catch? As with any other technology, vendors have taken multiple approaches to data de-duplication. Almost all data de-duplication platforms rely on proprietary algorithms to analyze data and determine where exact duplicates exist. The file is then stored once but referred to through unique file pointers, which maintains application transparency and the illusion that every copy of the file still exists. Where competing products differ is in where they sit in the data stream. Some vendors offer products that analyze and de-duplicate data "in-band," that is, in the data stream in real time. Other vendors offer products that process the data once it's stored, or "out-of-band."
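The store-once, point-many idea can be sketched in a few lines. This is not any vendor's algorithm (those are proprietary); it assumes a SHA-256 content hash as a stand-in for duplicate detection, and the `DedupStore` class and its method names are hypothetical.

```python
import hashlib


class DedupStore:
    """Toy file-level de-duplicating store (illustrative only)."""

    def __init__(self):
        self.blobs = {}     # content digest -> file contents, stored once
        self.pointers = {}  # file name -> content digest

    def put(self, name, data):
        # Assumption: identical SHA-256 digests are treated as exact duplicates.
        digest = hashlib.sha256(data).hexdigest()
        self.blobs.setdefault(digest, data)  # keep only the first copy
        self.pointers[name] = digest         # every name gets its own pointer

    def get(self, name):
        # The application still sees each file under its own name.
        return self.blobs[self.pointers[name]]

    def logical_bytes(self):
        # Size the application believes is stored.
        return sum(len(self.blobs[d]) for d in self.pointers.values())

    def physical_bytes(self):
        # Size actually consumed.
        return sum(len(b) for b in self.blobs.values())


store = DedupStore()
report = b"quarterly report contents" * 1000
for night in ("mon", "tue", "wed"):
    store.put(f"backup-{night}/report.doc", report)

print(store.logical_bytes(), store.physical_bytes())
```

Three nightly backups of the same unchanged file appear as three files to the application but consume the space of one.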

Each mechanism offers benefits and drawbacks. The in-band mechanism is more capacity-efficient, because duplicates are never written, but it can seriously degrade performance. The out-of-band mechanism is less storage-efficient, because the multiple copies of data must first be stored before they can be analyzed and de-duplicated, but it is typically considered more performance-friendly.
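The capacity side of that trade-off can be illustrated with a small model. It is a sketch under simplifying assumptions: duplicates are detected with a SHA-256 hash, the out-of-band pass runs only after every backup has landed, and the function name is hypothetical. It says nothing about throughput, which is where in-band products pay their price.

```python
import hashlib


def peak_and_final_bytes(backups, in_band):
    """Return (peak, final) bytes consumed for a list of backup payloads."""
    unique = {}  # content digest -> size of the single stored copy
    for data in backups:
        unique.setdefault(hashlib.sha256(data).digest(), len(data))
    final = sum(unique.values())

    if in_band:
        # Duplicates are dropped before they reach disk, so usage never
        # exceeds the de-duplicated size.
        return final, final

    # Out-of-band: every copy lands first, then a later pass reclaims space.
    landed = sum(len(data) for data in backups)
    return landed, final


backups = [b"a" * 100, b"a" * 100, b"a" * 100, b"b" * 50]
print(peak_and_final_bytes(backups, in_band=True))
print(peak_and_final_bytes(backups, in_band=False))
```

Both approaches end at the same footprint in this model; the out-of-band one simply needs enough headroom to hold the full, un-reduced data until the de-duplication pass completes.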

Regardless of the platform or mechanism, data de-duplication can benefit almost any environment. The key to successful implementation of a data de-duplication solution is to fully understand both the rate of change for specific data and its retention requirements.
