DeDupe's Next Era -- Part One

Many will say that the next area of focus for dedupe will be in archive and primary storage -- and it will. But understand that backup will continue to be a major focus as it remains the greatest source of redundant data sets.

George Crump

June 17, 2009


As the Data Domain, NetApp, EMC saga draws to a close, it appears we are ready to enter a new era in data deduplication. The first era was backup: dedupe has quickly become a capability delivered by just about every supplier in the backup space, with some delivering several solutions. In this era, Data Domain has about 2 billion reasons to claim victory. As we move into the next era of deduplication, what should we be looking for?

The first era of dedupe solved a problem that caused users massive amounts of pain, one they wanted to fix simply and with as little disruption as possible. The ability to extend the existing architecture rather than replace it was appealing. As we move to the next era, the areas of concern will grow, partly because dedupe will enter more critical parts of the environment and partly because the amount of data being pushed through a dedupe engine will continue to increase.

Many will say that the next area of focus for dedupe will be archive and primary storage -- and it will. But understand that backup will continue to be a major focus as it remains the greatest source of redundant data sets. Also, don't assume that all vendors have completed their first era of work; many, for example, can't replicate well or handle a variety of data types from different data sources.

In backup, we will need to see more of a scale-out type of architecture: either a large single dedupe repository, or independent nodes that communicate so that redundant data only needs to be stored once. These scale-out architectures will allow for greater inbound and outbound performance. (A simple sketch of the idea appears at the end of this post.)

The other big battle will be over what, exactly, does the dedupe. Will it be an appliance-style architecture that allows multiple backup applications to leverage the same dedupe repository, which could require a single-vendor approach, or would an API extension to the backup software make more sense? As we discuss in our recent article "Integrating Deduplication", the latter gives backup administrators the flexibility to pick the deduplication strategy that makes the most sense for them.

As backup software vendors enter the deduplication market via extensions to their own code, source-side deduplication becomes more practical: instead of ripping out your current solution, you just add the "dedupe module" and keep going. There will be quite a bit of room for differentiation here, since things like client impact and dedupe performance will need to be compared. (A second sketch below shows the basic source-side handshake.)

As we move into the next era of dedupe, much of the discussion will move beyond backup, maybe prematurely, and we will start talking about archive and primary storage dedupe, which will be the subject of my next entry.
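To make the scale-out idea above concrete, here is a minimal Python sketch, illustrative only and not modeled on any vendor's implementation: a content-addressed store keeps each unique chunk once, and a routing layer assigns every fingerprint to exactly one node, so independent nodes never hold duplicate copies. All class and method names here are hypothetical.

```python
import hashlib


class DedupeNode:
    """One node in a scale-out dedupe repository: a content-addressed
    store that keeps each unique chunk exactly once."""

    def __init__(self):
        self.chunks = {}  # fingerprint -> chunk bytes

    @staticmethod
    def fingerprint(chunk: bytes) -> str:
        # SHA-256 of the contents identifies a chunk globally, so every
        # node and every backup stream computes the same key for it.
        return hashlib.sha256(chunk).hexdigest()

    def store(self, chunk: bytes) -> str:
        fp = self.fingerprint(chunk)
        self.chunks.setdefault(fp, chunk)  # write only if never seen
        return fp


class ScaleOutRepository:
    """Routes every fingerprint to exactly one owning node, so the
    cluster stores redundant data once while nodes ingest in parallel."""

    def __init__(self, node_count: int = 4):
        self.nodes = [DedupeNode() for _ in range(node_count)]

    def store(self, chunk: bytes) -> str:
        fp = DedupeNode.fingerprint(chunk)
        owner = self.nodes[int(fp, 16) % len(self.nodes)]  # hash partitioning
        return owner.store(chunk)
```

Because ownership is decided by the fingerprint itself, two backup streams arriving at different nodes still land a shared chunk on the same owner, which is what lets the cluster present one logical repository.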
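And here is an equally simplified sketch of the source-side handshake described above. It assumes fixed-size chunks and a single round trip of fingerprints; real products typically use variable-size chunking and batching, so treat this as the shape of the protocol, not a blueprint.

```python
import hashlib


def fingerprint(chunk: bytes) -> str:
    return hashlib.sha256(chunk).hexdigest()


class RemoteRepository:
    """Stand-in for the backup target's dedupe store."""

    def __init__(self):
        self.chunks = {}  # fingerprint -> chunk bytes

    def unknown(self, fps):
        # The target reports which fingerprints it has never seen.
        return {fp for fp in fps if fp not in self.chunks}

    def put(self, fp: str, chunk: bytes) -> None:
        self.chunks.setdefault(fp, chunk)


def source_side_backup(data: bytes, repo: RemoteRepository,
                       chunk_size: int = 4096) -> list:
    """Fingerprint locally, ask once, then send only unseen chunks."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    fps = [fingerprint(c) for c in chunks]
    unseen = repo.unknown(fps)            # round trip of hashes, not data
    for fp, c in zip(fps, chunks):
        if fp in unseen:
            repo.put(fp, c)               # only new data crosses the wire
    return fps  # the ordered fingerprint list is all a restore needs
```

The client-impact question the post raises shows up here as the CPU cost of hashing on the production machine, traded against sending far fewer bytes over the wire.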
