Do We Need Consolidated Deduplication?
Posted by George Crump on July 31, 2009
In my last entry, "Can We Get to a Single Point of Deduplication?", I looked at who had the capabilities for a single point of deduplication: essentially, consolidating the deduplication engine so that one supplier and one dedupe engine manage all the optimized data regardless of storage tier (primary, secondary, archive and backup). Another question is, do you really need it?
As I pointed out, there would be some theoretical gains from consolidating the deduplication process. The more data that goes through the engine, the better the chance that it has seen that data before and can optimize it. Also, if you have a different deduplication engine at each tier, the data has to be expanded, or re-hydrated, as you move it between the tiers of storage.
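To make those two points concrete, here is a minimal Python sketch. It is a toy model, not how any particular product works: the `DedupeEngine` class, the fixed 4-byte chunk size, and the use of SHA-256 as the chunk fingerprint are all assumptions made for illustration. It compares one engine that sees both tiers with a separate engine per tier, and shows the re-hydration step needed to move data between separate engines.

```python
import hashlib

CHUNK_SIZE = 4  # tiny fixed-size chunks, chosen only to keep the example readable


class DedupeEngine:
    """Hypothetical toy engine: each unique chunk is stored exactly once."""

    def __init__(self):
        self.chunks = {}  # SHA-256 hex digest -> chunk bytes

    def write(self, data: bytes) -> list:
        """Store data and return its recipe (ordered list of chunk digests)."""
        recipe = []
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)  # keep only the first copy
            recipe.append(digest)
        return recipe

    def read(self, recipe) -> bytes:
        """Re-hydrate: rebuild the original bytes from a recipe."""
        return b"".join(self.chunks[d] for d in recipe)

    def stored_bytes(self) -> int:
        return sum(len(c) for c in self.chunks.values())


primary_copy = b"AAAABBBBCCCCDDDD"
backup_copy = b"AAAABBBBCCCCEEEE"  # shares three of its four chunks with primary

# Consolidated: one engine sees the data from both tiers.
shared = DedupeEngine()
shared.write(primary_copy)
shared.write(backup_copy)
consolidated_bytes = shared.stored_bytes()

# Separate: each tier has its own engine and cannot see the other's chunks.
primary_engine = DedupeEngine()
backup_engine = DedupeEngine()
primary_recipe = primary_engine.write(primary_copy)
backup_engine.write(backup_copy)
separate_bytes = primary_engine.stored_bytes() + backup_engine.stored_bytes()

# Moving data between separate engines means expanding it first,
# then letting the second engine dedupe it all over again.
rehydrated = primary_engine.read(primary_recipe)
backup_engine.write(rehydrated)

print(consolidated_bytes, separate_bytes)
```

In this toy case the shared engine keeps five unique chunks while the two separate engines keep eight between them, because the three common chunks are stored once per engine. Real engines differ in chunking method, fingerprinting and indexing, so the size of the gain will vary.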
Comment by DedupeDude on August 3, 2009 8:04 PM
Hey George, great article! I also noticed your post on dedupe2.com. These are all relevant and good food for thought. Thanks for highlighting this. I think this technology is hot right now and you're hitting some key topics.
Comment by Phillip DePaige on August 4, 2009 3:18 PM
I'm not so sure that one dedupe engine really makes sense. I really see two different things being called dedupe: one is block-level dedupe, which is good for traditional backup programs, and the other is single-instance storage, which is good for protecting duplicate data on networks. Those really need to be separate engines.
A second problem is that one engine means one communication point for the engine. This is not only a single point of failure, but it also means the data has to be communicated across the network every time it is used. The savings in disk storage could be bought only at the price of slow data delivery and congested networks.
Moderation and compromise seem to be the order of the day.
Comment by Integral Content on August 4, 2009 4:02 PM
One of the main concerns about de-dupe is its impact on records management and compliance. If there are multiple de-dupe engines working on data, can it be said with certainty that whole documents/files will be accurate and unchanged when re-constituted five or ten years down the road? I understand the need to reduce storage use, but I believe that the best way to do this is by getting better at information management rather than throwing a technology at the problem. De-dupe just reduces the total amount of data stored from all that's being created without addressing the root cause: whether all that data should be created and/or retained for any amount of time. Effectively managing the information generated within an organization from the point of creation through storage and expiration is key. However, in many organizations there are no rules, or very few rules, about how newly created information should be tagged based on its value at the time it's created. De-dupe as a technology has a place in IT, and it needs to be considered within the broader context of managing business-critical information residing in file-based data.