Dedupe Ratios Do Matter


Howard Marks

April 5, 2010


A recent blog post by CommVault's Dipesh Patel has reopened the argument over the value of higher deduplication ratios. Like others who claim that dedupe ratios don't matter, he points out that a 10:1 dedupe ratio reduces the size of your data by 90 percent, well into the zone of diminishing returns, since a 20:1 ratio will only get you another 5 percent reduction. To me that argument sounds an awful lot like Lucy Ricardo explaining how she saved a lot of money on the new living room furniture because it was on sale.

I tend to think more like Ricky and worry not about how much was saved but about how much the new living room suite, or backup solution, cost. A system that gets 20:1 is twice as good at data reduction as one that gets 10:1, not 5 percent better: it has to store 5 percent of the original data instead of 10 percent. It's the absolute space needed, not the incremental savings, that we shell out our hard-earned budget dollars for.
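The gap between the two framings is easy to check with a little arithmetic. This minimal Python sketch (the function names are illustrative, not taken from any product) computes both the "percent reduction" figure and the fraction of the original data that actually has to be stored:

```python
def stored_fraction(ratio: float) -> float:
    """Fraction of the original data left on disk at a given dedupe ratio."""
    return 1.0 / ratio


def percent_reduction(ratio: float) -> float:
    """The 'percent saved' framing: (1 - 1/ratio), expressed as a percentage."""
    return 100.0 * (1.0 - stored_fraction(ratio))


for ratio in (10, 20):
    print(f"{ratio}:1 -> {percent_reduction(ratio):.0f}% reduction, "
          f"{stored_fraction(ratio):.0%} of the original data stored")

# The stored footprint at 10:1 is twice the footprint at 20:1.
print(stored_fraction(10) / stored_fraction(20))
```

The percent-reduction column moves only from 90 to 95, while the stored fraction, which is what you actually buy disk for, drops from 10 percent to 5 percent.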

If I have 500TB of backup data I want to keep online, I'll need 50TB of usable space in a system that reduces data 10:1 but just 25TB in one that gets 20:1. Assuming 1TB drives and 10+2 RAID-6, that would be 60 drives for the 10:1 system but just 30 for the 20:1 system. Even if the initial purchase price of the two solutions were the same, I'd still need to pay more to rack, maintain, power and cool twice as many disk drives.

Things get really interesting when we start replicating the data. Dedupe twice as effectively and you've cut the amount of data to replicate in half. That could easily be the difference between meeting an SLA to replicate backups off-site in 24 hours with one T-3 line and having to pony up for a pair of T-3s.
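The capacity and replication arithmetic above can be sketched in a few lines of Python. The 500TB data set, 1TB drives and 10+2 RAID-6 layout come from the text; the 7TB of new logical backup data per day is a hypothetical figure chosen only for illustration, and the function names are made up for this sketch. The T-3 is modeled at its nominal 44.736 Mbit/s line rate with no protocol overhead, so real transfers would take somewhat longer.

```python
import math

T3_MBPS = 44.736  # nominal T-3 (DS3) line rate in Mbit/s; protocol overhead ignored


def usable_tb_needed(logical_tb: float, ratio: float) -> float:
    """Usable capacity needed to hold logical_tb of backups at a given dedupe ratio."""
    return logical_tb / ratio


def drives_needed(usable_tb: float, drive_tb: float = 1.0,
                  data_drives: int = 10, parity_drives: int = 2) -> float:
    """Raw drive count for RAID-6 sets of data_drives + parity_drives.

    Scales proportionally, as the article does; it does not round up to whole RAID groups.
    """
    data_drive_count = usable_tb / drive_tb
    return data_drive_count * (data_drives + parity_drives) / data_drives


def replication_hours(gb: float, mbps: float = T3_MBPS) -> float:
    """Hours to push gb gigabytes (10**9 bytes) over one link running at full line rate."""
    return gb * 1e9 * 8 / (mbps * 1e6) / 3600


def t3_links_for_sla(gb: float, sla_hours: float = 24.0) -> int:
    """T-3 links needed to meet the SLA, assuming throughput scales linearly with links."""
    return math.ceil(replication_hours(gb) / sla_hours)


DAILY_LOGICAL_GB = 7000  # HYPOTHETICAL: new backup data per day, before dedupe

for ratio in (10, 20):
    usable = usable_tb_needed(500, ratio)
    daily_gb = DAILY_LOGICAL_GB / ratio
    print(f"{ratio}:1 -> {usable:.0f}TB usable, {drives_needed(usable):.0f} drives, "
          f"{daily_gb:.0f}GB/day to replicate "
          f"({replication_hours(daily_gb):.1f}h on one T-3, "
          f"{t3_links_for_sla(daily_gb)} T-3 link(s) for a 24-hour SLA)")
```

Under these assumptions the 10:1 system has 700GB a day to ship, about 35 hours on one T-3, so it needs two links, while the 20:1 system's 350GB fits on a single T-3. The proportional drive count reproduces the 60 and 30 drives in the text; if you insist on whole 12-drive RAID groups, the 25TB case rounds up to three groups, or 36 drives, which narrows the gap without closing it.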

Even if they can't accept my logic, which I admit may not be as solid as Spock's, vendor spokespeople like Mr. Patel should avoid arguing that metrics like dedupe ratios don't matter, because some readers will assume they're trying to downplay a weakness in their product. Little data is available on how well various products dedupe. Add in that deduplication rates can vary greatly as the amount of duplicate data varies from one data set to another, and one can understand why users are skeptical about dedupe ratio claims and counterclaims.

In addition to Mr. Patel, I'd like to thank Curtis Preston (Blog Entry 1, Blog Entry 2) and Sepaton's Jay Livens (Blog Entry) for adding to the discussion in their blog posts.

About the Author(s)

Howard Marks

Network Computing Blogger

Howard Marks is founder and chief scientist at DeepStorage LLC, a storage consultancy and independent test lab based in Santa Fe, N.M., concentrating on storage and data center networking. In more than 25 years of consulting, Marks has designed and implemented storage systems, networks, management systems and Internet strategies at organizations including American Express, J.P. Morgan, Borden Foods, U.S. Tobacco, BBDO Worldwide, Foxwoods Resort Casino and the State University of New York at Purchase. The testing at DeepStorage Labs is informed by that real-world experience.

He has been a frequent contributor to Network Computing and InformationWeek since 1999 and a speaker at industry conferences including Comnet, PC Expo, Interop and Microsoft's TechEd since 1990. He is the author of Networking Windows and co-author of Windows NT Unleashed (Sams).

He is co-host, with Ray Lucchesi, of the monthly Greybeards on Storage podcast, where the voices of experience discuss the latest issues in the storage world with industry leaders. You can find the podcast at: http://www.deepstorage.net/NEW/GBoS
