Howard Marks

Network Computing Blogger



Dedupe Ratios Do Matter

A recent blog post by CommVault's Dipesh Patel has reopened the argument over the value of higher deduplication ratios. Like others who claim that dedupe ratios don't matter, he points out that a dedupe ratio of 10:1 reduces the size of your data by 90 percent, which puts you in the zone of diminishing returns, since a 20:1 dedupe ratio gets you only another 5 percent reduction. To me that argument sounds an awful lot like Lucy Ricardo explaining how she saved a lot of money on the new living room furniture because it was on sale.

I tend to think more like Ricky and worry not about how much was saved but about how much the new living room suite, or backup solution, cost. A system that gets 20:1 is twice as good at data reduction as one that gets 10:1, not 5 percent better. It's the absolute space needed, not the incremental savings, that we shell out our hard-earned budget dollars for.
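To make the arithmetic concrete, here is a small Python sketch (the function names are hypothetical, chosen only for illustration) that expresses the same two dedupe ratios both ways: as the percentage of data eliminated and as the capacity still needed for the 500TB of backup data used in the next example. Going from 10:1 to 20:1 moves the first number only from 90 to 95 percent, while the second is cut in half.

```python
# Compare two ways of describing the same dedupe ratio:
# the share of data eliminated vs. the physical capacity still required.

def percent_reduction(ratio: float) -> float:
    """Percentage of the original data eliminated by an N:1 dedupe ratio."""
    return (1 - 1 / ratio) * 100

def stored_tb(logical_tb: float, ratio: float) -> float:
    """Capacity (TB) needed to hold logical_tb of backups at an N:1 ratio."""
    return logical_tb / ratio

if __name__ == "__main__":
    logical = 500  # TB of backup data kept online, as in the post's example
    for ratio in (10, 20):
        print(f"{ratio}:1 -> {percent_reduction(ratio):.0f}% reduction, "
              f"{stored_tb(logical, ratio):.0f} TB stored")
```

The percentage view flattens out as the ratio grows, but the stored-capacity view, which is what drives the purchase, scales inversely with the ratio.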

If I have 500TB of backup data I want to keep online, I'll need 50TB of usable space in a solution that reduces data 10:1 but just 25TB in one that gets 20:1. Assuming 1TB drives and 10+2 RAID-6, that would be 60 drives for the system that gets 10:1 but just 30 for the 20:1 system. Even if the initial purchase price of the two solutions were the same, I'd still need to pay more to rack, maintain, power and cool twice as many disk drives.

Things get really interesting when we start replicating the data. Dedupe twice as effectively and you've cut the amount of data to replicate in half. That could easily be the difference between meeting an SLA to replicate backups off-site within 24 hours over one T-3 line and having to pony up for a pair of T-3s.
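The drive and replication figures can be checked with a rough Python sketch. It is a back-of-the-envelope model under stated assumptions, not any vendor's sizing tool: 1TB drives, 10+2 RAID-6 parity charged in proportion to data drives, a T-3 running at its 44.736 Mbit/s line rate with no protocol overhead, and a hypothetical 8TB of new backup data per day (the post gives no daily figure). The helper names are likewise illustrative.

```python
# Back-of-the-envelope sizing at two dedupe ratios.
# Assumptions (not from any vendor): 1 TB drives, 10+2 RAID-6 parity charged
# proportionally, T-3 at its 44.736 Mbit/s line rate with zero protocol
# overhead, and a hypothetical 8 TB of new backup data per day.

TB = 10**12                     # decimal terabyte, in bytes
T3_BITS_PER_SEC = 44_736_000    # T-3 (DS3) line rate
SECONDS_PER_DAY = 86_400

def ceil_div(a: int, b: int) -> int:
    """Integer ceiling of a / b for positive integers."""
    return -(-a // b)

def drives_needed(logical_tb: int, ratio: int,
                  data: int = 10, parity: int = 2) -> int:
    """1 TB drives needed, adding RAID-6 parity in proportion to data drives."""
    usable_tb = ceil_div(logical_tb, ratio)
    return ceil_div(usable_tb * (data + parity), data)

def t3_links_needed(daily_logical_tb: int, ratio: int) -> int:
    """T-3 lines needed to replicate one day's deduped data within 24 hours."""
    bytes_to_send = daily_logical_tb * TB // ratio
    bytes_per_link_per_day = T3_BITS_PER_SEC * SECONDS_PER_DAY // 8
    return ceil_div(bytes_to_send, bytes_per_link_per_day)

if __name__ == "__main__":
    for ratio in (10, 20):
        print(f"{ratio}:1 -> {drives_needed(500, ratio)} drives, "
              f"{t3_links_needed(8, ratio)} T-3 line(s)")
```

Charging parity proportionally reproduces the 60 and 30 drive counts above. A system that only builds complete 10+2 groups would round the 20:1 case up to three groups, or 36 drives. Under these assumptions, one T-3 moves roughly 483GB a day, so the 800GB produced at 10:1 needs two lines while the 400GB at 20:1 fits on one.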

Even if they can't accept my logic, which I admit may not be as solid as Spock's, vendor spokespeople like Mr. Patel should avoid arguing that metrics like dedupe ratios don't matter, because some readers will assume they're trying to downplay a weakness in their product. Little data is available on how well various products dedupe. Add in the fact that dedupe ratios can vary greatly as the amount of duplicate data varies from one data set to another, and it's easy to understand why users are skeptical about dedupe ratio claims and counterclaims.

In addition to Mr. Patel, I'd like to thank Curtis Preston (Blog Entry 1, Blog Entry 2) and Sepaton's Jay Livens (Blog Entry) for adding to the discussion in their blog posts.
