George Crump



Source-Side Deduplication

In our next series of entries, we will look at companies that do source-side deduplication. Actually, we have already looked at one: Atempo. Source-side dedupe means that redundant data is eliminated before it travels across the network to the backup server. On the whiteboard, this looks like the most logical place to eliminate redundant data, but it is not without its challenges, and we will try to address those as we go.

The advantage of source-side dedupe is that after the initial backup is complete, only unique data is sent across the wire. This can be done either via a traditional deduplication process or via a block-level incremental backup. With traditional deduplication, a process compares changed segments of information to what has already been sent to the backup target, and that comparison typically spans all the data, from multiple sources, that has been sent to that target. For example, if server A and server B have the same file, server B does not need to send it when its turn comes, because server A has already sent it. Think of source-side deduplication as an enterprise-wide comparison that eliminates redundancy before data is transmitted.
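The server A and server B example can be sketched in a few lines of Python. This is a minimal illustration, not any vendor's implementation: the `BackupTarget` class, the `backup` function, the fixed 4 KB segment size, and the use of SHA-256 fingerprints are all assumptions made for the example. Real products often use variable-length segments and ask the target about fingerprints over the network.

```python
import hashlib

SEGMENT_SIZE = 4096  # assumed fixed segment size, for illustration only


class BackupTarget:
    """Hypothetical backup target that stores each unique segment once."""

    def __init__(self):
        self.segments = {}  # fingerprint -> segment bytes

    def has(self, fingerprint):
        return fingerprint in self.segments

    def store(self, fingerprint, segment):
        self.segments[fingerprint] = segment


def backup(data, target):
    """Send only segments the target has not seen.

    Returns (bytes_sent, recipe), where recipe is the ordered list of
    fingerprints needed to rebuild the data from the target.
    """
    bytes_sent = 0
    recipe = []
    for offset in range(0, len(data), SEGMENT_SIZE):
        segment = data[offset:offset + SEGMENT_SIZE]
        fingerprint = hashlib.sha256(segment).hexdigest()
        # The comparison happens on the client, before transmission.
        if not target.has(fingerprint):
            target.store(fingerprint, segment)
            bytes_sent += len(segment)
        recipe.append(fingerprint)
    return bytes_sent, recipe


# Server A and server B hold the same file (three distinct segments).
shared_file = b"".join(bytes([i]) * SEGMENT_SIZE for i in range(3))
target = BackupTarget()
sent_by_a, recipe_a = backup(shared_file, target)
sent_by_b, recipe_b = backup(shared_file, target)
```

In this sketch server A sends all three segments, while server B sends nothing because the target already holds every fingerprint. The hashing loop is also the client-side work that the comparison step adds.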

Block-level incremental (BLI) backups also send only changed segments of information after the initial backup. These segments, however, are typically tied to the boundaries of the blocks laid out by the file system. BLI backups tend to keep an exact mirror of the systems they are protecting at the backup target. They are more a volume-to-volume matching technique than a deduplication technique. Most leverage some form of snapshot to provide point-in-time rollbacks. For obvious marketing reasons, companies that offer a BLI solution want to get lumped into the deduplication category. BLI backups do eliminate the need for redundant backups, and they are smaller than the classic incremental because they send and store only changed blocks instead of entire files. Finally, some of these companies also run a post-process deduplication pass on the data to eliminate cross-server redundancy.
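A rough Python sketch of the block-level incremental idea follows. It assumes a 4 KB block size and finds changed blocks by comparing the current volume image against the previous one. The names `changed_blocks` and `apply_blocks` are hypothetical. An actual product would more likely rely on a snapshot or a changed block list than on reading both copies.

```python
BLOCK_SIZE = 4096  # assumed file system block size, for illustration only


def changed_blocks(previous, current):
    """Return {block_index: block_bytes} for blocks that differ."""
    changes = {}
    longest = max(len(previous), len(current))
    block_count = (longest + BLOCK_SIZE - 1) // BLOCK_SIZE
    for index in range(block_count):
        start = index * BLOCK_SIZE
        block = current[start:start + BLOCK_SIZE]
        if block != previous[start:start + BLOCK_SIZE]:
            changes[index] = block
    return changes


def apply_blocks(mirror, changes, new_length):
    """Apply changed blocks to the mirror kept at the backup target."""
    # Resize the mirror to the new volume length before patching it.
    image = bytearray(mirror[:new_length].ljust(new_length, b"\0"))
    for index, block in changes.items():
        start = index * BLOCK_SIZE
        image[start:start + len(block)] = block
    return bytes(image)


# A three-block volume in which a single byte in block 1 changes.
previous_image = bytes(3 * BLOCK_SIZE)
working = bytearray(previous_image)
working[5000] = 1
current_image = bytes(working)

delta = changed_blocks(previous_image, current_image)
updated_mirror = apply_blocks(previous_image, delta, len(current_image))
```

Only block 1 crosses the wire here, and the mirror ends up identical to the source volume. Note that the comparison is between one volume and its own mirror, so identical blocks on a second server would still be sent.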

A concern with source-side deduplication is the impact that the comparison step has on the client. All the vendors we have spoken to in preparation, and I am sure the ones we are getting ready to speak with, claim that there is "little to no client impact." You need to test this yourself. What we can say is that the problem is not as severe as it was several years ago. The client-side software has matured, and the processing resources available to the client are significantly greater than they were.

Lab tests and checks with users typically place the impact of the dedupe comparison at about 5 to 10 percent. The BLI technique, because it uses a hard-set data segment and only a volume-to-volume comparison, does not require as many CPU resources. Also, many file systems will provide the requesting software with a changed block list via an API. BLIs do not, however, provide enterprise-wide data reduction unless they also use a separate, post-process deduplication pass.

