
A Data De-Duplication Survival Guide: Part 2

Editor's note: This is the second installment of a four-part series that will examine the technology and implementation strategies deployed in data de-duplication solutions:

  • Part 1 looked at the basic location of de-duplication -- standalone device, VTL solution, or host software.
  • Part 2 discusses the timing of de-duplication: the inline versus post-processing debate.
  • Part 3 will cover unified versus siloed de-duplication, exploring the advantages of using a single supplier with the same solution covering all secondary data, versus deploying unique de-duplication for various types of data.
  • Part 4 will discuss performance issues. Many de-duplication suppliers claim incredibly high-speed ingestion rates for their systems, and we'll explore how to decipher the claims.

The Timing Issue

One of the hottest debates in data de-duplication is when the process should be done. Should it be done inline, as data is being ingested, or as a post-process after the initial send of the backup job completes?

Although de-duplication is explained in more detail in my first article, here is a quick reminder: de-duplication is a process that compares an incoming data stream to previously stored data, identifies redundant sub-file segments of information, and stores only the unique segments. In backup this is particularly valuable, since much of the data is identical from one job to the next, especially from full backup to full backup.
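To make the segment-level idea concrete, here is a minimal Python sketch. It assumes fixed-size 4 KiB segments and SHA-256 fingerprints, both chosen only for illustration; the class and method names (`SegmentStore`, `write`, `read`) are hypothetical, and commercial products typically use variable-size chunking and their own indexing schemes.

```python
import hashlib
import os

SEGMENT_SIZE = 4096  # assumption: fixed-size segments; many products use variable-size chunking instead


class SegmentStore:
    """Toy de-duplicating store: each unique segment is kept once, keyed by its SHA-256 fingerprint."""

    def __init__(self) -> None:
        self.segments: dict[str, bytes] = {}  # fingerprint -> segment bytes

    def write(self, data: bytes) -> list[str]:
        """Split the stream into segments, store only unseen ones, and return the pointer list."""
        recipe = []
        for offset in range(0, len(data), SEGMENT_SIZE):
            segment = data[offset:offset + SEGMENT_SIZE]
            fp = hashlib.sha256(segment).hexdigest()
            if fp not in self.segments:
                self.segments[fp] = segment  # unique segment: keep the bytes
            recipe.append(fp)                # every segment, duplicate or not, gets a pointer
        return recipe

    def read(self, recipe: list[str]) -> bytes:
        """Rebuild the original stream by following its pointers."""
        return b"".join(self.segments[fp] for fp in recipe)

    def stored_bytes(self) -> int:
        """Total bytes actually kept on 'disk'."""
        return sum(len(s) for s in self.segments.values())


# Two "full backups": the second changes only its last segment.
store = SegmentStore()
full_1 = os.urandom(4 * SEGMENT_SIZE)
full_2 = full_1[:3 * SEGMENT_SIZE] + os.urandom(SEGMENT_SIZE)
r1 = store.write(full_1)
r2 = store.write(full_2)
print("logical bytes:", len(full_1) + len(full_2))  # 32768
print("stored bytes:", store.stored_bytes())        # 20480 (5 unique segments)
assert store.read(r2) == full_2
```

The second full backup costs only one new segment on disk; the other three are stored as pointers to segments already kept from the first backup.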

There are basically three “whens” of de-duplicating data: inline, post-process, or a combination of the two.

When a product claims it de-duplicates data inline, that typically means that as the appliance receives data, redundant data is identified, a pointer is established, and only the unique data is written to disk -- the duplicate data is never written to disk. Post-process de-duplication means that all of the data is first written to disk in its native form; then a separate, sequential process analyzes that data and eliminates the duplicates. Some vendors offer a variant of post-process de-duplication that uses buffers to allow the de-duplication process to start before the entire backup has been ingested.
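The three timing models can be contrasted in a toy simulation. This is a sketch under stated assumptions, not any vendor's implementation: it assumes fixed-size 4 KiB segments and SHA-256 fingerprints, runs everything sequentially in memory, and the function names and the `buffer_segments` parameter are hypothetical. The only thing it measures is how much raw, not-yet-de-duplicated data has to sit on disk at peak.

```python
import hashlib
import os

SEGMENT_SIZE = 4096  # assumption: fixed-size segments, for illustration only


def segments(data: bytes):
    """Yield fixed-size segments of an incoming backup stream."""
    for i in range(0, len(data), SEGMENT_SIZE):
        yield data[i:i + SEGMENT_SIZE]


def dedupe(segment: bytes, store: dict, recipe: list) -> None:
    """Store the segment only if unseen; always record a pointer to it."""
    fp = hashlib.sha256(segment).hexdigest()
    store.setdefault(fp, segment)
    recipe.append(fp)


def inline(data: bytes, store: dict):
    """Inline: each segment is checked as it arrives, so duplicates never land on disk."""
    recipe = []
    for seg in segments(data):
        dedupe(seg, store, recipe)
    return recipe, 0  # peak bytes of raw (not yet de-duplicated) data on disk


def post_process(data: bytes, store: dict):
    """Post-process: the whole job lands in native form, then a separate pass de-duplicates it."""
    landing = list(segments(data))           # full native copy written first
    peak = sum(len(s) for s in landing)
    recipe = []
    for seg in landing:                      # sequential pass after ingest completes
        dedupe(seg, store, recipe)
    landing.clear()                          # native copy freed afterwards
    return recipe, peak


def buffered_post_process(data: bytes, store: dict, buffer_segments: int = 2):
    """Buffered variant (hypothetical parameters): de-duplication of the landing
    buffer starts whenever it fills, before the whole job has been ingested."""
    landing, recipe, peak = [], [], 0
    for seg in segments(data):
        landing.append(seg)
        peak = max(peak, sum(len(s) for s in landing))
        if len(landing) == buffer_segments:  # buffer full: drain it
            for s in landing:
                dedupe(s, store, recipe)
            landing.clear()
    for s in landing:                        # drain whatever remains at end of job
        dedupe(s, store, recipe)
    return recipe, peak


# Usage: a six-segment job whose second half repeats the first.
job = os.urandom(3 * SEGMENT_SIZE) * 2
for fn in (inline, post_process, buffered_post_process):
    store = {}
    recipe, peak = fn(job, store)
    assert b"".join(store[fp] for fp in recipe) == job  # every model can rebuild the job
    print(f"{fn.__name__}: {len(store)} unique segments, peak landing {peak} bytes")
# inline: 3 unique segments, peak landing 0 bytes
# post_process: 3 unique segments, peak landing 24576 bytes
# buffered_post_process: 3 unique segments, peak landing 8192 bytes
```

All three models end with the same de-duplicated store; in this simulation they differ only in landing space. Post-process needs room for the full native job, the buffered variant for one buffer, and inline for none, because inline performs the fingerprint lookup in the ingest path. Ingest speed, the other side of the debate, is not modeled here.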
