
A Data De-Duplication Survival Guide: Part 1

In the first installment of this series, we discuss how to decide where to de-duplicate data

Editor's note: This is the first installment of a four-part series that will examine the technology and implementation strategies deployed in data de-duplication solutions:

  • Part 1 will look at the basic location of de-duplication -- standalone device, VTL solution, or host software.
  • Part 2 will discuss the timing of de-duplication. This refers to the in-line versus post-processing debate.
  • Part 3 will cover unified versus siloed de-duplication, exploring the advantages of using a single supplier with the same solution covering all secondary data, versus deploying unique de-duplication for various types of data.
  • Part 4 will discuss performance issues. Many de-duplication suppliers claim incredibly high-speed ingestion rates for their systems, and we'll explore how to decipher the claims.

The original products for the data de-duplication market were based on specific systems that focused on improving the value of disk-to-disk backup solutions while providing organizations the ability to minimize their reliance on tape.

As data de-duplication solutions have become more prevalent, a few primary storage suppliers have attempted to implement the technology as an add-on feature, most notably in their VTLs. Backup software vendors are also adding the capability to their solutions. With so many data de-duplication options available to the IT manager today, the new question is, Where is the best place to host the data de-duplication process?

As you read, keep in mind that the primary focus of data de-duplication is secondary storage -- archive and backup, as opposed to primary storage. Also note that what constitutes duplicate data may not be immediately obvious. An Oracle database, for example, can be backed up in several ways -- using the built-in RMAN utility, using an organization's enterprise backup software application, or using an Oracle-specific backup utility. Each of these methods creates its own data set. Since those data sets are backups of the same Oracle database, the data within each set is essentially identical.
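
To make the point about overlapping backup sets concrete, here is a minimal Python sketch. It is not how any particular product works: it assumes fixed-size chunks, SHA-256 fingerprints, and three made-up backup streams that each wrap the same toy "database" bytes in a different 4-byte header. The names `dedup_store`, `CHUNK_SIZE`, and the sample data are hypothetical, chosen only for illustration.

```python
import hashlib

# Assumption: a tiny fixed chunk size so the example is easy to follow.
# Real systems use much larger chunks and often variable-size chunking.
CHUNK_SIZE = 4

def dedup_store(backup_sets):
    """Store each distinct chunk once, keyed by its SHA-256 fingerprint.

    Returns (store, total_chunks), where store maps fingerprint -> chunk
    and total_chunks counts every chunk seen across all backup sets.
    """
    store = {}
    total_chunks = 0
    for data in backup_sets.values():
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            total_chunks += 1
            # Only the first copy of a chunk is kept; repeats are skipped.
            store.setdefault(hashlib.sha256(chunk).hexdigest(), chunk)
    return store, total_chunks

# Hypothetical data: the same database contents captured by three tools,
# each adding its own (made-up) header in front of identical payload bytes.
database = b"AAAABBBBCCCCDDDD"
backups = {
    "rman": b"RMAN" + database,
    "enterprise_backup": b"ENTB" + database,
    "oracle_utility": b"ORAU" + database,
}

store, total_chunks = dedup_store(backups)
print(f"chunks written by the backup jobs: {total_chunks}")
print(f"unique chunks actually stored:     {len(store)}")
```

In this toy case the three jobs write 15 chunks, but only 7 are unique: the four database chunks plus one header chunk per tool. The headers are deliberately the same length as a chunk so the payloads stay aligned; with fixed-size chunks, a header of a different length would shift the boundaries and hide the duplicates.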
