• 05/28/2008
    7:55 PM
  • Network Computing

A Data De-Duplication Survival Guide: Part 1

In the first installment of this series, we look at where to de-duplicate data
Editor's note: This is the first installment of a four-part series that will examine the technology and implementation strategies deployed in data de-duplication solutions:
  • Part 1 will look at the basic location of de-duplication -- standalone device, VTL solution, or host software.
  • Part 2 will discuss the timing of de-duplication. This refers to the in-line versus post-processing debate.
  • Part 3 will cover unified versus siloed de-duplication, exploring the advantages of using a single supplier with the same solution covering all secondary data, versus deploying unique de-duplication for various types of data.
  • Part 4 will discuss performance issues. Many de-duplication suppliers claim incredibly high-speed ingestion rates for their systems, and we'll explore how to decipher the claims.

The original products for the data de-duplication market were based on specific systems that focused on improving the value of disk-to-disk backup solutions while providing organizations the ability to minimize their reliance on tape.

As data de-duplication solutions have become more prevalent, a few primary storage suppliers have attempted to implement the technology as an add-on feature, most notably in their VTLs. Backup software vendors are also adding the capability to their solutions. With so many data de-duplication options available to the IT manager today, the new question is: Where is the best place to host the data de-duplication process?

As you read, keep in mind that the primary focus of data de-duplication is secondary storage -- archive and backup, as opposed to primary storage. Also note that what constitutes duplicate data may not be immediately obvious. An Oracle database, for example, can be backed up in several ways -- using the built-in RMAN utility; using an organization's enterprise backup software application; or using an Oracle-specific backup utility. Each of these methods creates its own data set. Since those data sets are backups of the same Oracle database, the data within each set is essentially identical.
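To make the Oracle example concrete, here is a minimal Python sketch of block-level de-duplication across three backup sets of the same database. It is an illustration under stated assumptions, not a description of how any particular product works: the block contents, the header strings, the `dedupe` function, and the choice of SHA-256 as the fingerprint are all hypothetical, and the sketch assumes each backup method lays the database blocks out on the same boundaries.

```python
import hashlib


def dedupe(backup_sets):
    """Store each distinct block once, keyed by its SHA-256 digest.

    Returns the block store and, for each backup set, a "recipe":
    the ordered list of digests needed to rebuild that set.
    """
    store = {}
    recipes = {}
    for name, blocks in backup_sets.items():
        recipe = []
        for block in blocks:
            digest = hashlib.sha256(block).hexdigest()
            store.setdefault(digest, block)  # keep only the first copy
            recipe.append(digest)
        recipes[name] = recipe
    return store, recipes


# Hypothetical data: 100 distinct database blocks, wrapped by three
# backup methods that each add their own (made-up) header block.
db_blocks = [f"row-block-{i}".encode() * 64 for i in range(100)]
backup_sets = {
    "rman": [b"RMAN-HEADER"] + db_blocks,
    "enterprise_backup": [b"ENTERPRISE-HEADER"] + db_blocks,
    "oracle_utility": [b"UTILITY-HEADER"] + db_blocks,
}

store, recipes = dedupe(backup_sets)
logical_blocks = sum(len(blocks) for blocks in backup_sets.values())
unique_blocks = len(store)

print(f"blocks written by backups: {logical_blocks}")
print(f"blocks actually stored:    {unique_blocks}")

# Any backup set can still be rebuilt from its recipe.
restored = [store[d] for d in recipes["rman"]]
```

In this toy case the three backup sets write 303 blocks between them, but only 103 distinct blocks need to be kept: the 100 shared database blocks plus one header per method. Real backup formats rarely line up this neatly, which is part of why the location and design of the de-duplication process matter.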
