Network Computing is part of the Informa Tech Division of Informa PLC

A Data De-Duplication Survival Guide: Part 1

Editor's note: This is the first installment of a four-part series that will examine the technology and implementation strategies deployed in data de-duplication solutions:

  • Part 1 will look at the basic location of de-duplication -- standalone device, VTL solution, or host software.
  • Part 2 will discuss the timing of de-duplication. This refers to the in-line versus post-processing debate.
  • Part 3 will cover unified versus siloed de-duplication, exploring the advantages of using a single supplier with the same solution covering all secondary data, versus deploying unique de-duplication for various types of data.
  • Part 4 will discuss performance issues. Many de-duplication suppliers claim incredibly high-speed ingestion rates for their systems, and we'll explore how to decipher the claims.

The original products for the data de-duplication market were based on specific systems that focused on improving the value of disk-to-disk backup solutions while providing organizations the ability to minimize their reliance on tape.

As data de-duplication solutions have become more prevalent, a few primary storage suppliers have attempted to implement the technology as an add-on feature, most notably in their VTLs. Backup software vendors are also adding the capability to their solutions. With so many data de-duplication options available to the IT manager today, the new question is: Where is the best place to host the data de-duplication process?

As you are reading, keep in mind that the primary focus of data de-duplication is secondary storage -- archive and backup, as opposed to primary storage. Also note that what constitutes duplicate data may not be immediately obvious. An Oracle database, for example, can be backed up in several ways -- using the built-in RMAN utility; using an organization's enterprise backup software application; or using an Oracle-specific backup utility. Each of these methods creates its own data set. Since those data sets are backups of the same Oracle database, the data within each set is essentially identical.
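To make the Oracle example concrete, here is a minimal Python sketch of how a de-duplication store might recognize that three backup sets carry the same underlying data. It is an illustration only, not any vendor's method: the block contents, the header strings, the backup-set names, and the `dedupe` function are all hypothetical, and it assumes each backup has already been split into blocks that line up across the three sets. Real products use their own chunking and fingerprinting schemes.

```python
import hashlib


def dedupe(backup_sets):
    """Keep one copy of each distinct block; return (store, total_blocks)."""
    store = {}  # fingerprint -> block contents
    total = 0
    for blocks in backup_sets.values():
        for block in blocks:
            total += 1
            fingerprint = hashlib.sha256(block).hexdigest()
            # Only the first occurrence of a fingerprint is stored.
            store.setdefault(fingerprint, block)
    return store, total


# Hypothetical stand-in for the contents of one Oracle database.
db_blocks = [f"row-group-{i}".encode() * 64 for i in range(100)]

# Three backups of that database, each wrapped in its own (made-up) header.
backup_sets = {
    "rman": [b"RMAN-HEADER"] + db_blocks,
    "enterprise_backup": [b"ENTERPRISE-HEADER"] + db_blocks,
    "oracle_utility": [b"UTILITY-HEADER"] + db_blocks,
}

store, total = dedupe(backup_sets)
print(f"blocks received: {total}, blocks stored: {len(store)}")
```

In this toy case the store receives 303 blocks but keeps only 103: the 100 shared database blocks plus one distinct header per backup method. The three data sets look different from the outside, yet almost everything inside them is redundant.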
