A Data De-Duplication Survival Guide: Part 1

In the first installment of this series, we discuss deciding where to de-duplicate data

May 28, 2008

Editor's note: This is the first installment of a four-part series that will examine the technology and implementation strategies deployed in data de-duplication solutions:

Part 1 will look at the basic location of de-duplication -- standalone device, VTL solution, or host software.

Part 2 will discuss the timing of de-duplication. This refers to the in-line versus post-processing debate.

Part 3 will cover unified versus siloed de-duplication, exploring the advantages of using a single supplier with the same solution covering all secondary data, versus deploying unique de-duplication for various types of data.

Part 4 will discuss performance issues. Many de-duplication suppliers claim incredibly high-speed ingestion rates for their systems, and we'll explore how to decipher the claims.

The original products in the data de-duplication market were purpose-built systems that focused on improving the value of disk-to-disk backup solutions while giving organizations the ability to minimize their reliance on tape.

As data de-duplication solutions have become more prevalent, a few primary storage suppliers have attempted to implement the technology as an add-on feature, most notably in their VTLs. Backup software vendors are also adding the capability to their solutions. With so many data de-duplication options available to the IT manager today, the new question is: Where is the best place to host the data de-duplication process?

As you are reading, keep in mind that the primary focus of data de-duplication is secondary storage -- archive and backup, as opposed to primary storage. Also note that what constitutes duplicate data may not be immediately obvious. An Oracle database, for example, can be backed up in several ways -- using the built-in RMAN utility, using an organization's enterprise backup software application, or using an Oracle-specific backup utility. Each of these methods creates its own data set. Since those data sets are all backups of the same Oracle database, the data within each set is essentially identical.
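To make the idea concrete, here is a minimal Python sketch of how overlapping backup sets might be detected. It is an illustration only, not any vendor's method: it assumes fixed-size chunking and SHA-256 fingerprints, and the names `CHUNK_SIZE`, `chunk_digests`, and `dedup_ratio` are hypothetical. Real backup formats mix metadata into the stream, and a fixed-size approach like this one loses its matches as soon as data shifts by even one byte.

```python
import hashlib

CHUNK_SIZE = 4096  # hypothetical fixed chunk size in bytes (an assumption)


def chunk_digests(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[str]:
    """Split data into fixed-size chunks and fingerprint each with SHA-256."""
    return [
        hashlib.sha256(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    ]


def dedup_ratio(backup_sets: list[bytes], chunk_size: int = CHUNK_SIZE) -> float:
    """Return total chunks divided by unique chunks across all backup sets."""
    total = 0
    unique = set()
    for data in backup_sets:
        digests = chunk_digests(data, chunk_size)
        total += len(digests)
        unique.update(digests)  # identical chunks collapse to one fingerprint
    return total / len(unique) if unique else 1.0


# Stand-in for the same database contents: eight distinct 4 KB blocks.
database_blocks = b"".join(bytes([i]) * CHUNK_SIZE for i in range(8))

# Three backup methods wrap the same blocks with their own (made-up) trailers.
backup_sets = [
    database_blocks + b"trailer-from-method-A",
    database_blocks + b"trailer-from-method-B",
    database_blocks + b"trailer-from-method-C",
]

print(f"{dedup_ratio(backup_sets):.2f}")
```

Each of the three sets yields nine chunks, but only eleven distinct fingerprints exist across all of them (eight shared data chunks plus three different trailers), so roughly 2.5 chunks are ingested for every one that would need to be stored.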
