Data De-Dupe & Archiving

10:15 AM -- Data de-duplication made its first real market inroads as a backup target. It provided an alternative to standard disk-to-disk backup that let you retain data for a longer period of time. Backup is tailor-made for de-duplication because successive full backup jobs contain so much of the same data. But does de-duplication make sense in archiving?

As is always the case, where you land in this discussion depends on how you define archiving, how long you need to retain data, and what your motivation is for retaining it.

Vendors of de-duplication devices in the backup market claim 20X or more storage efficiency, but most leaders in this market are factoring in a certain frequency of full backups. Typically, you may only achieve 4X to 6X efficiency between daily incremental jobs. On average, we tend to see about 12X to 16X storage efficiency with a backup data de-duplication system. (In an upcoming entry we will go into detail on backup de-dupe rates.)
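To see how a high headline figure and a lower day-to-day figure can both hold, here is a rough sketch in Python. The job sizes and per-job ratios are hypothetical numbers picked for illustration, not measurements, and `blended_ratio` is a name made up for this example. It simply divides the total data sent to the device by the total data the device would actually store.

```python
def blended_ratio(jobs):
    """Overall ratio = total logical data / total data actually stored.

    jobs: list of (logical_gb, dedupe_ratio) pairs, one per backup job.
    """
    logical = sum(size for size, _ in jobs)
    stored = sum(size / ratio for size, ratio in jobs)
    return logical / stored


# Hypothetical week: one 1,000 GB full at 20X, six 50 GB incrementals at 5X.
week = [(1000, 20)] + [(50, 5)] * 6
print(round(blended_ratio(week), 1))  # 11.8
```

Because the blended figure is weighted by how much data each job type sends, running full backups more often pulls it toward the higher per-job number.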

Archiving today has many use cases, but two of the more common motivations are moving older data off primary storage to reduce costs and storing data to fulfill a legal or corporate governance requirement. In both cases, data is placed on the device deliberately, for a specific purpose, and it often consists of unique files. As a result, the amount of commonality between the files is limited -- 2X to 4X storage efficiency is typical.

There are exceptions where de-dupe efficiencies can be fairly high in archive storage. I know of several organizations that archive their production databases every night so that they can view that data as of any point in time. For example, one uses a database to track trading activity. They want the ability to trace back any inconsistencies in trading or malicious activity within the database. While this database receives thousands of updates a day, only a small percentage of it changes from one day to the next. The archive system that they're using can do sub-file level data de-duplication and, as a result, the de-duplication efficiency on that system is well over 30X.
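The effect of sub-file de-duplication on a mostly unchanged nightly archive can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any particular archive product works: it assumes fixed-size 4 KB chunks identified by SHA-256 (many real systems use variable-size chunking), a toy "database" of 100 blocks, and one block changing per night. `ChunkStore` and `make_block` are hypothetical names.

```python
import hashlib

CHUNK = 4096  # assumed fixed chunk size in bytes


class ChunkStore:
    """Toy archive that stores each distinct chunk only once."""

    def __init__(self):
        self.chunks = {}        # SHA-256 hex digest -> chunk bytes
        self.logical_bytes = 0  # total bytes handed to archive()

    def archive(self, data):
        """Store data; return the list of digests needed to rebuild it."""
        recipe = []
        for off in range(0, len(data), CHUNK):
            chunk = data[off:off + CHUNK]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)  # keep first copy only
            recipe.append(digest)
        self.logical_bytes += len(data)
        return recipe

    def restore(self, recipe):
        return b"".join(self.chunks[d] for d in recipe)

    def ratio(self):
        stored = sum(len(c) for c in self.chunks.values())
        return self.logical_bytes / stored


def make_block(tag):
    # Deterministic, distinct content for each tag (illustrative data only).
    return tag.to_bytes(8, "big") * (CHUNK // 8)


blocks = [make_block(i) for i in range(100)]  # hypothetical 100-block database
store = ChunkStore()
recipes = []
for night in range(60):
    blocks[night % 100] = make_block(1000 + night)  # one block changes
    recipes.append(store.archive(b"".join(blocks)))

print(len(store.chunks), round(store.ratio(), 1))  # 159 37.7
```

In this toy run, 60 nightly copies of 100 blocks each reduce to 159 stored chunks, because every night after the first adds only the one block that changed. A file-level scheme would see 60 different files and save nothing.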
