Network Computing is part of the Informa Tech Division of Informa PLC


A Data De-Duplication Survival Guide: Part 1: Page 5 of 7

Software-based de-duplication and single instancing

As expected, backup software vendors are now adding data de-duplication capabilities to their feature sets. In addition, backup software vendors like CommVault are using a data reduction technique known as single instancing, in which the backup host performs file-level comparisons as it receives the data.
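The idea behind file-level single instancing can be sketched in a few lines. This is a minimal illustration of the concept, not CommVault's actual implementation: each incoming file is identified by a hash of its full contents, and a second copy of an identical file adds only a catalog entry, not more stored data.

```python
import hashlib

class SingleInstanceStore:
    """Toy file-level single-instance store: an identical file is kept
    only once, identified by the SHA-256 digest of its full contents."""

    def __init__(self):
        self.objects = {}   # digest -> file bytes (stored once)
        self.catalog = {}   # backup path -> digest

    def backup(self, name, data):
        digest = hashlib.sha256(data).hexdigest()
        if digest not in self.objects:
            self.objects[digest] = data    # first copy: store the data
        self.catalog[name] = digest        # duplicate: catalog entry only

    def stored_bytes(self):
        return sum(len(d) for d in self.objects.values())

store = SingleInstanceStore()
doc = b"quarterly numbers " * 100
store.backup("hostA/report.doc", doc)
store.backup("hostB/report.doc", doc)   # identical copy on another host
print(len(store.catalog), len(store.objects))  # 2 catalog entries, 1 object
```

Note that the comparison happens only after the backup host has received the data, which is why this approach saves disk but not network bandwidth.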

While this method will certainly reduce some of the storage requirements caused by the backup process, it does nothing to address the network bandwidth requirements, nor does it address multiple copies of similar data (only the data that runs through the specific application will be compared for redundancy).

Single instance storage does not solve the other big problem in backup storage -- files that change slightly on a regular basis.

With single instancing, discrete files that do not change each day are typically “instanced out” of the backup. However, in any backup vaulting strategy, files that don't change are not the issue; the big files that change a little bit every day are the problem.

Databases, VMware images, and Exchange stores often change slightly throughout the day. A file-level single instance comparison sees each changed version as a different file, not as the same file with a few changes. This means the entire file must be stored again, yielding anemic data reduction compared with true de-duplication techniques. Without block-level reduction, there are essentially no space savings for these workloads, particularly for database files, which can be very large.
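To make the contrast concrete, here is a simplified sketch of block-level de-duplication (the fixed 4 KB block size and hash-indexed store are illustrative assumptions, not any vendor's actual design). A large file is split into blocks and each block is stored only if its hash is new, so a second backup of a file that changed slightly adds just the modified blocks rather than the whole file.

```python
import hashlib

BLOCK = 4096  # assumed fixed block size for illustration

def dedupe_store(data, store):
    """Split data into fixed-size blocks, keep only blocks whose SHA-256
    digest is new, and return the file's 'recipe' (ordered digest list)."""
    recipe = []
    for i in range(0, len(data), BLOCK):
        block = data[i:i + BLOCK]
        digest = hashlib.sha256(block).hexdigest()
        store.setdefault(digest, block)   # store block only if unseen
        recipe.append(digest)
    return recipe

store = {}
day1 = bytes(1000 * 1024)          # ~1 MB "database file" of zero bytes
dedupe_store(day1, store)
before = len(store)

day2 = bytearray(day1)
day2[0:8] = b"changed!"            # a small in-place change overnight
dedupe_store(bytes(day2), store)
print(len(store) - before)         # only the modified block is stored again
```

A file-level scheme would store the full second copy; here the day-two backup costs one new 4 KB block, which is the effect the article attributes to true de-duplication.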