Network Computing is part of the Informa Tech Division of Informa PLC


De-Duplication in Primary Storage

10:45 AM -- Primary storage is the next frontier for de-duplication technologies, and it may be where we see the biggest disagreement over how best to optimize storage. At a minimum, multiple approaches will have to be deployed to fully address the problem of storage growth. Remember that the impressive de-duplication rates we see in backup storage owe much to the fact that most users run a full backup job every weekend, and comparatively little data changes between those jobs. That is not the case in primary storage.
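The backup arithmetic can be made concrete with a simplified, hypothetical model: each weekly full backup resends the whole data set, but only the changed fraction is new to the de-duplication system. The function name and the 1 TB, 2 percent, and 12-week figures are illustrative assumptions, not measurements.

```python
def backup_dedup_ratio(full_size_gb: float, weekly_change: float, weeks: int) -> float:
    """Ratio of logical data backed up to unique data stored.

    Hypothetical model: every week a full backup resends the whole data set,
    and only the fraction `weekly_change` of it differs from what is stored.
    """
    logical = full_size_gb * weeks
    # The first full backup is all new; each later one adds only changed data.
    unique = full_size_gb + full_size_gb * weekly_change * (weeks - 1)
    return logical / unique

# Illustrative figures: 1 TB data set, 2% weekly change, 12 weekly fulls.
print(f"{backup_dedup_ratio(1000, 0.02, 12):.1f}:1")  # prints 9.8:1
# A single copy, as in primary storage, gives no cross-copy savings.
print(f"{backup_dedup_ratio(1000, 0.02, 1):.1f}:1")   # prints 1.0:1
```

With only one copy of the data, as in primary storage, the same model yields 1:1; any savings must come from redundancy inside the data itself.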

While primary storage contains some redundant data, it holds far less than backups do. In addition, other technologies may suit some use cases better than de-duplication. For example, writeable snapshots can give developers working copies of a database without physically copying the data. While some storage systems have trouble handling more than a few snapshots, the number of systems with that limitation is decreasing every year.
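Why a writeable snapshot is cheaper than a full copy can be shown with a toy redirect-on-write model, one common way to implement writeable snapshots (vendors differ). The snapshot copies only the map from logical to physical blocks; a later write on either side lands in a newly allocated block, so only changed blocks consume new space. `BlockStore`, `Volume`, and the 1,000-block database are hypothetical.

```python
from __future__ import annotations


class BlockStore:
    """Hypothetical pool of physical blocks shared by a volume and its snapshots."""

    def __init__(self) -> None:
        self.blocks: list[bytes] = []

    def allocate(self, data: bytes) -> int:
        self.blocks.append(data)
        return len(self.blocks) - 1  # physical block number


class Volume:
    def __init__(self, store: BlockStore, block_map: dict[int, int] | None = None) -> None:
        self.store = store
        # Logical block number -> physical block number.
        self.block_map = dict(block_map or {})

    def write(self, lba: int, data: bytes) -> None:
        # Redirect-on-write: new data goes to a freshly allocated block, so any
        # snapshot still pointing at the old block is unaffected.
        self.block_map[lba] = self.store.allocate(data)

    def read(self, lba: int) -> bytes:
        return self.store.blocks[self.block_map[lba]]

    def writable_snapshot(self) -> Volume:
        # Copies only the pointer map; no data blocks are duplicated.
        return Volume(self.store, self.block_map)


store = BlockStore()
prod = Volume(store)
for lba in range(1000):                # hypothetical 1,000-block database
    prod.write(lba, b"row data %d" % lba)

dev = prod.writable_snapshot()         # development copy
dev.write(7, b"test edit")             # only this block diverges
print(len(store.blocks))               # prints 1001 (not 2000)
```

The sketch never frees blocks; a real array would also track references so that blocks no longer used by any volume or snapshot can be reclaimed.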

Primary storage also holds more modified files of types that don't de-duplicate well. Image files are a good example: a simple edit, such as removing red-eye from a photo, produces a resaved file, and the original is often kept as well. To the human eye the two images look nearly identical, but to the de-duplication system they look different, because re-encoding a compressed image format can change most of the file's bytes. Companies like Ocarina Networks are beginning to offer systems tailored to specific data environments to handle de-duplication of this type of file. If an organization has a lot of this type of data, an environment-specific de-duplication tool could easily be cost-justified by the reduced storage requirement.
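The effect can be illustrated with fixed-size chunk fingerprinting. In this hypothetical sketch, zlib stands in for an image codec (real formats such as JPEG differ in detail) and a 50-byte overwrite stands in for a red-eye fix. In the raw buffer the edit touches a single 4 KB chunk, so nearly every chunk still matches; once both versions are compressed, the edit alters much of the compressed stream and few chunks match. The chunk size and helper names are assumptions.

```python
import hashlib
import random
import zlib

CHUNK = 4096  # hypothetical fixed chunk size


def chunk_hashes(data: bytes) -> set[str]:
    # Fingerprint each fixed-size chunk, as a simple de-dup engine might.
    return {hashlib.sha256(data[i:i + CHUNK]).hexdigest()
            for i in range(0, len(data), CHUNK)}


def shared_fraction(old: bytes, new: bytes) -> float:
    # Fraction of the new version's chunks already stored for the old version.
    h_old, h_new = chunk_hashes(old), chunk_hashes(new)
    return len(h_old & h_new) / len(h_new)


rng = random.Random(0)
# Stand-in for raw pixel data: 256 KiB drawn from an 8-symbol alphabet.
original = bytes(rng.choices(b"abcdefgh", k=256 * 1024))
edited = bytearray(original)
edited[10_000:10_050] = b"\x00" * 50  # simulated "red-eye fix": a tiny local edit
edited = bytes(edited)

print(shared_fraction(original, edited))  # raw: 63 of 64 chunks still match
print(shared_fraction(zlib.compress(original), zlib.compress(edited)))  # compressed: far fewer
```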

There are cases where de-dupe on primary storage makes sense. The main target, especially for NetApp Inc. (Nasdaq: NTAP), has been VMware images, which contain plenty of redundant data. The other area is user home directories, where NetApp also offers a solution. Riverbed Technology Inc. (Nasdaq: RVBD) has announced plans to extend WAN de-duplication by providing in-line de-duplication of primary storage. The initial offering will also focus on user home directories. Hifn Inc. (Nasdaq: HIFN) offers de-dupe on a board that can be installed in a Linux server to provide in-line de-duplication of the primary storage attached to that server.
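A minimal sketch of in-line de-duplication, assuming fixed 4 KB blocks and SHA-256 fingerprints (illustrative choices; the products named above may chunk, hash, and index data differently). Each block is fingerprinted as it is written, and only blocks with a new fingerprint take up space. `InlineDedupStore` and the two VM image files are hypothetical; the shared "guest OS" portion mimics why VMware images de-duplicate well.

```python
import hashlib
import os

BLOCK = 4096  # hypothetical fixed block size


class InlineDedupStore:
    """Toy in-line de-dup: blocks are fingerprinted at write time, and only
    blocks with a previously unseen fingerprint consume physical space."""

    def __init__(self) -> None:
        self.chunks: dict[str, bytes] = {}     # fingerprint -> stored block
        self.files: dict[str, list[str]] = {}  # file name -> block fingerprints
        self.logical_bytes = 0                 # running total of bytes written

    def write_file(self, name: str, data: bytes) -> None:
        recipe = []
        for i in range(0, len(data), BLOCK):
            block = data[i:i + BLOCK]
            fp = hashlib.sha256(block).hexdigest()
            self.chunks.setdefault(fp, block)  # keep only the first copy
            recipe.append(fp)
        self.files[name] = recipe
        self.logical_bytes += len(data)

    def read_file(self, name: str) -> bytes:
        return b"".join(self.chunks[fp] for fp in self.files[name])

    def physical_bytes(self) -> int:
        return sum(len(b) for b in self.chunks.values())


# Two hypothetical VM images built from the same guest OS install.
base_os = os.urandom(200 * BLOCK)
store = InlineDedupStore()
store.write_file("vm1.vmdk", base_os + os.urandom(20 * BLOCK))
store.write_file("vm2.vmdk", base_os + os.urandom(20 * BLOCK))
print(store.logical_bytes / store.physical_bytes())  # 440 / 240 blocks, about 1.83
```

A production system would also need reference counting to reclaim blocks after deletes, and some designs compare matching blocks byte-for-byte before sharing them; this sketch does neither.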

All of this data is best described as semi-active: data that is not updated frequently but is not quite ready to be archived. Reducing active primary storage, such as databases and email stores, remains elusive. In-line compression solutions like those from Storwize Inc. are a viable candidate. Testing has shown that compressing active Oracle databases served from NFS mounts causes no performance impact while reducing the footprint by 60 percent or more. Interestingly, compression does not impede the de-duplication process; de-duplication solutions are still able to de-duplicate the data in its compressed form.
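A footprint reduced by 60 percent corresponds to a 2.5:1 ratio, which makes it easier to compare with de-duplication figures. The sketch also shows one condition under which compressed data still de-duplicates: if the compressor works on fixed-size blocks independently and deterministically, identical input blocks produce identical compressed blocks. This is an illustrative assumption, not a description of how Storwize's products work; the block size, helper names, and data set are hypothetical.

```python
import hashlib
import zlib

BLOCK = 4096  # hypothetical compression and de-dup unit


def reduction_to_ratio(percent_saved: float) -> float:
    """Convert 'footprint reduced by N percent' into a compression ratio."""
    return 1 / (1 - percent_saved / 100)


def compress_blocks(data: bytes) -> list[bytes]:
    # Compress each fixed-size block on its own. zlib output is deterministic
    # for the same input, level, and library build, so identical input blocks
    # yield identical compressed blocks.
    return [zlib.compress(data[i:i + BLOCK], 6) for i in range(0, len(data), BLOCK)]


def unique_fingerprints(blocks: list[bytes]) -> int:
    # What a de-dup engine would keep: one copy per distinct fingerprint.
    return len({hashlib.sha256(b).digest() for b in blocks})


# Hypothetical data set: 100 blocks drawn from only 10 distinct contents.
distinct = [bytes([65 + k]) * 2048 + bytes(range(256)) * 8 for k in range(10)]
data = b"".join(distinct[i % 10] for i in range(100))

compressed = compress_blocks(data)
print(f"{reduction_to_ratio(60):.1f}:1")  # prints 2.5:1
print(unique_fingerprints(compressed))    # prints 10: duplicates survive compression
print(sum(map(len, compressed)), "of", len(data), "bytes after compression")
```

If a compressor instead treated a whole file as one stream, a small edit could change every compressed byte after it, so the property depends on how the compression layer is designed.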
