George Crump
Commentary
De-Duplication in Primary Storage

The next frontier for de-duplication may produce the biggest disagreement

Primary storage is the next frontier for de-duplication technologies, and it may be where we see the biggest disagreement in how to best optimize this storage. At a minimum, there will be multiple approaches deployed to fully address the problem of storage growth. Remember that much of the fantastic de-duplication rates we see in backup storage have to do with the reality that most users run a full backup job every weekend, and there is comparatively little change in that data between those jobs. This is not the case in primary storage.
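The arithmetic behind those backup de-duplication rates can be sketched with a toy block-level store. The `dedupe_store` function below is hypothetical; it uses fixed-size chunks and SHA-256 fingerprints, whereas real products often use variable-size chunking. The point is simply that two near-identical full backups collapse to roughly the space of one:

```python
import hashlib
import os

BLOCK_SIZE = 4096  # fixed-size chunking; real systems often chunk variably

def dedupe_store(streams):
    """Store each stream block by block, keeping only unique blocks,
    and return the resulting de-duplication ratio (logical/physical)."""
    unique = {}   # block fingerprint -> block
    logical = 0   # bytes written by clients
    for data in streams:
        for i in range(0, len(data), BLOCK_SIZE):
            block = data[i:i + BLOCK_SIZE]
            logical += len(block)
            unique[hashlib.sha256(block).hexdigest()] = block
    physical = sum(len(b) for b in unique.values())
    return logical / physical

# Two weekly "full backups" of the same 1 MB data set, one block changed:
full_1 = os.urandom(1024 * 1024)
full_2 = b"x" * BLOCK_SIZE + full_1[BLOCK_SIZE:]  # small weekly change
print(round(dedupe_store([full_1, full_2]), 1))   # prints 2.0 -- near 2:1 from just two fulls
```

Each additional unchanged full backup pushes the ratio higher, which is why backup targets report such dramatic numbers; primary storage has no such repetition to exploit.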

While there is some redundant data in primary storage, it is not to the degree that there is in backups. In addition, there are other technologies that might be a better fit than de-duplication for some use cases. For example, writeable snapshots can be used to present copies of databases for development work instead of making a full physical copy. While some storage systems have problems handling more than a few snapshots, the number that do is decreasing every year.
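How a writeable snapshot substitutes for a full copy can be illustrated with a minimal copy-on-write sketch. The `WritableSnapshot` class below is purely illustrative, not any vendor's implementation: reads fall through to the shared base volume until a block is written, so only modified blocks consume space.

```python
class WritableSnapshot:
    """Copy-on-write view of a base volume: reads fall through to the
    base until a block is overwritten; only changed blocks use space."""
    def __init__(self, base):
        self.base = base       # list of blocks, shared read-only
        self.overlay = {}      # block index -> modified block

    def read(self, idx):
        return self.overlay.get(idx, self.base[idx])

    def write(self, idx, block):
        self.overlay[idx] = block

prod = [b"block%d" % i for i in range(1000)]  # "production database"
dev = WritableSnapshot(prod)                  # instant dev copy, no data copied
dev.write(7, b"test-data")                    # developer modifies one block

assert dev.read(7) == b"test-data"            # dev copy sees the change
assert prod[7] == b"block7"                   # production is untouched
print(len(dev.overlay))                       # prints 1 -- one block of space, not a full copy
```

The development team gets a fully writeable database image while the array stores only the delta, which is why snapshots can beat de-duplication for this workload.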

Primary storage also holds more of the modified data types that don't de-duplicate well. For example, consider image files that have been modified -- a simple case is removing red-eye from a photo. When these image files are resaved, the original is often kept. While to the human eye the two images look nearly identical, to the de-duplication system they look completely different. Companies like Ocarina Networks are beginning to offer systems that are data-environment specific to handle de-duplication of this type of file. If there is a lot of this type of data in the organization, an environment-specific de-duplication tool could easily be cost-justified by the reduced storage requirement.
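Why an edited photo defeats block-level de-duplication can be shown with a toy experiment. Resaving an image re-compresses it, and even a one-byte edit scrambles the stored bytes from that point on, so the two files share no common blocks. In this sketch, zlib stands in for an image format's compression and the byte strings stand in for pixel data:

```python
import hashlib
import zlib

def block_hashes(data, size=4096):
    """Fingerprints of fixed-size blocks, as a dedupe engine might see them."""
    return {hashlib.sha256(data[i:i + size]).hexdigest()
            for i in range(0, len(data), size)}

# Two "photos": identical except one byte (the red-eye fix), each
# saved through compression as image formats do.
pixels = bytes(i * 7 % 256 for i in range(200_000))
edited = pixels[:100] + b"\x00" + pixels[101:]
original_file = zlib.compress(pixels)
edited_file = zlib.compress(edited)

shared = block_hashes(original_file) & block_hashes(edited_file)
print(len(shared))  # prints 0 -- block-level de-dupe finds nothing to share
```

A content-aware tool that understands the file format can decode both files and de-duplicate the underlying pixel data instead, which is the approach the environment-specific products take.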

There are cases where de-dupe on primary storage makes sense. The primary target, especially for NetApp Inc. (Nasdaq: NTAP), has been the VMware image, where there is plenty of redundant data. The other area is the user home directory, and NetApp offers a solution here as well. Riverbed Technology Inc. (Nasdaq: RVBD) has announced plans to extend its WAN de-duplication by providing in-line de-duplication of primary storage; the initial offering will also focus on user home directories. Hifn Inc. (Nasdaq: HIFN) has de-dupe on a board that can be installed in a Linux server to provide in-line de-duplication of primary storage attached to that server.

All of this data is what can best be described as semi-active: data that is not being updated frequently but is not quite ready to be archived. Reduction of active primary storage, databases, and email stores remains elusive. In-line compression solutions like those from Storwize Inc. are viable candidates here. Testing has shown that compressing active Oracle databases accessed over NFS mounts causes no performance impact while reducing the footprint by 60 percent or more. Interestingly, compression does not impact the de-duplication process; the de-duplication solutions are still able to de-duplicate the data in its compressed form.
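That compression and de-duplication can coexist follows from compression being deterministic: identical blocks compress to identical output, so a block-level de-duplicator still finds the duplicates. A minimal sketch, with per-block zlib compression standing in for an in-line compression appliance:

```python
import hashlib
import zlib

def unique_blocks(blocks, transform=lambda b: b):
    """Count unique blocks after an optional per-block transform,
    as a downstream de-duplicator would see them."""
    return len({hashlib.sha256(transform(b)).hexdigest() for b in blocks})

# A volume of 100 blocks drawn from only 8 distinct patterns:
blocks = [bytes([i % 8]) * 4096 for i in range(100)]

raw = unique_blocks(blocks)
compressed = unique_blocks(blocks, zlib.compress)  # deterministic: equal in, equal out
print(raw, compressed)  # prints 8 8 -- compression did not hide the duplicates
```

Note this holds only when duplicate blocks are compressed independently and identically; compressing across block boundaries or with varying settings would break the equality.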

George Crump is president and founder of Storage Switzerland, an IT analyst firm focused on the storage and virtualization segments. With 25 years of experience designing storage solutions for datacenters across the US, he has seen the birth of such technologies as RAID, NAS, ...