With the recent buzz around a few new primary storage deduplication products, I've seen the question of primary storage deduplication's value come up more than once. After all, if you are managing your storage correctly there shouldn't be much duplicate data, especially on primary storage, right? Sure, and we all archive all of our old data to tape as soon as it has not been accessed for 90 days. Even in a well-managed system there is redundant data on primary storage, so deduplication's benefits can be enormous.
First, as I hinted at earlier, storage is growing too fast and IT staffs are too overworked to manage it all. Extra copies of data are going to sneak in. That DBA is going to keep several copies of dumps; users are going to save versions 1, 2 and 3 of a file under different names and never go back and clean out the older copies. You get the picture. Then there are the more legitimate cases, like the company logo that is inserted into every slide of every presentation and memo stored on your servers. Primary storage deduplication will catch all of these instances for you when you don't have time to.
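To make the "extra copies sneak in" point concrete, here is a minimal Python sketch that groups files by a hash of their contents, so identical data saved under different names shows up as one group. The file names and contents are invented for illustration, and this is whole-file matching only; it is not how any particular product works, and it would not catch a logo embedded inside otherwise different presentations, which needs sub-file granularity.

```python
import hashlib
from collections import defaultdict


def find_duplicate_files(files):
    """Group file names whose contents are byte-for-byte identical.

    `files` maps a file name to its contents as bytes (hypothetical input).
    """
    names_by_digest = defaultdict(list)
    for name, data in files.items():
        # Identical contents always produce the same SHA-256 digest.
        names_by_digest[hashlib.sha256(data).hexdigest()].append(name)
    # Only groups with more than one name are redundant copies.
    return [sorted(names) for names in names_by_digest.values() if len(names) > 1]


# Invented sample data: two copies of a report, two identical database dumps.
files = {
    "report_v1.doc": b"quarterly numbers",
    "report_final.doc": b"quarterly numbers",
    "db_dump_monday.sql": b"INSERT INTO t VALUES (1);",
    "db_dump_tuesday.sql": b"INSERT INTO t VALUES (1);",
    "notes.txt": b"unique content",
}

duplicates = find_duplicate_files(files)
for group in duplicates:
    print("same content:", ", ".join(group))
```

A storage system doing this automatically would keep one physical copy per group and point every name at it.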
The second area where primary storage deduplication will have a role to play is in the storage of virtualized server and desktop images. The redundancy between these image files is very high. Primary storage deduplication will eliminate this redundancy as well, potentially saving terabytes of capacity. In many cases, reading back deduplicated data has little or no performance impact.
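The redundancy between image files can be sketched with a toy block-level store in Python. Everything here is an assumption made for illustration: the 4 KB fixed block size, the `DedupStore` class, and the synthetic "images" that share a common base. Real products differ in chunking, hashing and metadata handling.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size, chosen only for illustration


class DedupStore:
    """Toy block store that keeps each unique block once (hypothetical)."""

    def __init__(self):
        self.blocks = {}  # digest -> block bytes, stored a single time
        self.files = {}   # file name -> ordered list of block digests

    def write(self, name, data):
        digests = []
        for offset in range(0, len(data), BLOCK_SIZE):
            block = data[offset:offset + BLOCK_SIZE]
            digest = hashlib.sha256(block).digest()
            # Store the block only if this content has not been seen before.
            self.blocks.setdefault(digest, block)
            digests.append(digest)
        self.files[name] = digests

    def read(self, name):
        # Read-back is a lookup per block followed by concatenation.
        return b"".join(self.blocks[d] for d in self.files[name])

    def physical_bytes(self):
        return sum(len(block) for block in self.blocks.values())


# Two synthetic images: eight shared "operating system" blocks plus one
# block that is unique to each image.
base_image = b"".join(bytes([i]) * BLOCK_SIZE for i in range(8))
vm_a = base_image + bytes([100]) * BLOCK_SIZE
vm_b = base_image + bytes([101]) * BLOCK_SIZE

store = DedupStore()
store.write("vm_a.img", vm_a)
store.write("vm_b.img", vm_b)

logical_bytes = len(vm_a) + len(vm_b)
print("logical bytes: ", logical_bytes)
print("physical bytes:", store.physical_bytes())
```

In this sketch the two images occupy 18 blocks logically but only 10 physically, because the shared base is stored once. Fixed-size blocks are the simplest choice; they miss duplicates that are shifted by even a byte, which is one reason actual implementations may chunk data differently.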
The third and potentially the biggest payoff is that deduplicating primary storage optimizes everything downstream: copies of data, backups, snapshots and even replication jobs should all require less capacity. This does not remove the need for a secondary backup; every so often it is still a good idea to have a standalone copy of data that is not tied back to any deduplication or snapshot metadata. Being able to deduplicate data earlier in the process does potentially reduce how often a separate device is used, especially if the primary storage system replicates to a similarly enabled system in a DR location.
This effect makes backups merely copies of the same data. The backup application could back up to the same storage system, with no need for a second one. Archives become copies of files with maybe a Write Once Read Many (WORM) flag thrown on them, but the archive application would copy that data to the same storage system.
George Crump is president and founder of Storage Switzerland, an IT analyst firm focused on the storage and virtualization segments. With 25 years of experience designing storage solutions for datacenters across the US, he has seen the birth of such technologies as RAID, NAS, ...