Deduplication, even primary storage deduplication, is not a brand new feature. Several operating systems and NAS vendors have had the capability for a year or so, but it is certainly newer than the backup use case. It's clear, though, that users are interested in the capability because of the value it can potentially bring to the environments we discussed in our last entry. Deduplication as an API allows vendors to embed the technology into their existing storage source code. This not only gives the vendor a shortcut to offering what will become a must-have capability but also, and maybe more importantly, gives it more control over how that data is stored.
This control over and knowledge of the deduplication process could prove to be very valuable. Think of it the same way Symantec's OST support changed the way backup applications interacted with disk storage devices. Once the backup application had control over the device, the process became much smoother. In the same way, once the storage system has control over the deduplication process, it may be able to make better use of the technology. For example, the storage system could process all data inline as long as there is no measurable impact on performance, then shift to post-process deduplication if storage I/O begins to be measurably impacted. Vendors could also possibly leverage the API to provide smarter, more efficient SAN replication than before, for example by not sending data that has already been sent from another site, as some backup deduplication products do today.
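To make the inline-to-post-process shift described above a little more concrete, here is a minimal Python sketch. It is not any vendor's API. The class name `AdaptiveDedupStore`, the `latency_limit_ms` threshold, and the idea of passing in an observed latency figure are all assumptions made for illustration, and a real storage system would measure I/O impact in its own way.

```python
import hashlib


class AdaptiveDedupStore:
    """Toy block store (hypothetical): dedupes inline until latency gets too high."""

    def __init__(self, latency_limit_ms=5.0):
        # Assumed threshold; a real system would pick its own performance signal.
        self.latency_limit_ms = latency_limit_ms
        self.chunks = {}     # fingerprint -> unique chunk data
        self.pending = []    # (block index, raw data) awaiting post-process dedup
        self.block_map = []  # fingerprint per logical block, None while pending

    def _dedupe(self, data):
        # Fingerprint the chunk and store it only if it has not been seen before.
        fingerprint = hashlib.sha256(data).hexdigest()
        self.chunks.setdefault(fingerprint, data)
        return fingerprint

    def write(self, data, observed_latency_ms):
        """Write one block and report which mode handled it."""
        if observed_latency_ms <= self.latency_limit_ms:
            # No measurable impact: deduplicate before the data lands.
            self.block_map.append(self._dedupe(data))
            return "inline"
        # I/O is suffering: land the raw block now, deduplicate it later.
        self.pending.append((len(self.block_map), data))
        self.block_map.append(None)
        return "post-process"

    def run_post_process(self):
        """Deduplicate every block that was deferred during busy periods."""
        for index, data in self.pending:
            self.block_map[index] = self._dedupe(data)
        self.pending.clear()
```

In this sketch, writing the same 4 KB block once under light load and once under heavy load takes the inline path the first time and the deferred path the second time. After `run_post_process()` both logical blocks point at a single stored chunk. The point is only that the storage system, not a bolt-on appliance, gets to decide when the work happens.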
The questions for the suppliers of these APIs are what the impact on system performance is and how complex the API is. In other words, how long will it take to integrate the API set? The other issue is going to be the data modification impact. While an API makes it easy to turn something like deduplication on, will you be able to turn it off, and what are the effects of doing so? That is going to be a critical issue.
I believe primary storage deduplication will be an expected feature on primary storage within the next one to two years, just as snapshots are today. If vendors can't get a primary deduplication product out within that time frame, they need to be looking at an API-type solution ASAP. You don't want to be the only vendor bringing a knife to a gunfight.