The second step is to run deduplication on a volume-by-volume basis, at the user's discretion. This can take a while, depending on the size of the volume and the number of blocks to be analyzed, but it should not be a major time sink, and the process can be scheduled. NetApp provides a best practices guide on the types of workloads deduplication should be run against. Not surprisingly, these are workloads where the chance of redundancy is fairly high.
At the top of the list are server virtualization images and user home directories. Because the NetApp system treats LUNs as files, just like regular files, support for virtualized environments extends beyond NFS-based VMware images. For home directories, NetApp reports savings of about 30 to 35 percent with its deduplication product.
Home directories generally see better results with combined compression and deduplication, which NetApp does not currently offer. The third class is mid-tier applications that are business-critical but not mission-critical, such as Exchange and SharePoint. As with virtualized images, there is a high chance of redundant data in these environments.
The applications NetApp advises users to steer clear of are those that are mission-critical and have high-performance storage I/O needs. NetApp admits there is some performance overhead with its deduplication, and that you need to be careful which workloads you hand it. Most of the performance impact comes from walking the file system and validating the duplicate data. Reading from the deduplicated pools is an extent-management task for the operating system, very similar to reading from a snapshot, and imposes no significant overhead.
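That scan-and-validate flow can be sketched as a post-process pass over a directory tree. This is a toy illustration, not NetApp's implementation: the 4 KB block size matches WAFL, but the SHA-256 fingerprint and the in-memory block store are assumptions made purely for the sketch.

```python
import hashlib
import os

BLOCK_SIZE = 4096  # WAFL uses 4 KB blocks; fingerprint choice below is illustrative

def dedup_scan(root):
    """Walk a directory tree and count duplicate blocks.

    Post-process sketch: fingerprint each block, then byte-verify
    fingerprint matches before treating blocks as duplicates --
    a fingerprint collision alone is not proof of redundancy.
    """
    seen = {}        # fingerprint -> bytes of the first block seen with it
    duplicates = 0
    total = 0
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as f:
                while True:
                    block = f.read(BLOCK_SIZE)
                    if not block:
                        break
                    total += 1
                    fp = hashlib.sha256(block).digest()
                    if fp in seen:
                        # Validate byte-for-byte before counting as shared.
                        if seen[fp] == block:
                            duplicates += 1
                    else:
                        seen[fp] = block
    return total, duplicates
```

The expensive parts are exactly the ones the text names: the full walk of the file system and the byte-level validation of candidate duplicates. (A real system would keep block references rather than whole block contents in memory; that shortcut is only for readability here.)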
The two potential limitations that are up for debate are NetApp's use of deduplication as a post-process and the current lack of compression. In both cases, NetApp cites the performance overhead as outweighing any value that inline deduplication or added compression would bring.