Primary Storage Deduplication: NetApp

One of the first entrants into the primary storage deduplication market was NetApp, with its Advanced Single Instance Storage (A-SIS, commonly known as NetApp deduplication). To my knowledge, NetApp was the first to provide deduplication of active storage as opposed to data that had been previously stored. NetApp deduplication has certainly gained traction within the NetApp customer base; the company recently claimed that more than 87,000 deduplication-enabled storage systems have been deployed, with about 12,000 customers benefiting.

The second step is to run deduplication on a volume-by-volume basis; it is the user's choice. This can take a while, depending on the size of the volume and the number of blocks to be analyzed, but it should not be a huge time issue, and the process can be scheduled. NetApp provides a best practices guide on the types of workloads you should run deduplication against. Not surprisingly, these are workloads where the chance of redundancy is fairly high.
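To make that "high chance of redundancy" criterion concrete, here is a minimal, hypothetical sketch, not NetApp code and not how A-SIS works internally, that estimates how much fixed-size block redundancy exists in a directory tree by fingerprinting 4 KB blocks. The block size and the scanned path are assumptions for illustration.

```python
# Hypothetical sketch: estimate what fraction of 4 KB blocks in a directory
# tree are duplicates, as a crude indicator of how well it might deduplicate.
import hashlib
import os

BLOCK_SIZE = 4096  # assumed 4 KB blocks; adjust to match the target system


def estimate_dedupe_ratio(root):
    """Walk `root`, fingerprint every 4 KB block, return (total, duplicate) counts."""
    seen = set()
    total = duplicates = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as f:
                    while True:
                        block = f.read(BLOCK_SIZE)
                        if not block:
                            break
                        total += 1
                        digest = hashlib.sha256(block).digest()
                        if digest in seen:
                            duplicates += 1
                        else:
                            seen.add(digest)
            except OSError:
                continue  # skip unreadable files
    return total, duplicates


if __name__ == "__main__":
    total, duplicates = estimate_dedupe_ratio("/home")  # hypothetical path
    if total:
        print(f"{duplicates / total:.0%} of blocks are duplicates")
```

Workloads where a crude estimate like this shows a high duplicate fraction, such as home directories full of shared documents or virtual machine images built from the same template, are the kind the best practices guide steers toward.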

At the top of the list are server virtualization images and user home directories. Because the NetApp system treats LUNs as files, just as it does regular files, support for virtualized environments extends beyond NFS-based VMware images. For home directories, NetApp cites savings of about 30 to 35 percent with its deduplication.

Home directories generally see better results with combined compression and deduplication, which NetApp does not currently offer. The third class is mid-tier applications that are business-critical but not mission-critical, such as Exchange and SharePoint. As with virtualized images, there is a high chance of redundant data in these environments.

The applications NetApp advises users to stay away from are those that are mission-critical and have high-performance storage I/O needs. NetApp admits that there is some performance overhead with its deduplication and that you need to be careful about which workloads you have it handle. Most of the performance impact comes from walking the file system and validating the duplicate data. Reading from the deduplicated pools is an extent-management task for the operating system, very similar to reading from a snapshot, and imposes no significant overhead.
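To illustrate where that overhead comes from, here is a conceptual toy model in Python, not NetApp's implementation: the post-process pass fingerprints every block and byte-verifies candidate duplicates before collapsing references, while reads simply follow a reference, much like reading from a snapshot. All names here (BlockStore, dedupe_pass, and so on) are invented for illustration.

```python
# Toy model of post-process deduplication: the scan pays the cost of hashing
# and byte-for-byte validation; the read path only follows a block reference.
import hashlib


class BlockStore:
    """Toy block store that shares identical blocks after a dedupe pass."""

    def __init__(self):
        self.blocks = {}   # block_id -> bytes
        self.refs = {}     # logical address -> block_id
        self.next_id = 0

    def write(self, addr, data):
        # Writes are unaffected by post-process dedupe: just store a new block.
        block_id = self.next_id
        self.next_id += 1
        self.blocks[block_id] = data
        self.refs[addr] = block_id

    def read(self, addr):
        # Reading a deduplicated block is only pointer-chasing -- no extra cost.
        return self.blocks[self.refs[addr]]

    def dedupe_pass(self):
        # The costly part: fingerprint every block, then byte-compare matches
        # to rule out hash collisions before collapsing references.
        by_digest = {}
        for addr, block_id in list(self.refs.items()):
            data = self.blocks[block_id]
            digest = hashlib.sha256(data).digest()
            keeper = by_digest.get(digest)
            if keeper is not None and self.blocks[keeper] == data:
                self.refs[addr] = keeper          # point at the shared block
                if block_id not in self.refs.values():
                    del self.blocks[block_id]     # free the unreferenced copy
            else:
                by_digest[digest] = block_id
```

Writing the same data to two addresses, running dedupe_pass(), and then reading both back shows the trade-off: the scan does all the hashing and comparison work, while the read path is unchanged.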

The two potential limitations that are up for debate are NetApp's use of deduplication as a post-process and the current lack of compression. In both cases, NetApp cites concerns about performance overhead versus any potential added value from inline deduplication or added compression.

George Crump is president and founder of Storage Switzerland, an IT analyst firm focused on the storage and virtualization segments. With 25 years of experience designing storage solutions for datacenters across the US, he has seen the birth of such technologies as RAID, NAS, ...