Primary Storage Deduplication: NetApp
September 13, 2010
One of the first entrants into the primary storage deduplication market was NetApp, with its Advanced Single Instance Storage (A-SIS, commonly known as NetApp deduplication). To my knowledge, NetApp was the first to provide deduplication of active storage, as opposed to data that had been previously stored. NetApp deduplication has certainly gained traction within the NetApp customer base: the company recently claimed that more than 87,000 deduped storage systems have been deployed, with about 12,000 customers benefiting from its storage efficiency technology.
NetApp deduplication is unusual in that it is part of a vertically integrated software stack built on NetApp's operating system, Data ONTAP, and its file system, Write Anywhere File Layout (WAFL). WAFL, like any other file system, uses a series of inodes and pointers, commonly called extents, to manage the information that the file system holds. Everything stored on a NetApp system is stored as a file, whether it is actual file data or a blob presented as an iSCSI or FC LUN. All of these files are broken down into blocks, or chunks, of data, and in the WAFL file system every block is 4 KB in size.
As a result, each time a file is stored, its blocks are associated with a system of pointers. NetApp leverages these 4 KB chunks to implement technologies like snapshots and cloning. NetApp deduplication is enabled at the volume level. Once deduplication is enabled on a volume, the system begins an inline process of gathering a fingerprint for each of these 4 KB chunks via a proprietary deduplication hashing algorithm. At intervals, either specified by the user or automatically triggered by data growth rates, a post-processing routine kicks in to look for matching fingerprints, which indicate that potentially redundant data has been found.
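The fingerprint-gathering step can be illustrated with a minimal Python sketch. This is not NetApp's implementation: the fingerprint algorithm is proprietary, so SHA-256 is used purely as a stand-in, and the function names (split_blocks, fingerprint, candidate_duplicates) are hypothetical. The sketch only shows the general idea of cutting data into 4 KB blocks, fingerprinting each one, and flagging blocks whose fingerprints match.

```python
import hashlib
from collections import defaultdict

BLOCK_SIZE = 4096  # 4 KB, the WAFL block size mentioned in the text


def split_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list:
    """Split data into fixed-size blocks, zero-padding the final block."""
    return [
        data[i:i + block_size].ljust(block_size, b"\0")
        for i in range(0, len(data), block_size)
    ]


def fingerprint(block: bytes) -> str:
    # Assumption: SHA-256 stands in for the proprietary fingerprint.
    return hashlib.sha256(block).hexdigest()


def candidate_duplicates(blocks: list) -> dict:
    """Map each fingerprint seen more than once to the block indices sharing it."""
    seen = defaultdict(list)
    for index, block in enumerate(blocks):
        seen[fingerprint(block)].append(index)
    return {fp: idxs for fp, idxs in seen.items() if len(idxs) > 1}


# Three full blocks plus a short tail; the first and third blocks are identical.
data = b"A" * 4096 + b"B" * 4096 + b"A" * 4096 + b"C" * 100
blocks = split_blocks(data)
duplicates = candidate_duplicates(blocks)
```

In this example, duplicates holds a single entry whose index list is [0, 2], marking the first and third blocks as candidates for deduplication.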
After a byte-level validation check confirms that the data is identical, the pointer to the redundant block is updated to point back to the original block, and the block identified as redundant is released in the same way that a block attached to an expired snapshot is released. The fingerprint itself leverages the existing NetApp "write block checksum" code, which WAFL has used since its inception. The bottom line is that NetApp should be commended for leveraging the capabilities of its existing operating system to deliver a modern capability.
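To make the verify-and-repoint step concrete, here is a toy Python model of a post-processing pass. It is a sketch under stated assumptions, not WAFL's actual data structures: the Volume class, its reference counts, and the dedupe function are all hypothetical, and zlib.crc32 is only a stand-in for the block checksum. It shows why the byte-level check matters: blocks that share a fingerprint are merged only if their contents really are identical.

```python
import zlib
from collections import defaultdict


class Volume:
    """Toy volume: logical pointers map to reference-counted physical blocks."""

    def __init__(self):
        self.blocks = {}     # physical block id -> bytes
        self.refcount = {}   # physical block id -> number of pointers to it
        self.pointers = {}   # (file name, block index) -> physical block id
        self._next_id = 0

    def write(self, name, index, data):
        # Append-only for simplicity; overwrites are not handled in this sketch.
        block_id = self._next_id
        self._next_id += 1
        self.blocks[block_id] = data
        self.refcount[block_id] = 1
        self.pointers[(name, index)] = block_id

    def read(self, name, index):
        return self.blocks[self.pointers[(name, index)]]


def dedupe(vol, fingerprint=zlib.crc32):
    """Post-process pass; returns the number of physical blocks released."""
    by_fp = defaultdict(list)
    for block_id in sorted(vol.blocks):
        by_fp[fingerprint(vol.blocks[block_id])].append(block_id)

    released = 0
    for ids in by_fp.values():
        keepers = []  # blocks with this fingerprint verified to be distinct
        for dup in ids:
            # Byte-level validation: a matching fingerprint is not enough.
            match = next(
                (k for k in keepers if vol.blocks[k] == vol.blocks[dup]), None
            )
            if match is None:
                keepers.append(dup)
                continue
            # Repoint every reference from the redundant block to the original.
            for key, target in list(vol.pointers.items()):
                if target == dup:
                    vol.pointers[key] = match
                    vol.refcount[match] += 1
                    vol.refcount[dup] -= 1
            # Release the redundant block once nothing points to it.
            if vol.refcount[dup] == 0:
                del vol.blocks[dup], vol.refcount[dup]
                released += 1
    return released


vol = Volume()
vol.write("a.txt", 0, b"x" * 4096)
vol.write("b.txt", 0, b"x" * 4096)   # same contents as a.txt
vol.write("c.txt", 0, b"y" * 4096)
freed = dedupe(vol)
```

After the pass, a.txt and b.txt share one physical block with a reference count of 2, and one block has been released. Passing a deliberately colliding fingerprint function (for example, one that returns the same value for every block) leaves non-identical blocks untouched, because the byte comparison rejects the match.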
Adding deduplication is a two-step process that, according to NetApp and in our own experience, should take about 10 minutes in total. The first step is to enable deduplication by installing the license. NetApp still does not charge for deduplication, so enabling the license is mostly a reporting function that lets NetApp know who is using the feature. Once the license is enabled, the behavior of the box does not change; the license simply allows the system to execute the various deduplication commands.