George Crump
Commentary
Primary Storage Deduplication: NetApp

One of the first entrants into the primary storage deduplication market was NetApp, with its Advanced Single Instance Storage (A-SIS, commonly known as NetApp deduplication). To my knowledge, NetApp was the first to provide deduplication of active storage, as opposed to data that had been previously stored. NetApp deduplication has certainly gained traction within the NetApp customer base; the company recently claimed that more than 87,000 deduplication-enabled storage systems have been deployed, with about 12,000 customers benefiting from its storage efficiency technology.

NetApp deduplication is somewhat unusual in that it is part of a vertically integrated software stack built on the company's operating system, Data ONTAP, and its file system, the Write Anywhere File Layout (WAFL). WAFL, like any other file system, uses a series of inodes and pointers, commonly called extents, to manage the information that the file system holds. Everything stored on a NetApp system is stored as a file, whether it is actual file data or a blob presenting itself as an iSCSI or Fibre Channel (FC) LUN. All of these files are broken down into blocks, or chunks of data, and in the WAFL file system every block is 4KB in size.
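To make the block-and-pointer idea concrete, here is a minimal Python sketch, emphatically not NetApp's code: every object is a file whose data is carved into fixed 4KB blocks, each referenced by a pointer held in a per-file list. The ToyFS class and its names are illustrative assumptions only.

```python
# Toy sketch of a block-based layout (not WAFL): every object is a file whose
# data is split into fixed 4KB blocks referenced by per-file pointers.
BLOCK_SIZE = 4096  # WAFL's block size, per the article

class ToyFS:
    def __init__(self):
        self.blocks = {}   # block_id -> up to 4KB of data
        self.inodes = {}   # file name -> ordered list of block pointers
        self._next_id = 0

    def write_file(self, name, data):
        """Carve data into 4KB chunks and record one pointer per chunk."""
        pointers = []
        for offset in range(0, len(data), BLOCK_SIZE):
            block_id = self._next_id
            self._next_id += 1
            self.blocks[block_id] = data[offset:offset + BLOCK_SIZE]
            pointers.append(block_id)
        self.inodes[name] = pointers

fs = ToyFS()
fs.write_file("lun0", b"x" * 10000)   # a LUN is just a file to the file system
print(len(fs.inodes["lun0"]))          # -> 3 pointers (two full blocks, one partial)
```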

As a result, each time a file is stored, its blocks are associated with a system of pointers. NetApp leverages these 4KB chunks to implement technologies such as snapshots and cloning. NetApp deduplication is enabled at the volume level. When a volume is enabled, the system begins an inline process of gathering fingerprints for each of these 4KB chunks via a proprietary deduplication hashing algorithm. At intervals, either specified by the user or triggered automatically by data growth rates, a post-processing routine kicks in to look for matching fingerprints, which indicate that redundant data has been found.
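Conceptually, the fingerprint flow looks something like the sketch below. NetApp's actual hashing algorithm is proprietary (and, as noted later, rooted in WAFL's block checksums), so SHA-256 here is only a stand-in to show inline fingerprint gathering followed by a post-process scan for matching fingerprints.

```python
# Illustrative only: SHA-256 stands in for NetApp's proprietary fingerprint.
import hashlib
from collections import defaultdict

fingerprints = {}   # block_id -> fingerprint, gathered inline as blocks are written

def record_fingerprint(block_id, block):
    fingerprints[block_id] = hashlib.sha256(block).hexdigest()

def find_duplicate_candidates():
    """Post-process pass: group block ids whose fingerprints match."""
    groups = defaultdict(list)
    for block_id, fp in fingerprints.items():
        groups[fp].append(block_id)
    return [ids for ids in groups.values() if len(ids) > 1]

record_fingerprint(0, b"a" * 4096)
record_fingerprint(1, b"b" * 4096)
record_fingerprint(2, b"a" * 4096)
print(find_duplicate_candidates())   # -> [[0, 2]]
```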

After a byte-level validation check confirms that the data is identical, the pointer to the redundant block is updated to point back to the original block, and the redundant block is released in the same way a block attached to an expired snapshot is released. The fingerprint itself leverages existing NetApp code, the "write block checksum" that WAFL has used since its inception. The bottom line is that NetApp should be commended for building on its existing operating system to deliver a modern capability.
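The reclaim step, again as a hedged sketch rather than NetApp's implementation: a fingerprint match is only a candidate, so the two blocks are compared byte for byte before any pointer is redirected and the redundant block is freed. The dictionaries mirror the toy layout sketched earlier.

```python
def dedupe_pair(blocks, inodes, keep_id, dup_id):
    """Verify byte for byte, then redirect pointers and release the duplicate."""
    if blocks[keep_id] != blocks[dup_id]:
        return False              # fingerprint collision; the data is not identical
    for pointers in inodes.values():
        for i, block_id in enumerate(pointers):
            if block_id == dup_id:
                pointers[i] = keep_id   # point back to the original block
    del blocks[dup_id]            # release the block, much like an expired snapshot block
    return True

# Two files whose second and third blocks hold the same bytes.
blocks = {0: b"a" * 4096, 1: b"z" * 4096, 2: b"z" * 4096}
inodes = {"file1": [0, 1], "file2": [2]}
dedupe_pair(blocks, inodes, keep_id=1, dup_id=2)
print(inodes["file2"], sorted(blocks))   # -> [1] [0, 1]
```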

Adding deduplication is a two-step process that, according to NetApp and in our personal experience, should take about 10 minutes in total. The first step is to enable deduplication by installing the license. NetApp still does not charge for deduplication, so enabling the license is mostly a reporting function that lets NetApp know who is using the feature. Once the license is enabled, there is no change in the behavior of the box; it simply allows the system to execute the various deduplication commands.
