ZFS Gets Deduplication

While the financial press speculates about how the EU's antitrust concerns may put the kibosh on the OraSun (or is it Sunacle?) merger, Sun blogger and ZFS creator Jeff Bonwick announced this week that ZFS now includes inline deduplication. We've been waiting since July for Sun to get its deduplication working, and I'm intrigued by both the details of how ZFS dedupe works and the ramifications of including deduplication in reasonably priced, server-based storage solutions.

Howard Marks

November 5, 2009

When I first heard that Sun was going to add dedupe to ZFS, I expected something resembling NetApp dedupe, formerly known as A-SIS. That is, a post-process system with relatively low data reduction that would be interesting to Sun users. I've mentioned before that the enterprise NAS guys have been very conservative when adding data reduction technologies, so that their customers would never have a reason to think a new feature might slow their NAS box down in any way.

Sun, on the other hand, has recognized that server CPU cycles are growing much faster than disk I/O bandwidth and has decided to spend the available CPU cycles on managing storage. This lets Sun design one server that can be either a compute node or a storage node in the data center.

Like NetApp dedupe, ZFS identifies duplicate blocks by leveraging the per-block checksums it already calculates, to ensure data integrity, as each block is written to disk. Admins can turn dedupe on for a storage pool (or for individual datasets within it) with a single command. They can also choose not to trust the very collision-resistant SHA-256 hash algorithm and turn on byte-by-byte verification. Clever users could even use the less compute-intensive fletcher4 checksum to identify "similar" blocks and rely on verification to ensure they don't deduplicate data that isn't really duplicated in the first place.
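To make the tradeoff between hash strength and verification concrete, here's a minimal Python sketch of checksum-keyed block dedupe. It is a toy in-memory model, not how ZFS stores its dedupe table: the DedupStore class, the key functions and the verify flag are all names I've made up for illustration, and weak_key is a deliberately poor byte-sum checksum standing in for a cheap non-cryptographic checksum (it is not fletcher4).

```python
import hashlib


def sha256_key(block: bytes) -> bytes:
    """Strong, collision-resistant key for a block."""
    return hashlib.sha256(block).digest()


def weak_key(block: bytes) -> int:
    """Deliberately weak toy checksum (byte sum). NOT fletcher4; it only
    stands in for a cheap checksum that can collide."""
    return sum(block) % 65536


class DedupStore:
    """Hypothetical in-memory dedupe table: checksum -> stored blocks."""

    def __init__(self, key_fn=sha256_key, verify=False):
        self.key_fn = key_fn
        self.verify = verify
        self.table = {}  # key -> list of [block_bytes, refcount]

    def write(self, block: bytes):
        key = self.key_fn(block)
        entries = self.table.setdefault(key, [])
        for index, entry in enumerate(entries):
            # Without verify, a checksum match alone is trusted.
            # With verify, the bytes must also match exactly.
            if not self.verify or entry[0] == block:
                entry[1] += 1  # one more reference to the same block
                return (key, index)
        entries.append([block, 1])  # first copy of this block
        return (key, len(entries) - 1)

    def read(self, ref):
        key, index = ref
        return self.table[key][index][0]

    def stored_blocks(self):
        return sum(len(entries) for entries in self.table.values())


# Two identical 4 KB blocks and one different block: only two are stored.
store = DedupStore()
store.write(b"A" * 4096)
store.write(b"A" * 4096)
store.write(b"B" * 4096)
print(store.stored_blocks())  # 2

# A weak checksum without verification silently returns the wrong data.
unsafe = DedupStore(key_fn=weak_key, verify=False)
unsafe.write(b"ab")
ref = unsafe.write(b"ba")  # same byte sum as b"ab"
print(unsafe.read(ref))  # b'ab'

# The same weak checksum with verification keeps the blocks apart.
safe = DedupStore(key_fn=weak_key, verify=True)
safe.write(b"ab")
ref = safe.write(b"ba")
print(safe.read(ref))  # b'ba'
```

The cheap checksum only narrows down the candidates. The byte-by-byte comparison is what makes sharing a block safe, and each checksum match then costs an extra comparison against the stored copy.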

Add in the compression that ZFS has included for years, and a server running NexentaStor (or a Sun appliance) could really be a general-purpose storage system with good data reduction for NFS, iSCSI or even FC-attached systems. We should see a version of OpenSolaris with ZFS dedupe enabled available for download in a month or two. Not being a Solaris jockey myself, I'm going to wait for the Nexenta guys to roll it into NexentaStor and then fire it up in the lab. I'm looking forward to it.

About the Author(s)

Howard Marks

Network Computing Blogger

Howard Marks is founder and chief scientist at Deepstorage LLC, a storage consultancy and independent test lab based in Santa Fe, N.M. and concentrating on storage and data center networking. In more than 25 years of consulting, Marks has designed and implemented storage systems, networks, management systems and Internet strategies at organizations including American Express, J.P. Morgan, Borden Foods, U.S. Tobacco, BBDO Worldwide, Foxwoods Resort Casino and the State University of New York at Purchase. The testing at DeepStorage Labs is informed by that real world experience.

He has been a frequent contributor to Network Computing and InformationWeek since 1999 and a speaker at industry conferences including Comnet, PC Expo, Interop and Microsoft's TechEd since 1990. He is the author of Networking Windows and co-author of Windows NT Unleashed (Sams).

He is co-host, with Ray Lucchesi, of the monthly Greybeards on Storage podcast where the voices of experience discuss the latest issues in the storage world with industry leaders. You can find the podcast at: http://www.deepstorage.net/NEW/GBoS
