September 15, 2009

I first heard of GreenBytes about a year ago when they were showing off a prototype data-deduping NAS box based on Sun Microsystems' ZFS and X4540 "Thumper" appliance. The hardware platforms have changed, but GreenBytes is now shipping a pair of general-purpose deduplicating NAS boxes that combine inline deduplication with the virtues of a new-generation file system now running on industry-standard hardware.

The designers of the first generation of deduping products rightly focused on the backup market, where conventional products rewarded administrators who created multiple copies of their data. The backup market was low-hanging fruit for deduplication, not only because of the amount of duplicate data in most organizations' backup data sets, but also because backup was a batch process, allowing vendors to dedupe as a post-process.

While GreenBytes would be happy if you bought their GB-X boxes as a backup target, they're aiming for the bigger general-purpose NAS market with data deduplication as a key differentiator. Current primary storage data reduction systems either use compression alone (Storwize) or perform deduplication as a post-process (EMC, NetApp, Ocarina). Add in that EMC's and NetApp's approaches are very conservative, resulting in limited data reduction, and it seems to me that there's a place for a new generation of deduping NAS boxes.

GreenBytes' technology is still based on ZFS, although a breakdown in their relationship with Sun has resulted in them renaming it from ZFS+ to GBFS. Since ZFS, like WAFL, locates data across disks as arbitrarily placed blocks, it was a good framework on which to hang block-level deduplication. It also means GreenBytes gets to keep the good bits of ZFS, including RAID-Z and support for SSDs as log and read caches. GBFS also does LZ-style compression to boost data reduction even further.
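
As a rough illustration of what inline block-level deduplication with LZ-style compression involves, here is a minimal Python sketch. It is not GreenBytes' or ZFS's implementation: the `DedupStore` class, the fixed 4 KB block size, the SHA-256 fingerprints and the use of zlib (DEFLATE, an LZ77-based compressor) are all assumptions picked for illustration.

```python
import hashlib
import zlib

BLOCK_SIZE = 4096  # assumed block size, for illustration only


class DedupStore:
    """Toy in-memory store that dedupes fixed-size blocks as they are written."""

    def __init__(self, block_size=BLOCK_SIZE):
        self.block_size = block_size
        self.blocks = {}  # fingerprint -> compressed block (stored once)
        self.files = {}   # file name -> ordered list of fingerprints
        self.sizes = {}   # file name -> logical size in bytes

    def write(self, name, data):
        refs = []
        for off in range(0, len(data), self.block_size):
            block = data[off:off + self.block_size]
            fp = hashlib.sha256(block).digest()
            # Inline dedupe: only blocks not seen before are compressed and stored.
            if fp not in self.blocks:
                self.blocks[fp] = zlib.compress(block)
            refs.append(fp)
        self.files[name] = refs
        self.sizes[name] = len(data)

    def read(self, name):
        # Rebuild the file by following its block references in order.
        return b"".join(zlib.decompress(self.blocks[fp]) for fp in self.files[name])

    def logical_bytes(self):
        return sum(self.sizes.values())

    def physical_bytes(self):
        return sum(len(c) for c in self.blocks.values())
```

Writing the same data under two names stores each distinct block only once, and the ratio of `logical_bytes()` to `physical_bytes()` gives the combined reduction from deduplication and compression. A real file system would also need reference counting, on-disk metadata and a way to keep the fingerprint table fast, none of which this sketch attempts.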

The hit on ZFS has been that it comes as part of OpenSolaris, an OS that only a true geek could love and manage. Acknowledging that most files are created from Windows systems, GreenBytes built an MMC-based management tool that makes a GB-X system look like a Windows file server, complete with VSS Shadow Copy support so users can restore previous versions of files themselves.

The current product line has two models: the single-processor (Xeon 5520) GB-2000 with 24 2.5" drive bays and the dual-processor GB-4000, which has 10Gig Ethernet ports and can hold 48 little drives. You can expand either with SAS-attached JBODs holding 24 or 48 more drives each. Access is via CIFS, NFS and/or iSCSI, so it can serve as a file repository for primary data, a VMware datastore and archive storage if you don't need retention enforcement. When used as backup targets, GreenBytes claims ingest rates of 650MB/s and 950MB/s respectively.

I'm looking forward to bringing these puppies into the lab and running them through their paces. Testing backup performance and deduplication ratios, while involved, is pretty straightforward. I'm still trying to figure out a good set of tests for file service performance on a deduping system. Conventional benchmarks like Iometer are useless, as they send and read the same data pattern over and over, which should all be cached in a deduping system. What kind of performance metrics would you want for a deduping NAS? Large PPT file save and open time? File copies? Do you care about random I/O?
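
One way around the repeated-pattern problem is to feed the box test data with a controlled mix of unique and duplicate blocks. The Python sketch below is only a starting point under stated assumptions: `make_test_data`, `dedupe_ratio`, the 4 KB block size and the small pool of repeated blocks are hypothetical choices, not part of any existing benchmark tool.

```python
import hashlib
import random


def make_test_data(num_blocks, dup_fraction, block_size=4096, pool_size=8, seed=0):
    """Build a byte string in which roughly dup_fraction of the blocks
    repeat blocks from a small pool and the rest are unique random data."""
    rng = random.Random(seed)  # seeded so a test run can be reproduced

    def rand_block():
        return rng.getrandbits(8 * block_size).to_bytes(block_size, "little")

    pool = [rand_block() for _ in range(pool_size)]
    out = []
    for _ in range(num_blocks):
        if rng.random() < dup_fraction:
            out.append(rng.choice(pool))  # duplicate block
        else:
            out.append(rand_block())      # unique block
    return b"".join(out)


def dedupe_ratio(data, block_size=4096):
    """Total blocks divided by distinct blocks, using fixed-size blocks."""
    offsets = range(0, len(data), block_size)
    fingerprints = {hashlib.sha256(data[i:i + block_size]).digest() for i in offsets}
    return len(offsets) / len(fingerprints)
```

Calling `make_test_data(1000, 0.5)` yields about 4 MB in which around half the blocks are repeats. The random blocks are also essentially incompressible, so this isolates deduplication from compression; a realistic test set would need compressible content and duplicates that don't fall on neat block boundaries.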

