Primary Storage Deduplication: GreenBytes

George Crump

August 31, 2010

Network Computing

In a recent blog entry, "Do We Need Primary Storage Deduplication?", I concluded that primary storage deduplication can bring significant value to the data center and is becoming a must-have feature for suppliers to provide. In our own testing on two different deduplication platforms, we are seeing an almost 70 percent reduction in capacity requirements on real-world data sets.
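
Vendors often quote deduplication results as a ratio rather than a percentage. A 70 percent reduction means only 30 percent of the logical data has to be stored physically, which works out to roughly 3.3:1. The short Python sketch below does that conversion; the function names and the 10 TB example are hypothetical illustrations, not figures from our testing.

```python
def reduction_to_ratio(reduction: float) -> float:
    """Convert a fractional capacity reduction (0.70 = 70 percent) into the
    equivalent data-reduction ratio, logical : physical."""
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    return 1.0 / (1.0 - reduction)


def physical_needed(logical_tb: float, reduction: float) -> float:
    """Physical capacity required to store logical_tb after the given reduction."""
    return logical_tb * (1.0 - reduction)


if __name__ == "__main__":
    r = 0.70  # roughly the reduction seen in our testing
    print(f"{r:.0%} reduction = {reduction_to_ratio(r):.2f}:1")  # 70% reduction = 3.33:1
    # Hypothetical example: 10 TB of logical data.
    print(f"10 TB logical -> {physical_needed(10, r):.1f} TB physical")  # 3.0 TB
```

The same arithmetic works in reverse: a vendor claim of 5:1 corresponds to an 80 percent reduction.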

Over the next couple of months we will be profiling companies that offer primary storage deduplication. The players in this market fall into three groups: companies offering deduplication as a module that storage manufacturers can integrate into their platforms; companies providing complete storage systems in which deduplication just happens to be one of the capabilities; and deduplication companies that offer complete storage system capabilities (meaning that deduplication is their lead). Our first profiled company, GreenBytes, falls in between those last two categories. Deduplication gets a lot of attention for them, but other capabilities make the unit compelling as well.

GreenBytes, founded in 2007, builds an OpenSolaris-based storage appliance. The company is not using ZFS's deduplication capabilities; instead, it developed its own technology and was shipping it long before ZFS added deduplication to its feature set. Because the majority of GreenBytes' IP does not depend on OpenSolaris, the company is less susceptible than others to changes that Oracle could bring to the OpenSolaris community. That IP manifests itself in the GB-X series, a high-performance, SSD-accelerated, inline deduplication storage system. It can provide both file services (CIFS and NFS) and block storage (via iSCSI).

The typical path for the GB-X series starts with a backup evaluation; then, as users experience the capabilities and performance of the unit, they begin to examine its application as primary storage. Early implementations include hosting VMware images and, of course, home directories. As comfort with the technology increases, so do the use cases.

There are three key ingredients to the GreenBytes solution: the GreenBytes File System (GBFS), the hardware platform and the management environment. GBFS handles all deduplication and compression inline. Inline deduplication has the advantage of never storing redundant data: duplicates are identified before they are written, so there is no need to set aside temporary capacity to land data before it is deduplicated, and performance can improve because less data is written. The GreenBytes solutions also leverage solid state storage to hold deduplication metadata and to act as a cache for read and write operations. The combination delivers very good IOPS (more than 90K 4KB read IOPS and 15K 4KB write IOPS on the mid-range GB-X system) and impressive ingestion rates in excess of 950MB per second over 10GbE. GreenBytes has also done extra work to improve OpenSolaris's CIFS support, making the system more suitable for Windows environments. The file system includes all the typical software features you would expect in a storage system, such as remote replication, snapshots and thin provisioning. Not expected, but welcome, is a Symantec OpenStorage (OST) plug-in for those using the unit as a target for Symantec backup applications.
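
To make the inline idea concrete, here is a toy sketch of a generic inline deduplicating store in Python. It is not GreenBytes' GBFS, whose internals are not described here; the fixed 4KB chunk size, SHA-256 fingerprints, zlib compression and the class name `InlineDedupStore` are all assumptions chosen for illustration. The point it shows is that each chunk is fingerprinted and checked against an index (the kind of metadata GreenBytes keeps on SSD) before anything is written, so duplicate data never reaches disk.

```python
import hashlib
import zlib


class InlineDedupStore:
    """Toy inline deduplicating block store (hypothetical; not GBFS)."""

    def __init__(self, chunk_size: int = 4096):
        self.chunk_size = chunk_size
        self.index: dict[bytes, int] = {}      # fingerprint -> chunk id (dedup metadata)
        self.chunks: list[bytes] = []          # unique chunks, compressed ("disk")
        self.files: dict[str, list[int]] = {}  # file name -> ordered chunk ids

    def write(self, name: str, data: bytes) -> None:
        ids = []
        for off in range(0, len(data), self.chunk_size):
            chunk = data[off:off + self.chunk_size]
            fp = hashlib.sha256(chunk).digest()
            cid = self.index.get(fp)
            if cid is None:
                # First time this content is seen: compress and store it.
                cid = len(self.chunks)
                self.chunks.append(zlib.compress(chunk))
                self.index[fp] = cid
            # Duplicate content costs only a reference, never a second write.
            ids.append(cid)
        self.files[name] = ids

    def read(self, name: str) -> bytes:
        return b"".join(zlib.decompress(self.chunks[cid]) for cid in self.files[name])

    def physical_bytes(self) -> int:
        return sum(len(c) for c in self.chunks)


if __name__ == "__main__":
    store = InlineDedupStore()
    image = bytes(range(256)) * 64      # 16 KiB of sample data
    store.write("vm1.img", image)
    store.write("vm2.img", image)       # an identical clone, e.g. a copied VM image
    assert store.read("vm2.img") == image
    print(len(store.chunks), "unique chunk(s) stored")  # 1 unique chunk(s) stored
    print(store.physical_bytes(), "bytes on disk for", 2 * len(image), "logical bytes")
```

A production system would also need variable-length chunking, a policy for hash collisions, reference counting for deletes and persistence of the index; the sketch omits all of these.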

The bundled hardware comes in three configurations: the GB-1000, the GB-2000 and the GB-4000. They differ in physical size; in supported capacity, which ranges from 4TB to 216TB raw (before deduplication); and in I/O capabilities, which vary by the number and speed of network ports (1GbE or 10GbE). The GB-1000, a 1U 4TB system, is priced in the $10,000 range yet still includes the complete software complement described above, which makes it certainly affordable. All the systems use 2.5" hard drives in addition to the SSD tier. The advantage of 2.5" drives is that they can be packed more densely into the server, which means less data center floor space, and they require significantly less power. Combined with the capacity reduction inherent to deduplication, this leads to a very attractive power-efficiency story, hence the "green" in GreenBytes.
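
To put the GB-1000 price in perspective, the sketch below computes cost per terabyte, first on raw capacity and then on effective capacity. It assumes, purely for illustration, that the roughly 70 percent reduction we measured in our own testing would apply to the data stored on the unit, and it ignores RAID and file system overhead; actual results depend entirely on the data set. The function name `cost_per_tb` is hypothetical.

```python
def cost_per_tb(price_usd: float, raw_tb: float, reduction: float = 0.0) -> float:
    """Price per TB; with reduction > 0, per TB of effective (post-dedup) capacity.

    Assumes the reduction applies uniformly and ignores RAID/file system overhead.
    """
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    effective_tb = raw_tb / (1.0 - reduction)  # logical data that fits in raw_tb
    return price_usd / effective_tb


if __name__ == "__main__":
    price, raw = 10_000, 4  # GB-1000: ~$10,000, 4TB raw (per the article)
    print(f"raw:       ${cost_per_tb(price, raw):,.0f} per TB")        # $2,500
    print(f"effective: ${cost_per_tb(price, raw, 0.70):,.0f} per TB")  # $750
```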

The final ingredient is the Microsoft Management Console (MMC) plug-in, which makes the product even more attractive for Windows environments. The plug-in lets administrators manage and monitor both iSCSI volumes and CIFS/NFS file shares directly from a Windows server. It also provides analytics on inbound and outbound network performance, so you can verify that inline deduplication is not hurting performance. The MMC interface also offers single-pane-of-glass management for multiple GreenBytes devices, which is especially useful for managing replication nodes.

GreenBytes could potentially serve as a single tier of storage that provides both backup repositories and, thanks in large part to its SSD integration, primary storage for NAS home directories or server virtualization environments. It is one of the few solutions we have seen that covers the full range, and it is certainly worth consideration.
