More than a year ago, I wrote that I have seen the future of solid-state storage and it is scale-out. If the vendors that presented at Storage Field Day 4 earlier this month are any indication, scale-out isn’t just the future of solid-state storage, but storage in general. Eight of the ten vendors at the event had scale-out architectures of one sort or another.
I was intrigued both by how the products they presented applied scale-out to very different market segments and by the varying technologies they used to manage scale. I’ve already talked about Nimble Storage’s 'scale to fit' architecture and written about Coho Data’s SDN-based approach. In this post, and the next one or two, I'll examine a few of the other products I saw at SFD4.
The most conventional of the bunch was probably Overland Storage’s SnapScale, the result of Overland’s acquisition of MaxiScale in 2010 during storage luminary Geoff Barrall’s brief stint as Overland’s resident tech genius.
A SnapScale cluster contains a minimum of three SnapScale X2 12-bay or SnapScale X4 36-bay data nodes and serves the data stored across the cluster via SMB, NFS and/or iSCSI over a pair of 1 or 10 Gbps Ethernet ports on each node. An additional pair of Ethernet ports on each node provides a dedicated back-end network for inter-node traffic, much as in EMC’s Isilon, although Isilon uses InfiniBand. Unlike many scale-out systems, nodes don’t need to be fully populated, but the 2, 3 or 4TB nearline SAS drives have to be added in sets of four.
Rather than use parity RAID within each node, SnapScale, like HDFS, replicates files across a peer set of two or three drives, each in a different node. Given the reliability of the commodity servers and NL-SAS drives that make up a SnapScale node, I’d stick to triple replication. Data from any given file system folder are spread across up to eight peer sets while iSCSI volumes are striped across up to 16 peer sets in 500KB stripes.
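To make the iSCSI striping concrete, here is a minimal Python sketch of how a byte offset in a volume could map to one of the 16 peer sets. The round-robin placement, the function name and the reading of "500KB" as 500,000 bytes are all my assumptions for illustration; Overland hasn't described SnapScale's actual placement logic in this detail.

```python
# Hypothetical illustration only: round-robin striping of an iSCSI volume
# across peer sets. SnapScale's real placement algorithm may differ.

STRIPE_BYTES = 500 * 1000  # assumes "500KB" means 500,000 bytes
PEER_SETS = 16             # iSCSI volumes span up to 16 peer sets


def peer_set_for_offset(offset: int,
                        stripe_bytes: int = STRIPE_BYTES,
                        peer_sets: int = PEER_SETS) -> int:
    """Return the index of the peer set holding the given byte offset."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    stripe_index = offset // stripe_bytes  # which stripe the byte falls in
    return stripe_index % peer_sets        # stripes rotate across peer sets


if __name__ == "__main__":
    for offset in (0, 499_999, 500_000, 16 * 500_000):
        print(offset, "->", peer_set_for_offset(offset))
```

Under these assumptions, a large sequential read touches all 16 peer sets in turn, and each peer set's two or three replicas sit on drives in different nodes.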
Unfortunately, the triple replication scheme, and Overland’s recommendation that there be one more hot spare drive in a cluster than there are nodes, means that SnapScale clusters only deliver about 25% of their raw capacity as usable space. I used Overland’s online calculator to see what the usable capacity of a three-node X2 cluster would be.
According to the calculator, SnapScale delivers just 32.6TiB of usable space from the 36 4TB drives. This configuration sells for around $52,000 on the Web, or about $1.44 per usable GiB. A similar three-node cluster of X4 nodes, each with 36 4TB drives, would provide over 120TiB for under $1/GiB.
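For readers who want to see where the capacity goes, the following Python sketch models only the two overheads described above: hot spares (one more than the node count) and triple replication. The function name and the model itself are mine, not Overland's. It ignores file system and metadata overhead, so treat its result as a rough upper bound rather than a substitute for the calculator.

```python
# Rough upper-bound model, NOT Overland's calculator: accounts only for
# hot spares and replication, ignoring file system and metadata overhead.

def usable_fraction(nodes: int, drives_per_node: int, replicas: int = 3) -> float:
    """Fraction of raw capacity left after hot spares and replication."""
    total_drives = nodes * drives_per_node
    hot_spares = nodes + 1  # recommendation: one more spare than nodes
    data_drives = total_drives - hot_spares
    if data_drives <= 0:
        raise ValueError("not enough drives for the recommended hot spares")
    return data_drives / replicas / total_drives


if __name__ == "__main__":
    x2 = usable_fraction(nodes=3, drives_per_node=12)  # fully populated X2s
    x4 = usable_fraction(nodes=3, drives_per_node=36)  # fully populated X4s
    print(f"X2 cluster upper bound: {x2:.1%}")
    print(f"X4 cluster upper bound: {x4:.1%}")
```

This simple model comes out a few points above the roughly 25% the calculator reports for the X2 cluster, which suggests the calculator also subtracts overhead that the sketch leaves out.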
(Note: Strictly speaking, the giga and tera prefixes are defined in the International System of Units (SI) as decimal, that is, 10^9 and 10^12 respectively. The binary units that most operating systems -- and Overland’s calculators -- display are gibi (Gi) and tebi (Ti). The difference between the decimal and binary representations is why 36 4TB drives total 130TiB instead of 144TB in Overland’s calculator.)
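The conversion in that note is easy to check. Here is a short Python sketch; the constant and function names are just mine for illustration.

```python
# Decimal (SI) versus binary (IEC) capacity units.
TB = 10**12   # terabyte: 10^12 bytes
TIB = 2**40   # tebibyte: 2^40 bytes


def tb_to_tib(terabytes: float) -> float:
    """Convert decimal terabytes to binary tebibytes."""
    return terabytes * TB / TIB


if __name__ == "__main__":
    raw_tb = 36 * 4  # 36 drives of 4TB each
    raw_tib = tb_to_tib(raw_tb)
    print(f"{raw_tb}TB raw is {raw_tib:.2f}TiB")
    # Usable share of raw capacity, using the calculator's 32.6TiB figure
    print(f"Usable fraction: {32.6 / raw_tib:.1%}")
```

The raw figure works out to just under 131TiB, consistent with the calculator's 130TiB if it truncates rather than rounds, and 32.6TiB of that is the "about 25%" usable figure quoted earlier.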
All in all, I think SnapScale is a little too conventional. Overland’s engineers should be slaving away in the back room changing from disk-based peer sets to chunklet mirroring, à la HP 3PAR or Dell Compellent, and adding some sort of flash caching, even if that means integrating a third party’s Linux caching software. As it stands, I might use a large array of X4s as a backup repository or X2s as a back end for systems that have server-side caching, but I don’t find them compelling as a general-purpose solution.
Gestalt IT, the producer of the Tech Field Day events, pays the travel expenses for delegates, including your humble reporter. Other than the usual corporate swag like golf shirts and travel mugs, I receive no remuneration for attending Tech Field Day events and have no obligation to write about the vendors and/or products presented. Overland Storage has been a client of DeepStorage LLC.
Howard Marks is founder and chief scientist at DeepStorage LLC, a storage consultancy and independent test lab based in Santa Fe, N.M. and concentrating on storage and data center networking. In more than 25 years of consulting, Marks has designed and implemented storage ...