Clusters & Grids & Clouds! Oh My!

As reasonable storage geeks, we should be able to come up with a definition of cloud storage that serves us as users rather than the guy who wants to sell something.

March 3, 2009


11:15 -- Continuing my rant on how vendors torture the English language with our very own technical jargon, today I'll examine how vendors try to apply the buzzword of the day to their product and/or service. Since "cloud" is the buzzword of February 2009, having just beaten out last month's winner "green," which of course replaced "change" after the election, I'll look at how we, as reasonable storage geeks, can come up with a definition of cloud storage that serves us as users rather than the guy who wants to sell something.

In fact, let's agree on a whole taxonomy to describe multi-node storage systems.

The most basic item in our taxonomy is a cluster. A storage cluster is two or more devices (computers, controllers, NAS appliances, whatever you want to call them) that work together to provide access to a common pool of storage. Clusters can be active/active, where both nodes process data all the time, or active/passive with primary nodes that process requests under normal conditions and standby nodes that only deal with requests when the primary fails.
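To make the active/active versus active/passive distinction concrete, here is a minimal Python sketch of how requests might be routed in each case. The `Node` class and both routing functions are hypothetical names invented for illustration; real clusters add heartbeats, fencing, and state handoff that this leaves out.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical cluster member; 'healthy' stands in for a real health check."""
    name: str
    healthy: bool = True


def route_active_passive(primary: Node, standby: Node) -> Node:
    """Send the request to the primary; use the standby only if the primary is down."""
    if primary.healthy:
        return primary
    if standby.healthy:
        return standby
    raise RuntimeError("no healthy node available")


def route_active_active(nodes: list[Node], request_id: int) -> Node:
    """Spread requests across every healthy node (simple round-robin by request id)."""
    healthy = [n for n in nodes if n.healthy]
    if not healthy:
        raise RuntimeError("no healthy node available")
    return healthy[request_id % len(healthy)]
```

In the active/passive case the standby sits idle until a failure; in the active/active case every healthy node does work all the time, and a failure just shrinks the pool.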

Everything from a pair of Windows servers running MSCS to an EMC Clariion or NetApp FAS with dual controllers fits the simple case of a cluster. NetApp GX, Isilon, and other larger sets of systems fit here too.

Storage grids expand on the cluster concept by allowing a large or very large number of nodes to form a grid. For the sake of simplicity, I draw the line at 10 nodes, in part because that's where tightly coupled clusters like the aforementioned GX and its brethren start to break down. Grids can be made up of peers, or they can dedicate nodes to specific functions, as most RAIN (redundant array of independent nodes) archiving solutions, including NEC's Hydrastor and HDS's Content Archive platform, do with separate data ingestion and data storage nodes. IBM's XIV and Parascale's systems fit my definition of grid storage.

Grids, rather than accessing a common storage back end as clusters can, put storage in the nodes and should store data in such a way as to survive not only disk but also node failures. This can be done by storing multiple copies of data on several nodes or through more complex scatter/gather or distributed parity functions.
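As a rough illustration of the parity approach, the Python sketch below stripes data blocks across nodes, adds one XOR parity block, and rebuilds the stripe after a single node is lost. It is a simplification under stated assumptions: the function names are made up, blocks are equal length, and parity sits on one fixed node rather than being rotated across nodes as a truly distributed-parity design would do. It is not how any of the products named above work internally.

```python
from functools import reduce
from typing import Optional


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def write_stripe(data_blocks: list[bytes]) -> list[bytes]:
    """Return one block per node: the data blocks followed by a parity block."""
    return data_blocks + [xor_blocks(data_blocks)]


def rebuild(stripe: list[Optional[bytes]]) -> list[bytes]:
    """Recover a stripe in which at most one node's block is missing (None)."""
    missing = [i for i, block in enumerate(stripe) if block is None]
    if len(missing) > 1:
        raise ValueError("single parity survives only one node failure")
    result = list(stripe)
    if missing:
        # XOR of all surviving blocks reproduces the lost one.
        survivors = [b for b in stripe if b is not None]
        result[missing[0]] = xor_blocks(survivors)
    return result
```

Storing full copies on several nodes gets the same protection more simply, at the price of more raw capacity; parity trades capacity for rebuild work.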

For me to let a vendor call its product cloud storage, it has to expand on the grid concept by adding location awareness. So an online backup service that stores data in a single data center is not, by my definition, cloud storage. But a system like EMC's Atmos or Cleversafe that lets you define a policy to keep at least five copies of your data at no fewer than three locations is a private cloud, and a service like Nirvanix that lets you order two copies in two data centers is public cloud storage.
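A location-aware policy of this kind boils down to two counts: how many copies exist and how many distinct sites hold them. The Python sketch below checks a placement against such a policy. The `Policy` class, the `satisfies` function, and the site names are hypothetical and are not the policy language of Atmos, Cleversafe, or Nirvanix.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """Hypothetical placement policy: minimum copies and minimum distinct locations."""
    min_copies: int
    min_locations: int


def satisfies(policy: Policy, replica_sites: list[str]) -> bool:
    """replica_sites holds one entry per stored copy, naming the site that holds it."""
    return (len(replica_sites) >= policy.min_copies
            and len(set(replica_sites)) >= policy.min_locations)


# Five copies at three locations, as in the private cloud example.
five_at_three = Policy(min_copies=5, min_locations=3)
spread = ["nyc", "nyc", "chi", "chi", "sfo"]   # made-up site names
single_site = ["nyc"] * 5

# Two copies in two data centers, as in the public cloud example.
two_at_two = Policy(min_copies=2, min_locations=2)
```

Here `satisfies(five_at_three, spread)` holds, while `satisfies(five_at_three, single_site)` does not: five copies in one data center is still just a grid by this taxonomy.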

I know not everyone agrees with these definitions. Parascale calls its product a cloud even though it's not location aware. My colleague at InformationWeek, Andrew Conry-Murray, says it's not a cloud if it's not Internet connected. But we, as the storage community, should agree on some common definition and stomp on vendors that overreach by calling their scooters motorcycles.

Howard Marks is chief scientist at Networks Are Our Lives Inc., a Hoboken, N.J.-based consultancy where he's been beating storage network systems into submission and writing about it in computer magazines since 1987. He currently writes for InformationWeek, which is published by the same company as Byte and Switch.
