01:58 PM
Howard Marks

Don't Trade High Availability for Flash Performance

The rise of SSDs has resurrected the single-controller architecture, along with its single point of failure. IT shouldn't gamble on high availability just for flash's amped-up performance.

As flash-based SSDs revolutionize the storage industry, I thought it might be worth taking a look at how some basic storage system architectures compare when the storage media changes from spinning disks to SSDs.

The most basic storage architecture is essentially a RAID controller with a SAN or NAS target. The controller, whether custom hardware or a standard server, is a single point of failure. As a result, unicontroller systems have been relegated to the very low end of the disk array market, where they're used by SMBs or to hold additional copies of data. The vast middle of the storage market is dominated by dual-controller, modular arrays that will fail over transparently if one controller goes down.

Amazingly enough, the move to SSD has resurrected the unicontroller design in the form of rackmount SSDs from the likes of IBM's Texas Memory Systems and Astute Networks' VISX, as well as more feature-rich systems like Nimbus Data's S-Class. The risk of data loss inherent in a unicontroller design might be tolerable for some applications, such as analytics or VDI with non-persistent desktops, but for the vast majority of cases I would find it hard to pay $50,000 or more for a product that doesn't offer high availability.

When asked about high availability, proponents of unicontroller systems will generally recommend a pair of appliances with synchronous replication. If the vendor has done its homework and written a failover mechanism into its arrays, a cluster of unicontrollers is available enough for most applications.

Basically, a typical dual-controller, active/passive modular storage system is what the systems guys would call a shared-disk cluster, much like a typical Windows Server cluster. A pair of unicontroller systems that replicate data is a shared-nothing cluster.

In the disk era, unicontrollers were built on industry-standard servers, which offset the additional cost of a second set of disk drives. As a result, unicontroller designs such as LeftHand's iSCSI array and NexentaStor, some of which also provided a degree of scale-out, have sold thousands of units.

The problem with unicontroller systems in the SSD era is that flash makes up a much higher fraction of a storage system's cost than disk drives ever did, so buying a second unit to replicate to roughly doubles the bill. In fact, some all-flash unicontroller systems cost as much as competing systems that include HA.

I've even heard vendors suggest that customers buy one flash device and manage HA by using host volume managers or storage virtualization appliances.

But if you mirror in your host computer's volume manager or synchronously replicate from an all-SSD array to a disk-based system to avoid the cost of two all-SSD systems, you give up the performance advantage the all-flash system has on writes. That's because writes will only be acknowledged to your applications after they've been written to both the flash and disk-based systems. This limits application performance to the write speed of the slower disk system.
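To put rough numbers on that argument, here is a minimal sketch of why a synchronous mirror runs at the speed of its slowest member. The latency figures are assumed, illustrative values, not measurements of any product, and the function name is hypothetical. The model also assumes the two writes are issued in parallel and ignores replication-link latency.

```python
def sync_mirror_write_latency(target_latencies_ms):
    """Latency the application sees for one synchronously mirrored write.

    The write is acknowledged only after every copy has committed it,
    so the slowest target determines the acknowledged latency.
    """
    return max(target_latencies_ms)


# Assumed, illustrative per-write latencies in milliseconds.
FLASH_WRITE_MS = 0.2
DISK_WRITE_MS = 5.0

flash_pair = sync_mirror_write_latency([FLASH_WRITE_MS, FLASH_WRITE_MS])
mixed_pair = sync_mirror_write_latency([FLASH_WRITE_MS, DISK_WRITE_MS])

print(f"flash + flash mirror: {flash_pair} ms per acknowledged write")
print(f"flash + disk mirror:  {mixed_pair} ms per acknowledged write")
```

Under these assumptions the mixed pair acknowledges writes no faster than the disk system alone, which is the whole problem with mirroring flash to disk.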

If instead you asynchronously replicate data across the mixed storage systems through host- or application-level software, you've turned a simple device failure into a full-blown disaster with associated RPOs and RTOs. By contrast, device failure on a true high-availability system would cause no data loss, and at worst a few seconds of failover delay.
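The data-loss side of asynchronous replication can be sketched the same way. This is a simplified model with made-up numbers: it assumes the replica trails the primary by a fixed lag, and the function name and timings are hypothetical.

```python
def writes_lost_on_failure(write_times_s, failure_time_s, replication_lag_s):
    """Writes acknowledged to the application but not yet on the replica.

    Assumes the replica is always exactly `replication_lag_s` seconds
    behind the primary, so any write acknowledged inside that window
    before the failure never reached the second system.
    """
    cutoff = failure_time_s - replication_lag_s
    return [t for t in write_times_s if cutoff < t <= failure_time_s]


# Assumed workload: one write per second, primary fails at t = 9.5 s.
write_times = list(range(10))

async_lost = writes_lost_on_failure(write_times, failure_time_s=9.5, replication_lag_s=3.0)
sync_lost = writes_lost_on_failure(write_times, failure_time_s=9.5, replication_lag_s=0.0)

print(f"async replica, 3 s lag: {len(async_lost)} acknowledged writes lost")
print(f"synchronous copy:       {len(sync_lost)} acknowledged writes lost")
```

The lag window is effectively the recovery point: with zero lag, as in a synchronous copy or a true HA array, nothing acknowledged is lost.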

Users and senior management can accept some downtime, and even some performance loss, in the face of a disaster caused by an external event like a tornado or hurricane. They're a lot less understanding when they are inconvenienced by a problem within the IT department, even if it was the failure of a key piece of equipment.

The only place I can think of where a replicating pair of unicontroller systems might be an advantage would be on a college campus. At the college where I worked, we had two data centers at opposite ends of the campus connected by a loop of 128 strands of single-mode fiber. In an environment like that, a user could put one system in each data center, and get both high availability and disaster recovery with one replicating array pair taking advantage of the lower cost of unicontroller systems.

A year or two ago, speed was the only performance factor that people cared about with all-flash systems; we were so happy with the performance we didn't care about other functionality. But as the all-flash market matures, I'm less willing to sacrifice things like high availability for speed.

Jasmine J. McTigue
User Rank: Apprentice
4/2/2013 | 7:25:09 PM
re: Don't Trade High Availability for Flash Performance
Interesting article. I'm considering Virident SSD for some high-performance database work, and HA is certainly an issue! What Virident's sales engineers recommended was to run two units and DataCore SANsymphony. When I asked about latency and throughput being compromised under that architecture, I didn't get a clear answer.

Needless to say, when you consider that a single MLC PCIe SSD can have performance capabilities comparable to RACKS of 15k SAS disks, engineering these things for high availability is going to be a serious concern.

Jasmine McTigue
NWC Contributor
User Rank: Apprentice
3/7/2013 | 3:13:21 PM
re: Don't Trade High Availability for Flash Performance
Howard, great post! It's really spot on. I'd love to see a future post describing the unique benefits of a scale-out architecture designed into an all-flash system. Kaminario has been delivering this, and with it both high performance and HA, without compromise, since 2010. And the market is responding to us. (We're busier than ever, and hiring!)
William Bodei
Director of Systems Engineering, Kaminario