A few of my fellow storage pundits have lately taken to predicting the arrival of the all-flash hyper-converged system.
At VMworld this year, Micron was demonstrating an all-flash VSAN cluster. The company crammed a pair of Dell R610 servers with 12-core processors, 768 GB of RAM and two tiers of SSDs: a pair of 1.4 TB P420 PCIe cards to hold VSAN's read and write caches, and ten 960 GB M500s as the "bulk storage" tier. And while some technologies -- like flash and data deduplication -- go together like champagne and caviar, an all-flash hyper-converged system looks more like a chocolate-covered pickle to me.
I understand the logic. The hyper-converged market is booming. In just the past month, Nutanix and Simplivity have made OEM deals with Dell and Cisco respectively; Nutanix closed a $140 million funding round, giving it a $2 billion valuation; and VMware's EVO:RAIL OEM program is launching VSAN-based hyper-converged systems in a 2U, four-server configuration.
A plethora of vendors from Quanta and Supermicro to Dell have jumped on the EVO:RAIL bandwagon. Even more significantly, VMware parent EMC is also on board. EMC's EVO:RAIL systems will be the storage giant's first entry into the compute market since it shut down Data General's AViiON shortly after acquiring the company for its Clariion disk arrays at the end of the last century.
The all-flash hyper-converged camp figures that putting the two shiniest technologies in the storage business together creates the glowing solution that vendors have promised since a serpent slithered down the streets of Eden, yelling, "Fresh Fruit, get your fresh fruit!"
I'm thinking it will be like the Datsun 240Z with a transplanted Chevy V-8 that was my college roommate's pride and joy. It will go real fast, but the trade-offs are so severe that most folks would be better off with a more mainstream solution.
My main concern is the sheer amount of compute horsepower it takes to manage an all-flash storage system that's delivering hundreds of thousands or millions of IOPS. Not only does processing each IO operation take thousands of instructions for data protection (plus, for disk writes, the additional overhead of synchronously replicating the data to other nodes), but flash also requires more care and feeding than conventional disks.
This care and feeding includes system-wide wear leveling to prevent one of the system's SSDs from wearing out long before the others, and system-wide garbage collection to reclaim space occupied by deleted data. Using all that CPU power to run the storage function leaves less available for running VMs, therefore boosting the total cost of the solution.
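To put rough numbers on that CPU tax, here is a back-of-the-envelope sketch. The function name and every figure plugged into it (20,000 instructions per IO, a 2.5 GHz core retiring one instruction per cycle) are illustrative assumptions of mine, not measurements of VSAN or any other product.

```python
def cores_for_storage(iops, instructions_per_io, clock_hz, instructions_per_cycle=1.0):
    """Estimate how many CPU cores the storage stack keeps busy.

    All inputs are assumptions supplied by the caller; nothing here is
    measured from a real system.
    """
    # Total instructions the storage software must execute each second.
    instructions_per_second = iops * instructions_per_io
    # Instructions one core can retire each second under the assumed IPC.
    per_core_capacity = clock_hz * instructions_per_cycle
    return instructions_per_second / per_core_capacity


# Hypothetical node: 500,000 IOPS at 20,000 instructions per IO on 2.5 GHz cores.
busy_cores = cores_for_storage(500_000, 20_000, 2.5e9)
print(f"Cores consumed by storage: {busy_cores:.1f}")
```

Under those assumed inputs, four cores per node are spoken for before the first VM boots, and the figure scales linearly with IOPS.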
The economics of an all-flash hyper-converged system are also challenged by the fact that these systems aren't very capacity-efficient. Members of the Mystic Order of Steely-eyed Storage Guys, like cloud providers everywhere, would insist on storing three replicas of their data. That means that for every TB of data, you'll have to buy 3 TB of flash. Sysadmins who think street luge is a good way to spend the weekend might risk running with just two copies, but that still means buying 2 TB of flash for every TB of data. An all-flash array with an NVRAM cache, along with an effective log-based data structure that aggregates small random writes into larger sequential ones, could use a dual-parity data protection scheme with under 25% overhead.
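The capacity arithmetic is easy to check. The sketch below compares replication with dual parity; the 10+2 stripe width is my assumption, chosen only because it lands under the 25% figure, and "overhead" here means extra raw flash relative to the data stored.

```python
def raw_tb_replicated(data_tb, copies):
    """Raw flash needed when every block is stored `copies` times."""
    return data_tb * copies


def raw_tb_parity(data_tb, data_drives, parity_drives):
    """Raw flash needed for a parity stripe of data_drives + parity_drives."""
    return data_tb * (data_drives + parity_drives) / data_drives


def overhead_pct(data_tb, raw_tb):
    """Extra raw capacity as a percentage of the data actually stored."""
    return (raw_tb - data_tb) / data_tb * 100.0


data_tb = 1.0
three_copies = raw_tb_replicated(data_tb, 3)   # 3 TB of flash per TB of data
two_copies = raw_tb_replicated(data_tb, 2)     # 2 TB of flash per TB of data
dual_parity = raw_tb_parity(data_tb, 10, 2)    # assumed 10+2 stripe

for label, raw in [("3 replicas", three_copies),
                   ("2 replicas", two_copies),
                   ("10+2 dual parity", dual_parity)]:
    print(f"{label}: {raw:.2f} TB raw, {overhead_pct(data_tb, raw):.0f}% overhead")
```

A narrower stripe such as 8+2 would sit right at 25%, so the wider the stripe, the better the dedicated array looks on this measure.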
Sure, we could make up for the high data protection overhead by adding data deduplication and compression to the storage software in a hyper-converged system, but that would increase the CPU load from the storage process. Inline deduplication has to compare the hash generated with each new block of data written to the hashes of all the data already in the system, and complete that process in a few milliseconds. So the deduplication engine has to keep the hash/index table in memory, and the more data that's stored, the bigger the table grows.
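A toy version of inline deduplication shows why the index has to live in memory and why it keeps growing. This is a minimal sketch, not how any shipping product does it: the DedupStore class, the 4 KiB block size, and the 40-byte index entry (a 32-byte SHA-256 digest plus an assumed 8-byte block location) are all hypothetical choices for illustration.

```python
import hashlib


class DedupStore:
    """Toy inline-dedup store: one in-memory index entry per unique block."""

    def __init__(self):
        self.index = {}   # SHA-256 digest -> position in self.blocks
        self.blocks = []  # unique blocks actually stored

    def write(self, block: bytes) -> int:
        """Store a block, returning the position of its single stored copy."""
        digest = hashlib.sha256(block).digest()
        position = self.index.get(digest)  # the lookup on every write
        if position is None:
            # New data: store it and grow the index by one entry.
            position = len(self.blocks)
            self.blocks.append(block)
            self.index[digest] = position
        return position


def index_bytes(unique_bytes, block_size=4096, entry_bytes=40):
    """Estimate index size when every stored block is unique (worst case)."""
    return (unique_bytes // block_size) * entry_bytes


store = DedupStore()
first = store.write(b"x" * 4096)
second = store.write(b"y" * 4096)
repeat = store.write(b"x" * 4096)  # duplicate: no new block stored

tib = 2 ** 40
print(f"Index for 10 TiB of unique 4 KiB blocks: {index_bytes(10 * tib) / 2**30:.0f} GiB")
```

Under those assumptions, 10 TiB of unique data needs roughly 100 GiB of index. Real engines shrink that with larger blocks, truncated hashes, or tiered indexes, but whatever remains is RAM the VMs don't get.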
The truth is that all-flash array vendors determine how much flash to put in their systems based on the ability of their controllers -- which after all are based on the same Xeon processors as the hyper-converged systems -- to manage the flash. Once the controller CPU is fully occupied, adding more SSDs would be an expensive way to add capacity without adding any more performance.
I find it hard to believe that some server SAN software, running under a hypervisor that can pull its CPU resources for another task at any time, could really be so efficient that it delivers the same performance as an XtremIO or SolidFire node that has full control of the processor, and still leaves enough processor power to run a reasonable number of VMs.
It makes a lot of sense to combine a moderate compute workload (which would leave the server with cycles left over) and a moderate storage load into a hyper-converged brick. However, once you're looking for the consistent sub-millisecond latency and hundreds of thousands of IOPS that are the domain of all-solid-state systems, the compromises inherent in the hyper-converged model make a dedicated storage solution a better idea.