VMware's VSAN Benchmarks: Under The Hood

VMware touted flashy numbers in recently published performance benchmarks, but a closer examination of its VSAN testing shows why customers shouldn't expect the same results with their real-world applications.

In addition, VMware set failures to tolerate to zero, so there was no data protection and, therefore, no replication latency in this test.

When I pointed out on Twitter that no one should run VSAN with failures to tolerate (FTT) set to zero, the response was that some applications, like SQL Server with Always On, do their own data protection. That is, of course, true, but those applications are designed to run directly on DAS, and if you’re not getting shared, protected data, why spend $5,000 per server for VSAN? That money would buy a much bigger SSD you could use directly.

The good news
While I don’t think this benchmark has any resemblance to a real-world application load, we did learn a couple of positive things about VSAN from it.

First, the fact VMware could reach 2 million IOPS is impressive on its face, even if it took tweaking the configuration to absurd levels. It’s more significant as an indication of how efficient VSAN’s cache is at handling read I/Os. Intel DC S3700 SSDs are rated at 75,000 read IOPS, so the pool of 32 SSDs could theoretically deliver 2.4 million IOPS. That VMware managed to layer VSAN on top and still deliver 2 million is a good sign.
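The arithmetic is easy to check. Here's a quick sketch in Python using the figures above; the per-SSD rating is Intel's spec-sheet number:

```python
# Back-of-the-envelope check on the 2 million IOPS result.
SSD_RATED_READ_IOPS = 75_000   # Intel DC S3700 rated random read IOPS
SSD_COUNT = 32                 # one caching SSD per host, 32 hosts
MEASURED_IOPS = 2_000_000      # VMware's published result

theoretical_ceiling = SSD_RATED_READ_IOPS * SSD_COUNT
efficiency = MEASURED_IOPS / theoretical_ceiling

print(f"Theoretical ceiling: {theoretical_ceiling:,} IOPS")  # 2,400,000 IOPS
print(f"VSAN vs. raw SSDs: {efficiency:.0%}")                # 83%
```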

The second is how linearly VSAN performance scaled from 253 K IOPS with four nodes to 2 million with 32. Of course, most of the I/O was local from a VM to its host’s SSD, but any time a scale-out storage system can be over 90 percent linear in its scaling, I’m impressed.
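For the curious, the scaling efficiency works out like this (a quick sketch; the IOPS figures are VMware's):

```python
# Per-node throughput at the small and large cluster sizes.
iops_4_nodes = 253_000
iops_32_nodes = 2_000_000

per_node_at_4 = iops_4_nodes / 4      # 63,250 IOPS per node
per_node_at_32 = iops_32_nodes / 32   # 62,500 IOPS per node

linearity = per_node_at_32 / per_node_at_4
print(f"Scaling linearity: {linearity:.1%}")  # ~98.8 percent
```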

The read-write benchmark
VMware also released the data for a more realistic read-write benchmark. This test used a 70 percent read / 30 percent write workload of 4 KB I/Os. While 70/30 is a little read-intensive for me -- most OLTP apps are closer to 60/40, and VDI workloads are more than 50 percent write -- I’m not going to quibble about it. Also more realistic is the failures to tolerate setting, which was now 1.

As previously mentioned, I think FTT=1 is too little protection for production apps, although it would be fine for dev and test. Using FTT=2 would increase the amount of replication performed for each write, which should reduce the total IOPS somewhat.
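To make that effect concrete, here's a first-order model of replication overhead. This is my simplification, not VMware's published math: it assumes each guest write turns into FTT + 1 back-end writes and ignores caching and network latency, and the back-end figure is a hypothetical placeholder.

```python
# Simplified model: every guest write becomes FTT + 1 back-end writes;
# reads are unaffected. Ignores caching, coalescing, and network latency.
def effective_iops(backend_iops: float, read_frac: float, ftt: int) -> float:
    write_frac = 1.0 - read_frac
    backend_ops_per_guest_op = read_frac + write_frac * (ftt + 1)
    return backend_iops / backend_ops_per_guest_op

BACKEND_IOPS = 100_000  # hypothetical back-end capability of one node
for ftt in (0, 1, 2):
    print(f"FTT={ftt}: {effective_iops(BACKEND_IOPS, 0.70, ftt):,.0f} guest IOPS")
# FTT=0: 100,000   FTT=1: ~76,900   FTT=2: 62,500
```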

Again, VMware used a single workload with a very small dataset relative to the amount of flash in the system, let alone the total amount of storage. In this case, each host ran one VM that accessed 32 GB of data. Running against less than 10 percent of the flash meant not only that all the I/Os went to SSD, but also that the SSDs always had plenty of free space for newly written data.

Again, the system was quite linear in performance, delivering 80,000 IOPS with four nodes and 640,000 with 32 nodes, or 20,000 IOPS per node. Average latency was around 3 ms -- better than any spinning disk could deliver, but a bit higher than I’d like when all I/Os are served from SSD.
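Those figures also let us estimate how hard the system was being pushed. Applying Little's Law (a standard queueing identity; this is my calculation, not anything VMware published):

```python
# Little's Law: concurrency = throughput x latency.
iops_32_nodes = 640_000
per_node_iops = iops_32_nodes / 32   # 20,000 IOPS per node (80,000/4 matches)
avg_latency_s = 0.003                # 3 ms average latency

outstanding_per_node = per_node_iops * avg_latency_s
print(f"Implied I/Os in flight per node: {outstanding_per_node:.0f}")  # ~60
```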

Twenty thousand IOPS per node is respectable, but considering that the VM was tuned using multiple paravirtualized SCSI adapters and that, once again, most of the I/O was to the local SSD, not that impressive.

The benchmark I'd like to see
As a rule of thumb, I like to see hybrid storage systems deliver 90 percent or more of the system’s random I/Os from flash. For most applications that do random I/Os, that means the system has to have about 10 percent of its usable storage as flash.
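Expressed as a sizing helper (the 10 percent ratio is my rule of thumb, not a guarantee):

```python
# Flash sizing from the 10 percent rule of thumb: enough flash that
# roughly 90 percent of random I/Os can be served from cache.
def flash_needed_tb(usable_tb: float, flash_fraction: float = 0.10) -> float:
    return usable_tb * flash_fraction

for usable_tb in (10, 50, 100):
    print(f"{usable_tb} TB usable -> ~{flash_needed_tb(usable_tb):.0f} TB flash")
```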

Even with IOmeter’s limitations as a benchmark -- primarily that it doesn’t ordinarily create hotspots but spreads data across the entire dataset under test -- VMware could have come up with a more realistic benchmark to run.

I would like to see, say, 20 VMs per host accessing a total of 80 percent of the usable capacity of the system. The VMs would use virtual disks of varying sizes, with the VMs accessing large virtual disks throttled in the number of I/Os they generate via the outstanding I/O count and burst/delay settings in IOmeter.
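Here's a sketch of how I'd lay that workload out. All the numbers are illustrative placeholders, not a tested configuration:

```python
import random

HOSTS = 32
VMS_PER_HOST = 20
USABLE_TB = 100.0        # hypothetical cluster usable capacity
TARGET_FRACTION = 0.80   # collectively touch 80 percent of usable capacity

total_vms = HOSTS * VMS_PER_HOST
dataset_tb = USABLE_TB * TARGET_FRACTION

random.seed(42)
# Mixed virtual-disk sizes: draw relative weights, then scale them so
# the disks collectively cover the target dataset.
weights = [random.choice([1, 2, 4, 8]) for _ in range(total_vms)]
tb_per_weight = dataset_tb / sum(weights)
vm_disks_tb = [w * tb_per_weight for w in weights]

# Throttle the VMs with the biggest disks by giving them fewer
# outstanding I/Os, so they can't dominate the I/O mix.
vm_outstanding_ios = [4 if d > 2 * tb_per_weight else 16 for d in vm_disks_tb]

print(f"{total_vms} VMs covering {dataset_tb:.0f} TB; disk sizes "
      f"{min(vm_disks_tb):.2f}-{max(vm_disks_tb):.2f} TB")
print(f"Throttled VMs (4 outstanding I/Os): {vm_outstanding_ios.count(4)}")
```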

A simpler test would use a 60/40 read/write mix over a dataset 120 percent to 200 percent the size of the flash in the system. Even that would come closer to showing the performance VSAN can deliver in the real world than testing against 10 percent of the flash.
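The sizing for that is trivial. With an assumed flash tier (say, one 400 GB SSD per host across 32 hosts; VMware's published configuration may differ), it looks like this:

```python
flash_tb = 32 * 0.4   # assumed: one 400 GB caching SSD per host, 32 hosts
for multiple in (1.2, 1.5, 2.0):
    print(f"{multiple:.0%} of flash -> {flash_tb * multiple:.1f} TB dataset")
```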

 

Howard Marks is founder and chief scientist at Deepstorage LLC, a storage consultancy and independent test lab based in Santa Fe, N.M., concentrating on storage and data center networking. In more than 25 years of consulting, Marks has designed and implemented storage ...
Comments
Gallifreyan, User Rank: Apprentice, 5/6/2014 | 1:44:14 PM
Re: Benchmarks will be benchmarks, no matter how faint the resemblance...
I agree that vendor benchmarks are going to be slanted toward the vendor. That's where independent testers come in of course. 

Maybe it's time for a venue that encourages real life benchmarks from real life end users. We used to have that on the bogomips level for Linux but that was even more useless than typical vendor benchmarks. :) 
Lorna Garey, User Rank: Ninja, 4/24/2014 | 4:08:25 PM
Re: Benchmarks will be benchmarks, no matter how faint the resemblance...
It seems like it matters less that the benchmark be completely real-world than that all similar products operate with comparable benchmarks, so that IT can compare apples to apples. Could a company get hold of (or replicate) these benchmarks to run them on their own setups with all short-listed products?
Howard Marks, User Rank: Apprentice, 4/24/2014 | 3:12:37 PM
Re: Benchmarks will be benchmarks, no matter how faint the resemblance...
The sad truth is that for vendors, benchmark results are frequently about marketing goals, not educating the market. Announcing that you've hit the 2 or 20 megaIOP level before any of your competitors will get you some space on websites and blogs. Even better, it gives your sales force, and fanbois, something to brag, tweet, and contact customers about.

I just wish they would also publish some benchmarks that are closer to the way customers might actually use their products.

 -Howard
Andrew Binstock, User Rank: Apprentice, 4/23/2014 | 5:10:30 PM
VMware Benchmarks
For many years, VMware has promulgated its own benchmarks. At times, these have been used by the larger industry, especially in the early days of benchmarking server loads. I remember tracking them when I was benchmarking servers at InfoWorld. Although we never ran VMware's benchmarks (we tended to prefer those from SPEC, which were consortium-created), we certainly heard from server vendors about their discomfort when being rated by VMware. It was difficult to know whether this was just standard vendor griping or whether the benchmarks did indeed disfavor certain designs, but my sense was that it was the latter. I don't think this was intentionally done by VMware. Rather it illustrates the difficulty of creating useful benchmarks that can be run on multiple architectures and deliver results that can be compared fairly.
Charlie Babcock, User Rank: Apprentice, 4/23/2014 | 4:57:45 PM
Benchmarks will be benchmarks, no matter how faint the resemblance...
I like Howard Marks' sense of realism as he inspects a vendor's benchmark. "VMware could have come up with a more realistic benchmark to run," goes without saying. So could Oracle and a dozen other vendors when it comes to benchmarking. In my experience, the benchmark is optimized to make the vendor look good, no matter how faint the resemblance to the customer's planned use of the product.
Susan Fogarty, User Rank: Strategist, 4/23/2014 | 11:48:50 AM
Re: Details matter
Marcia, you're right -- customers need to read between the lines when evaluating any type of vendor claims. But I'm not sure that anyone really takes these benchmarks seriously, anyway. Readers, what do you think?
MarciaNWC, User Rank: Strategist, 4/23/2014 | 11:08:24 AM
Details matter
Howard's analysis shows how details really matter. It's so easy for vendors to tout great results, but you have to take a close look at what produced those results. 