04/22/2014 1:00 AM

VMware's VSAN Benchmarks: Under The Hood

VMware touted flashy numbers in recently published performance benchmarks, but a closer examination of its VSAN testing shows why customers shouldn't expect the same results with their real-world applications.

In addition, VMware set failures to tolerate to zero, so there was no data protection and, therefore, no replication latency in this test.

When I pointed out on Twitter that no one should run VSAN with failures to tolerate (FTT) set to zero, the response was that some applications, like SQL Server with Always On, do their own data protection. That is, of course, true, but those applications are designed to run directly on DAS, and if you're not getting shared, protected data, why spend $5,000 per server for VSAN? That money would buy a much bigger SSD you could use directly.

The good news
While I don’t think this benchmark has any resemblance to a real-world application load, we did learn a couple of positive things about VSAN from it.

First, the fact that VMware could reach 2 million IOPS is impressive on its face, even if it took tweaking the configuration to absurd levels. It’s more significant as an indication of how efficient VSAN’s cache is at handling read I/Os. Intel DC S3700 SSDs are rated at 75,000 read IOPS, so the pool of 32 SSDs could theoretically deliver 2.4 million IOPS. That VMware managed to layer VSAN on top and still deliver 2 million is a good sign.
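The arithmetic is simple enough to sanity-check. Here's a quick sketch, using only the figures cited above, of how much of the raw SSD throughput VSAN preserved:

```python
# Back-of-the-envelope check of the read-only result, using the
# numbers cited in the article: 32 Intel DC S3700 SSDs rated at
# 75,000 read IOPS each, and VSAN's reported 2 million IOPS.
ssd_count = 32
rated_read_iops = 75_000                          # per-SSD spec-sheet rating
theoretical_iops = ssd_count * rated_read_iops    # 2,400,000 raw pool IOPS
delivered_iops = 2_000_000                        # VSAN's published result

efficiency = delivered_iops / theoretical_iops
print(f"Theoretical pool IOPS: {theoretical_iops:,}")
print(f"VSAN's stack overhead costs ~{1 - efficiency:.0%} of raw SSD throughput")
```

In other words, the VSAN layer gives up roughly a sixth of what the bare SSDs could do, which for a distributed storage stack is a respectable tax.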

The second is how linearly VSAN performance scaled from 253 K IOPS with four nodes to 2 million with 32. Of course, most of the I/O was local from a VM to its host’s SSD, but any time a scale-out storage system can be over 90 percent linear in its scaling, I’m impressed.
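The scaling claim is easy to verify from the published numbers; the 32-node result turns out to be within about 1 percent of perfectly linear:

```python
# Scaling-linearity check from the article's numbers:
# 253,000 IOPS on 4 nodes versus 2,000,000 on 32 nodes.
nodes_small, iops_small = 4, 253_000
nodes_large, iops_large = 32, 2_000_000

# Perfectly linear scaling would multiply the 4-node result by 8.
ideal_iops = iops_small * (nodes_large // nodes_small)
linearity = iops_large / ideal_iops
print(f"Ideal 32-node IOPS: {ideal_iops:,}")
print(f"Scaling linearity: {linearity:.1%}")
```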

The read-write benchmark
VMware also released the data for a more realistic read-write benchmark. This test used a 70 percent read / 30 percent write workload of 4 KB I/Os. While 70/30 is a little read-intensive for me -- most OLTP apps are closer to 60/40, and VDI workloads are more than 50 percent write -- I’m not going to quibble about it. Also more realistic was the failures-to-tolerate setting, which was now 1.

As previously mentioned, I think FTT=1 is too little protection for production apps, although it would be fine for dev and test. Using FTT=2 would increase the amount of replication performed for each write, which should reduce the total IOPS somewhat.
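To see why higher FTT costs IOPS, here's a simplified model -- my own assumption about how replica writes multiply back-end traffic, not VMware's published math -- of what the front-end workload turns into behind the scenes:

```python
# Simplified model (an assumption for illustration, not VMware's
# published formula): each guest write must be committed to FTT + 1
# replicas, while reads are served once.
def backend_ios(frontend_iops: int, read_fraction: float, ftt: int) -> float:
    """Back-end I/Os generated per second by a given front-end workload."""
    reads = frontend_iops * read_fraction
    writes = frontend_iops * (1 - read_fraction) * (ftt + 1)
    return reads + writes

# A 100,000 IOPS front-end load at the benchmark's 70/30 mix:
for ftt in (0, 1, 2):
    print(f"FTT={ftt}: {backend_ios(100_000, 0.70, ftt):,.0f} back-end I/Os")
```

Under this model, going from FTT=1 to FTT=2 on a 70/30 mix adds about 23 percent more back-end I/O, which is roughly the IOPS haircut I'd expect.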

Again, VMware used a single workload with a very small dataset relative to the amount of flash in the system, let alone the total amount of storage. In this case, each host ran one VM that accessed 32 GB of data. Running against less than 10 percent of the flash meant not only that all the I/Os went to SSD but that the SSDs always had plenty of free space to use for newly written data.

Again, the system was quite linear in performance, delivering 80,000 IOPS with four nodes and 640,000 with 32 nodes, or 20,000 IOPS per node. Average latency was around 3 ms -- better than any spinning disk could deliver, but a bit higher than I’d like when all I/Os are served from SSD.

Twenty thousand IOPS per node is respectable, but considering that the VM was tuned with multiple paravirtualized SCSI adapters and that, once again, most of the I/O went to the local SSD, it's not that impressive.

The benchmark I'd like to see
As a rule of thumb, I like to see hybrid storage systems deliver 90 percent or more of the system’s random I/Os from flash. For most applications that do random I/Os, that means the system has to have about 10 percent of its usable storage as flash.
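As a quick illustration of that rule of thumb (the capacity figure below is made up for the example, not taken from the article):

```python
# The 10 percent rule of thumb above, as a trivial sizing helper.
def flash_needed_gb(usable_capacity_gb: float, flash_ratio: float = 0.10) -> float:
    """Flash needed to serve ~90% of random I/Os, per the rule of thumb."""
    return usable_capacity_gb * flash_ratio

# Hypothetical example: a system with 20 TB of usable capacity.
print(f"{flash_needed_gb(20_000):,.0f} GB of flash")
```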

Even with IOmeter’s limitations as a benchmark -- primarily that it doesn’t ordinarily create hotspots but spreads data across the entire dataset under test -- VMware could have come up with a more realistic benchmark to run.

I would like to see, say, 20 VMs per host accessing a total of 80 percent of the total usable capacity of the system. The VMs would use virtual disks of varying sizes, with some VMs accessing large virtual disks while throttling the I/Os they generate via IOmeter's outstanding-I/O count and burst/delay settings.

A simpler test would be to simply use a 60/40 read/write mix over a dataset 120 percent to 200 percent the size of the flash in the system. Even that would come closer to showing the performance VSAN can deliver in the real world than testing against 10 percent of the flash.
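For concreteness, here's what that sizing works out to. The flash capacity below is an assumption for illustration only, since the article doesn't give the per-node SSD size:

```python
# Sizing the simpler test proposed above: a dataset 120 to 200 percent
# the size of the system's flash tier. The flash figure is hypothetical
# (e.g. 32 nodes with one 400 GB SSD each), not from the article.
flash_gb = 32 * 400                  # assumed flash tier: 12,800 GB
low_gb = 1.2 * flash_gb              # 120% of flash
high_gb = 2.0 * flash_gb             # 200% of flash
print(f"Test dataset should span {low_gb:,.0f} to {high_gb:,.0f} GB")
```

The point is simply that the working set must overflow the flash tier, so the benchmark exercises the cache-miss and destaging paths a real workload would hit.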



Details matter

Howard's analysis shows how details really matter. It's so easy for vendors to tout great results, but you have to take a close look at what produced those results. 

Re: Details matter

Marcia, you're right -- customers need to read between the lines when evaluating any type of vendor claims. But I'm not sure that anyone really takes these benchmarks seriously, anyway. Readers, what do you think?

Benchmarks will be benchmarks, no matter how faint the resemb...

I like Howard Marks' sense of realism as he inspects a vendor's benchmark. "VMware could have come up with a more realistic benchmark to run," goes without saying. So could Oracle and a dozen other vendors when it comes to benchmarking. In my experience, the benchmark is optimized to make the vendor look good, no matter how faint the resemblance to the customer's planned use of the product.

Re: Benchmarks will be benchmarks, no matter how faint the re...

The sad truth is that, for vendors, benchmark results are frequently about marketing goals, not educating the market. Announcing that you've hit the 2 or 20 mega-IOPS level before any of your competitors will get you some space on websites and blogs. Even better, it gives your sales force, and fanbois, something to brag, tweet, and contact customers about.


I just wish they would also publish some benchmarks that are closer to the way customers might actually use their products.



Re: Benchmarks will be benchmarks, no matter how faint the re...

It seems like it matters less that the benchmark be completely real-world than that all similar products operate with comparable benchmarks, so that IT can compare apples to apples. Could a company get hold of (or replicate) these benchmarks to run them on their own setups with all short-listed products?

Re: Benchmarks will be benchmarks, no matter how faint the re...

I agree that vendor benchmarks are going to be slanted toward the vendor. That's where independent testers come in of course. 

Maybe it's time for a venue that encourages real-life benchmarks from real-life end users. We used to have that at the BogoMIPS level for Linux, but that was even more useless than typical vendor benchmarks. :)

VMware Benchmarks

For many years, VMware has promulgated its own benchmarks. At times, these have been used by the larger industry, especially in the early days of benchmarking server loads. I remember tracking them when I was benchmarking servers at InfoWorld. Although we never ran VMware's benchmarks (we tended to prefer those from SPEC, which were consortium-created), we certainly heard from server vendors about their discomfort when being rated by VMware. It was difficult to know whether this was just standard vendor griping or whether the benchmarks did indeed disfavor certain designs, but my sense was that it was the latter. I don't think this was intentional on VMware's part. Rather, it illustrates the difficulty of creating useful benchmarks that can be run on multiple architectures and deliver results that can be compared fairly.