Most Of Our Benchmarks Are Broken
December 20, 2011
For years, we in the storage industry have relied on a fairly small set of benchmarks to measure the relative performance of storage systems under different conditions. As storage systems have included new technologies-- including data reduction, flash memory as cache or automated tiering--our existing portfolio of synthetic benchmarks are starting to report results that aren't directly comparable to the performance that this new generation of storage systems will deliver in the real world.
The most commonly used storage benchmark is IOmeter, originally developed by Intel and, since 2001, an open source project on SourceForge. IOmeter can perform random and sequential I/O operations of various sizes, reporting number of IOPs, throughput and latency of the system under test. IOmeter has the virtues of being free and easy to use. As a result, we’ve developed IOmeter access patterns that mix various size I/O requests and random vs. sequential access patterns to mimic file, Web and database servers.
After years of hearing application vendors tell us that the impact of storage system cache should be minimal, we adjusted our test suite to measure actual disk performance, minimizing the impact of the storage system’s RAM cache. Since RAM caches, even today, are just a few gigabytes, simply running the benchmark across a data set, or volume, at least several times the size of the cache would ensure we weren’t seeing a fast cache as a fast storage system.
Once we start testing storage systems that use flash as a cache or automated storage tier, the system will no longer provide consistent performance across the test data set. Instead, when running real applications, some portions of the data, like indexes, will be "hot" and served from flash, where other portions of the data set, like transaction logs or sales order line item records, will be accessed only once or twice. These cooler data items will be served from disk.
The problem is that when IOmeter does random IO, its IO requests are spread evenly across the volume being tested. Unlike real applications, IOmeter doesn’t create hot spots. As a result, IOmeter results won’t show as significant a performance boost from the addition of flash as real-world applications.