Howard Marks

Most Of Our Benchmarks Are Broken

For years, we in the storage industry have relied on a fairly small set of benchmarks to measure the relative performance of storage systems under different conditions. As storage systems have incorporated new technologies, including data reduction and flash memory as a cache or an automated tier, our existing portfolio of synthetic benchmarks has started to report results that aren't directly comparable to the performance this new generation of storage systems will deliver in the real world.

The most commonly used storage benchmark is IOmeter, originally developed by Intel and, since 2001, an open source project on SourceForge. IOmeter can perform random and sequential I/O operations of various sizes, reporting the IOPS, throughput and latency of the system under test. IOmeter has the virtues of being free and easy to use. As a result, we've developed IOmeter access patterns that mix various I/O request sizes and random vs. sequential access to mimic file, Web and database servers.
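An access pattern of this kind is essentially a weighted table of request sizes and read/write ratios. The sketch below models one in Python; the sizes and percentages are illustrative assumptions, not IOmeter's actual file-server specification:

```python
import random

# Hypothetical IOmeter-style access specification (illustrative numbers).
# Each tuple: (request size in bytes, share of requests, read fraction).
FILE_SERVER_PATTERN = [
    (512,    0.10, 0.80),
    (4096,   0.60, 0.80),
    (8192,   0.10, 0.80),
    (65536,  0.20, 0.80),
]

def next_request(pattern, rng=random):
    """Draw one (size, op, access) request from the weighted mix.
    Access is always random here, as in a classic file-server spec."""
    sizes = [p[0] for p in pattern]
    weights = [p[1] for p in pattern]
    size = rng.choices(sizes, weights=weights)[0]
    read_frac = next(p[2] for p in pattern if p[0] == size)
    op = "read" if rng.random() < read_frac else "write"
    return size, op, "random"
```

A benchmark driver would call `next_request` in a loop and issue each request against the device under test.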

After years of hearing application vendors tell us that the impact of storage system cache should be minimal, we adjusted our test suite to measure actual disk performance, minimizing the impact of the storage system's RAM cache. Since RAM caches, even today, are just a few gigabytes, simply running the benchmark across a data set, or volume, at least several times the size of the cache would ensure we weren't mistaking a fast cache for a fast storage system.
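A quick simulation shows why this sizing rule works: under uniform random access, an LRU cache's steady-state hit rate approaches the ratio of cache size to data-set size. A minimal Python sketch (block counts are arbitrary assumptions):

```python
from collections import OrderedDict
import random

def uniform_hit_rate(cache_blocks, dataset_blocks, n_ios=200_000, seed=1):
    """Hit rate of a simulated LRU cache under uniform random reads.
    With uniform access this approaches cache_blocks / dataset_blocks."""
    rng = random.Random(seed)
    cache, hits = OrderedDict(), 0
    for _ in range(n_ios):
        b = rng.randrange(dataset_blocks)
        if b in cache:
            hits += 1
            cache.move_to_end(b)          # refresh LRU position
        else:
            if len(cache) >= cache_blocks:
                cache.popitem(last=False)  # evict least recently used
            cache[b] = None
    return hits / n_ios

# A cache one-fifth the size of the data set yields roughly a 20% hit
# rate, so sizing the test volume at several times the cache keeps a
# fast cache from masquerading as fast storage.
```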

Once we start testing storage systems that use flash as a cache or an automated storage tier, the system will no longer provide consistent performance across the test data set. Instead, when running real applications, some portions of the data, like indexes, will be "hot" and served from flash, while other portions of the data set, like transaction logs or sales order line item records, will be accessed only once or twice. These cooler data items will be served from disk.

The problem is that when IOmeter does random I/O, its requests are spread evenly across the volume being tested. Unlike real applications, IOmeter doesn't create hot spots. As a result, IOmeter results won't show as significant a performance boost from the addition of flash as real-world applications will.
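The size of this effect is easy to demonstrate. The sketch below (our own illustration, not IOmeter code) replays a uniform trace and an 80/20 hot-spot trace against a simulated LRU flash cache; the skewed trace hits in cache far more often:

```python
from collections import OrderedDict
import random

def lru_hit_rate(trace, cache_blocks):
    """Replay a block-offset trace against a simulated LRU cache."""
    cache, hits = OrderedDict(), 0
    for b in trace:
        if b in cache:
            hits += 1
            cache.move_to_end(b)
        else:
            if len(cache) >= cache_blocks:
                cache.popitem(last=False)   # evict least recently used
            cache[b] = None
    return hits / len(trace)

def uniform_trace(n, blocks, rng):
    """IOmeter-style random I/O: offsets spread evenly over the volume."""
    return [rng.randrange(blocks) for _ in range(n)]

def hotspot_trace(n, blocks, rng, hot_frac=0.2, hot_prob=0.8):
    """Application-style skew: 80% of I/Os land on the 'hot' 20% of
    blocks (a rough stand-in for structures like indexes)."""
    hot = int(blocks * hot_frac)
    return [rng.randrange(hot) if rng.random() < hot_prob
            else hot + rng.randrange(blocks - hot)
            for _ in range(n)]

if __name__ == "__main__":
    rng = random.Random(7)
    blocks, cache = 100_000, 25_000   # flash covers 25% of the data set
    print("uniform :", lru_hit_rate(uniform_trace(300_000, blocks, rng), cache))
    print("hot-spot:", lru_hit_rate(hotspot_trace(300_000, blocks, rng), cache))
```

With a flash tier covering a quarter of the data set, the uniform trace hits roughly in proportion to the cache size, while the hot-spot trace keeps its hot set resident and hits most of the time.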

4/8/2014 | 4:58:28 AM
re: Most Of Our Benchmarks Are Broken
You're right, Howard. Benchmarking storage is a difficult, demanding task. With modern storage systems that perform deduplication, compression or both at high data rates, it's become even harder to compare technologies and vendor implementations. It's not enough to test I/O without metadata, whether SAN index hot spots or NAS file metadata. It's also not enough to test with limited content sets. We need to be able to stress the full storage system performance envelope in realistic ways.

While it can be difficult to test using complex and widely varied content, it's critical today. We will have limited access to real-world application data for the foreseeable future, but we can and should model as many real-world applications as possible. And in the meantime, we need meaningful synthetic benchmarks and tests that more closely match real applications.

There is another approach available today. Synthetic data can be used to populate and then test storage systems using combinations of non-repeating, repeating, compressible and non-compressible content patterns. Our company, Load DynamiX, has introduced the capability to generate random, sequential and compressible data patterns at near line rate, and then combine these data types to populate and test a storage system in a way that more closely represents real application patterns and traffic. By combining these patterns and access methods, we can validate that deduplication works today and in the future, and that compression produces the expected reduction in storage requirements. We can verify whether a storage system can maintain performance when these technologies are enabled, and we can test using random access to ensure worst-case performance meets users' needs. And we can do this using traffic patterns that emulate metadata and data access simultaneously.
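As a rough illustration of the idea (our own sketch, not Load DynamiX code), buffers with different reduction characteristics can be generated and sanity-checked with a general-purpose compressor:

```python
import random
import zlib

def make_buffer(size, kind, rng):
    """Illustrative content generators:
    'incompressible' -> pseudo-random bytes,
    'compressible'   -> a short repeating pattern,
    'dedup'          -> one of only four distinct repeated blocks."""
    if kind == "incompressible":
        return bytes(rng.getrandbits(8) for _ in range(size))
    if kind == "compressible":
        return (b"ABCD0123" * (size // 8 + 1))[:size]
    if kind == "dedup":
        block = b"%04d" % rng.randrange(4)   # 4-byte block, 4 variants
        return (block * (size // 4 + 1))[:size]
    raise ValueError(kind)

if __name__ == "__main__":
    rng = random.Random(0)
    for kind in ("incompressible", "compressible", "dedup"):
        buf = make_buffer(4096, kind, rng)
        print(kind, "compression ratio:",
              round(len(buf) / len(zlib.compress(buf)), 1))
```

Mixing buffers like these in configurable proportions is one way a load generator can approximate the data-reduction behavior of real application content.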

Testing using real-world application data and access patterns is critical, and something we at Load DynamiX work on every day. We're at the frontier of a new testing model that will add much value for customers. We look forward to updating you on our latest capabilities and introducing you to our new modeling interface!
12/21/2011 | 6:49:06 PM
re: Most Of Our Benchmarks Are Broken

Agree that VDbench is a great tool. I have used it for work I have done, and in fact it is the main workload driver behind things like SPC-1 and SPC-2, as well as others.

It can also replay actual application traces to recreate real workloads, rather than synthetic estimations of applications.

Some simple applications can be estimated and approximated synthetically using VDbench, such as SQL server access with a specific block size, etc.

However, for some complex applications like VDI and others, synthetic approximations are insufficient.

Mike Fratto
12/21/2011 | 2:03:58 PM
re: Most Of Our Benchmarks Are Broken
Dave, I think posting some of your internal test profiles would be great. One of the issues that Howard and I have wrestled with over the years is how to get "real-world" profiles for all kinds of equipment. They are all approximations, but if they are used consistently, they can be useful for testing and comparison.

12/21/2011 | 1:31:29 AM
re: Most Of Our Benchmarks Are Broken
Benchmarking modern storage systems is a difficult problem, and it's getting harder all the time. Most storage vendors (including us) have had to develop their own internal tools to do a good job of thoroughly testing different workloads in real-world scenarios.

That said, one of the best freely available tools today is VDbench, an open source tool currently maintained by Oracle and created by the developer of the SPC workload generator.

VDbench has a couple of key advantages over a tool like IOmeter:
1. Massive configurability of the workload, allowing you to simulate any I/O pattern you'd like, including disk areas that are hotter than others.
2. Control over the data stream that is written, including compressibility and dedupability. While testing with highly compressible data isn't realistic, testing with incompressible data isn't a real-world scenario for most applications either. Being able to set an appropriate mix is great.
3. It's distributed, meaning you can run workload generators across dozens of clients and control them from a central point. Testing a storage system with point workloads is worthless these days; you need to throw multiple workloads from multiple initiators at the system to really see how it will stand up in production.
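For illustration, a minimal VDbench parameter file exercising the first two features might look like the following. This is a sketch based on VDbench's documented parameters; the device path, ratios and percentages are placeholder assumptions, not a tested profile:

```
* Placeholder device and illustrative data-reduction ratios
compratio=2
dedupratio=3
dedupunit=4k
sd=sd1,lun=/dev/sdb,openflags=o_direct
* Hot workload: most I/O confined to the first 10% of the device
wd=wd_hot,sd=sd1,xfersize=8k,rdpct=70,seekpct=100,range=(0,10),skew=80
wd=wd_cold,sd=sd1,xfersize=8k,rdpct=70,seekpct=100,range=(10,100),skew=20
rd=rd1,wd=wd_*,iorate=max,elapsed=600,interval=5
```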

While not without its faults, it's probably the best open source tool available today.
We're thinking about open sourcing some of the "workload profiles" we've created for VDbench, and working with other members of the storage community (analysts, press, vendors) to create a common catalog of workload profiles that can be shared. Would be great to talk to you about this at some point.

12/20/2011 | 5:37:02 PM
re: Most Of Our Benchmarks Are Broken

Overall a good, thought-provoking article. I agree that many unsophisticated benchmarks are not able to mimic the sophisticated access patterns of real applications.

Your call for "a good benchmark" is a difficult one to answer, since every application inherently behaves differently.

We have developed a benchmark that exactly mimics a real application. In this case, the application was a mix of desktop applications used in virtual desktop settings, known as VDI.

VDI-IOmark represents a set of real applications and provides valid benchmark results for this particular application mix.

So in answer to your call, VDI-IOmark does provide real-world data patterns for VDI workloads. Developing additional workloads that represent other applications may be an avenue for further development.
