Network Computing is part of the Informa Tech Division of Informa PLC


Notes on a Nine Year Study of File System and Storage Benchmarking

Driven by a general sense that benchmarking practices in the areas of file and storage systems are lacking, we conducted an extensive survey of the benchmarks published in relevant conference papers in recent years. We decided to evaluate the evaluators, if you will. Our May 2008 ACM Transactions on Storage article, entitled "A Nine Year Study of File System and Storage Benchmarking", surveyed 415 file system and storage benchmarks from 106 papers published in four highly regarded conferences (SOSP, OSDI, USENIX, and FAST) between 1999 and 2007.

Our suspicions were confirmed. We found that most popular benchmarks are flawed, and that many research papers used poor benchmarking practices and did not give a clear indication of the system's true performance. We evaluated benchmarks qualitatively as well as quantitatively: we conducted a set of experiments to show how some widely used benchmarks can conceal or overemphasize overheads. Finally, we provided a set of guidelines that we hope will improve future performance evaluations. An updated version of the guidelines is available.
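As an illustration of the kind of effect such experiments expose (this sketch is ours, not one of the study's actual experiments): a benchmark that rereads data the operating system has already cached ends up measuring memory bandwidth rather than storage performance. Getting a genuinely cold-cache number requires remounting the file system or dropping caches (e.g., writing to /proc/sys/vm/drop_caches as root on Linux), which this sketch deliberately does not attempt.

```python
import os
import tempfile
import time


def timed_read(path, bufsize=1 << 20):
    """Sequentially read a file; return (elapsed seconds, bytes read)."""
    start = time.perf_counter()
    total = 0
    with open(path, "rb") as f:
        while chunk := f.read(bufsize):
            total += len(chunk)
    return time.perf_counter() - start, total


# Create a 64 MiB test file. To defeat the page cache for real, the file
# would need to exceed available RAM (or the cache must be dropped).
fd, path = tempfile.mkstemp()
os.close(fd)
with open(path, "wb") as f:
    f.write(b"\0" * (64 << 20))

t1, n1 = timed_read(path)  # likely already cached: we just wrote the data
t2, n2 = timed_read(path)  # almost certainly served from the page cache
print(f"run 1: {t1:.4f}s  run 2: {t2:.4f}s  ({n1} bytes)")
os.remove(path)
```

On most machines both runs complete at memory speed, so naively reporting either number as "disk read throughput" would conceal the true I/O overhead.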

Benchmarks are most often used to provide an idea of how fast some piece of software or hardware runs.  The results can significantly add to, or detract from, the value of a product (be it monetary or otherwise).  For example, they may be used by potential consumers in purchasing decisions, or by researchers to help determine a system's worth.  

Systems benchmarking is a difficult task, and many of the lessons from this study are general enough to apply to other areas of systems research. File and storage systems, however, have special properties. Complex interactions between I/O devices, caches, kernel daemons, and other OS components result in behavior that is difficult to analyze. Moreover, systems have different features and optimizations, so no single benchmark is always suitable. Lastly, the large variety of workloads these systems experience in the real world adds to the difficulty.

When the performance evaluation of a system is presented, the results and their implications must be clear to the reader. This includes accurate depictions of behavior under realistic workloads and in worst-case scenarios, as well as an explanation of the reasoning behind the chosen benchmarking methodology. In addition, the reader should be able to verify the benchmark results and compare the performance of one system with that of another. To accomplish these goals, much thought must go into choosing suitable benchmarks and configurations, and results must be conveyed accurately.
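One concrete way to make results verifiable, consistent with the spirit of the guidelines (the code itself and the sample numbers are our illustration, not the article's): run each benchmark several times and report the spread, not a single figure. A minimal sketch:

```python
import statistics


def summarize(samples):
    """Summarize repeated benchmark runs instead of reporting one number."""
    mean = statistics.mean(samples)
    stdev = statistics.stdev(samples)  # sample standard deviation, needs n >= 2
    # Half-width of an approximate 95% confidence interval
    # (normal approximation; fine as a rough sanity check).
    half = 1.96 * stdev / len(samples) ** 0.5
    return mean, stdev, half


# Hypothetical throughput samples (MB/s) from five identical runs.
runs = [412.0, 405.3, 418.7, 409.9, 411.1]
mean, stdev, half = summarize(runs)
print(f"throughput: {mean:.1f} +/- {half:.1f} MB/s "
      f"(stdev {stdev:.1f}, n={len(runs)})")
```

Reporting the mean with its interval and run count lets a reader judge whether a difference between two systems is meaningful or lost in the noise.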
