Storage QoS: The Next Frontier?
November 12, 2012
At the most recent Next Generation Storage Symposium and Storage Field Day, I was struck by how many vendors are betting that the next big thing in storage is going to be managing storage performance via some form of quality of service. While the industry is busy integrating flash into its storage architectures, a few vendors are considering not just how to deliver higher performance to applications, but also how to make the performance each application receives more predictable.
Storage administrators traditionally manage performance by allocating physical resources to individual workloads. An Oracle database server needing 15,000 IOPS would be assigned 100 15K RPM spindles that would, in aggregate, deliver those IOPS.
The problem with this technique is that it's inefficient. Because the smallest of Seagate's latest generation of 15K-RPM drives holds 146 Gbytes, providing 15,000 IOPS from 100 of them will take about 14.6 Tbytes of raw disk space. But if the database is only 500 Gbytes or 600 Gbytes, most of that space is wasted. Dedicating SSDs to the database is similarly wasteful, though of performance rather than capacity.
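The arithmetic behind that waste can be sketched in a few lines of Python. The per-drive figure of 150 IOPS is an assumption inferred from the 100-spindle example above (15,000 / 100), the function name is made up for illustration, and the capacity computed is raw, before any RAID or sparing overhead.

```python
import math


def spindles_needed(target_iops: int, iops_per_drive: int) -> int:
    """Number of drives required to reach a target IOPS in aggregate."""
    return math.ceil(target_iops / iops_per_drive)


target_iops = 15_000    # what the database needs
iops_per_drive = 150    # assumption: 15,000 IOPS spread over 100 spindles
drive_gb = 146          # smallest 15K-RPM drive mentioned in the text
database_gb = 600       # upper end of the database size in the text

drives = spindles_needed(target_iops, iops_per_drive)
raw_gb = drives * drive_gb           # raw capacity bought to get the IOPS
utilization = database_gb / raw_gb   # share of that capacity actually used

print(drives, raw_gb, f"{utilization:.1%}")  # prints: 100 14600 4.1%
```

Under these assumptions, the database uses only about 4 percent of the raw capacity purchased to meet its performance requirement.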
The solutions to the efficiency problem--thin provisioning, housing multiple virtual servers in the same data store and using flash as a shared cache and/or tier of storage--all share resources amongst multiple workloads, which makes the performance of multiple workloads interdependent. If a user creates an especially complex query against your Oracle database that hammers the storage system, the Exchange and Web servers that share disks with that database will be starved for I/O.
If we add a QoS mechanism to the storage system, we can assign a minimum IOPS rate and a priority to each workload. When total demand for storage performance exceeds the system's ability to deliver IOPS, instead of granting I/O requests on a first-come, first-served basis, the system makes sure each server gets its minimum number of IOPS, and then uses any remaining headroom to deliver additional IOPS to the high-priority workloads.
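As a rough illustration of that two-step policy (floors first, then headroom by priority), here is a minimal Python sketch. It is not any vendor's actual scheduler: the Workload fields, the allocate_iops function, the convention that a lower priority number means higher priority, and all of the figures are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    min_iops: int   # guaranteed floor
    demand: int     # IOPS the workload is currently asking for
    priority: int   # assumption: lower number = higher priority


def allocate_iops(workloads, capacity):
    """Grant every workload its floor, then hand out headroom by priority."""
    # Step 1: each workload gets its minimum (or its demand, if lower).
    grants = {w.name: min(w.min_iops, w.demand) for w in workloads}
    headroom = capacity - sum(grants.values())
    if headroom < 0:
        raise ValueError("floors exceed what the system can deliver")

    # Step 2: remaining IOPS go to the highest-priority workloads first.
    for w in sorted(workloads, key=lambda w: w.priority):
        extra = min(w.demand - grants[w.name], headroom)
        grants[w.name] += extra
        headroom -= extra
    return grants


workloads = [
    Workload("oracle", min_iops=8_000, demand=15_000, priority=0),
    Workload("exchange", min_iops=3_000, demand=6_000, priority=1),
    Workload("web", min_iops=2_000, demand=4_000, priority=2),
]
grants = allocate_iops(workloads, capacity=20_000)
print(grants)  # {'oracle': 15000, 'exchange': 3000, 'web': 2000}
```

In this example total demand (25,000 IOPS) exceeds the hypothetical 20,000 IOPS the system can deliver, so the lower-priority workloads fall back to their floors rather than being starved outright.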
On a hybrid storage system like those from NexGen and Tintri, meeting QoS targets will involve not only careful management of the system's I/O queues, but also the relative allocation of flash and spinning disks. (NexGen and Tintri were sponsors of the Next-Generation Storage Symposium.)
I must say that in my days as a network admin I was a QoS skeptic. The voice-over-IP folks told us that to make IP phones work we'd need to buy expensive new switches that would make sure that the voice traffic would have a clear path through the Ethernet network. Of course, by the time senior management decided to junk the old Rolm PBX and convert to VoIP, we had upgraded to Gigabit Ethernet and had so much bandwidth that there was free capacity for the VoIP traffic whether we implemented QoS or not.
Some have suggested that while QoS makes sense in a hybrid system, all-solid-state storage systems have so much performance available that QoS would be superfluous. Unfortunately, even all-SSD systems are susceptible to the noisy neighbor problem; the contention simply shows up at a higher level of demand.
Storage QoS will likely hold some appeal to cloud providers because the providers have even less control over what the virtual machines they host do than the managers of corporate data centers. A provider can easily lose customer A if it fails to deliver enough performance because customer B is thrashing the storage. SolidFire's all-SSD system is designed specifically for cloud service providers. Its QoS mechanisms include not only performance floors but also ceilings, so cloud providers can offer different levels of performance at different prices. (SolidFire was a sponsor of the symposium.)
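To make the floor-and-ceiling idea concrete, here is a small Python sketch of how a provider might define performance tiers and clamp what a volume receives. This is not SolidFire's API: the Tier class, the tier names, the IOPS figures and the granted_iops function are all hypothetical, and the sketch assumes the system has not oversubscribed its floors.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    floor_iops: int     # guaranteed minimum under contention
    ceiling_iops: int   # hard cap, even when the system is idle


# Hypothetical service tiers a provider might sell at different prices.
BRONZE = Tier("bronze", floor_iops=500, ceiling_iops=2_000)
GOLD = Tier("gold", floor_iops=5_000, ceiling_iops=20_000)


def granted_iops(tier: Tier, demand: int, available: int) -> int:
    """IOPS a volume receives given its demand and what the system can spare.

    The volume never gets more than it asks for or more than its ceiling,
    and it never drops below its floor while it is demanding that much.
    """
    return min(demand, tier.ceiling_iops, max(available, tier.floor_iops))


# A busy neighbor leaves only 200 IOPS spare: the floor still holds.
print(granted_iops(BRONZE, demand=10_000, available=200))     # 500
# The system is idle: the ceiling keeps bronze from riding free.
print(granted_iops(BRONZE, demand=10_000, available=50_000))  # 2000
# Same idle system, gold tier: the customer gets what it asks for.
print(granted_iops(GOLD, demand=10_000, available=50_000))    # 10000
```

The ceiling is what lets the tiers be priced differently: without it, a bronze customer on a lightly loaded system would see gold-level performance.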
As disks and SSDs keep growing, the inefficiencies of dedicating resources to workloads are becoming untenable. QoS controls on our storage systems, ideally implemented at the VM level, will allow us to cram more workloads onto fewer resources while still delivering an appropriate level of performance to each.
Disclaimer: My friend Stephen Foskett runs Storage Field Day and the associated symposia. Vendors pay to participate, and Stephen uses those funds to pay the travel expenses of delegates like me. Participating vendors provide the usual swag (pens, flashlights, USB thumb drives, clothing) and in the case of one vendor a 7-inch tablet. I also received a small honorarium to moderate a panel at the symposium.