Howard Marks

Network Computing Blogger


Of IOPS and RAID: How Raid Affects Application Performance

After the flood of storage news generated at or around VMworld, it's time to return to our discussion of storage performance. In the first two installments, we looked at basic storage performance metrics and how Input/Output Operations Per Second (IOPS) and storage system latency have a greater impact on the performance of applications like databases than simple system throughput. In this final installment, we're going to take a look at how RAID affects the performance of your applications.

In general, RAID increases the reliability and availability of your storage system. The redundancy that it provides comes at a cost, however--not just in additional disk space consumed, but also in the increased amount of work that your disk drives, spinning or solid state, have to do when you write data to a RAID set. The good news is that this write amplification--or, as some call it, write penalty--can be offset by the boost in read performance that comes from reading multiple drives in parallel. Let's look at how a theoretical RAID controller behaves when reading from and writing to some common RAID configurations so we can see the impact of RAID on performance:

In a mirrored configuration like RAID 1 or RAID 10, the controller duplicates all writes to a pair of drives in the RAID set, so each write request from your application becomes two I/Os to the back-end disks. As a result, the number of write IOPS a mirrored RAID set can deliver is half the sum of the IOPS the drives in the set can deliver. A smart RAID controller can distribute read requests across the drives in the mirrored pair, a process Novell dubbed disk duplexing, so a mirrored RAID set can deliver almost twice as many read IOPS as write IOPS.
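The mirrored arithmetic above can be sketched in a few lines of Python. This is a back-of-the-envelope model, not a benchmark: the function name `mirrored_iops` and the 150 IOPS per drive figure are illustrative assumptions, and the model assumes an ideal controller that spreads reads perfectly across both halves of every mirror.

```python
def mirrored_iops(drive_count, iops_per_drive):
    """Estimate the IOPS ceiling of a RAID 1/RAID 10 set.

    Assumes an ideal controller: every drive can serve reads,
    and every application write costs two back-end I/Os.
    """
    raw = drive_count * iops_per_drive  # sum of what the drives can deliver
    return {
        "read": raw,        # reads can be spread across all drives
        "write": raw // 2,  # each write lands on both drives of a pair
    }


# Hypothetical example: eight drives at an assumed 150 IOPS each.
estimate = mirrored_iops(8, 150)
print(estimate)
```

Real controllers fall short of the read figure, which is why the ratio is "almost" rather than exactly two to one.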

Things get a little more complicated with parity-based RAID schemes like RAID 5 and RAID 6, as the amount of work the disks have to do varies depending on the size of the write request. If the write request is smaller than the RAID stripe size (as is common in database applications, where the database engine writes data in 4KB or 8KB pages to a RAID set that stripes data across its drives, writing 64KB to each drive), a storage system running parity RAID will have to perform several I/O operations to satisfy a single write request.

To write a small change to a parity RAID set, the controller must read the data currently in the RAID set into memory, insert the new data, calculate the new value for the parity stripe(s) and then write the new data and parity to the back-end disks. While the number of actual I/O operations depends on how many drives are in the RAID set, the process of reading or writing the data occurs in parallel across all the data drives. The net effect for an optimized RAID controller is that each small write causes roughly four times the IOPS and latency that a write to a single drive would. Of course, in the real world many RAID controllers don't have the bandwidth or CPU horsepower to achieve total parallelization, so a 14+1 RAID 5 set will do small I/Os significantly slower than a 5+1 RAID 5 set. Similarly, many RAID controllers calculate the two sets of parity data for a RAID 6 set sequentially, so their RAID 6 performance is substantially less than their RAID 5 performance.
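As a rough illustration of where the small-write penalty comes from, here is a minimal Python sketch of one common way to update single-parity (RAID 5 style) data: read the old data and old parity, XOR the change into the parity, then write both back. The helper names (`xor_bytes`, `full_parity`, `small_write`) are hypothetical, the "drives" are just byte strings in memory, and a real controller may choose a different strategy, such as reading the rest of the stripe instead.

```python
from functools import reduce


def xor_bytes(a, b):
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def full_parity(strips):
    """Parity of a stripe: the XOR of all of its data strips."""
    return reduce(xor_bytes, strips)


def small_write(strips, parity, index, new_data):
    """Update one strip and return (new_strips, new_parity, io_count)."""
    io_count = 0

    old_data = strips[index]  # back-end I/O 1: read the old data
    io_count += 1
    old_parity = parity       # back-end I/O 2: read the old parity
    io_count += 1

    # Remove the old data's contribution, then add the new data's.
    new_parity = xor_bytes(xor_bytes(old_parity, old_data), new_data)

    new_strips = list(strips)
    new_strips[index] = new_data  # back-end I/O 3: write the new data
    io_count += 1
    io_count += 1                 # back-end I/O 4: write the new parity

    return new_strips, new_parity, io_count


# Toy stripe: three data strips of two bytes each.
strips = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
parity = full_parity(strips)
new_strips, new_parity, ios = small_write(strips, parity, 1, b"\xaa\xbb")
print(ios, new_parity == full_parity(new_strips))
```

The final comparison checks that the incrementally updated parity matches what a full recalculation over the stripe would produce.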

Parity RAID is better suited to environments like file servers and streaming media, where the write I/O sizes are larger than the stripe size. If you're writing more than 512KB at a time to an 8+1 RAID 5 or 8+2 RAID 6 set with a 64KB stripe size, the RAID controller doesn't need to read the existing data--it can just calculate parity and slam the new data and parity to the disk drives.
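The 512KB figure is simply the number of data drives multiplied by the amount written to each drive. A small Python sketch of that check follows; the function names are made up for illustration, and it assumes writes are aligned to the start of a stripe, which a real workload may not guarantee.

```python
def full_stripe_kb(data_drives, per_drive_kb):
    """Size of one full stripe, counting data drives only (no parity)."""
    return data_drives * per_drive_kb


def needs_read_before_write(write_kb, data_drives, per_drive_kb):
    """True if a stripe-aligned write is too small to cover a full stripe."""
    return write_kb < full_stripe_kb(data_drives, per_drive_kb)


# 8+1 RAID 5 or 8+2 RAID 6 with 64KB written to each drive.
print(full_stripe_kb(8, 64))
print(needs_read_before_write(8, 8, 64))     # an 8KB database page
print(needs_read_before_write(1024, 8, 64))  # a 1MB streaming write
```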

You, dear reader, should note that vendors have tricks up their sleeves that can make more sophisticated array controllers perform better than the simple RAID controller described here would. A nonvolatile memory cache that allows an array to acknowledge writes before writing data to its back-end storage would give that array lower write latency, as long as the traffic was bursty enough to let the controller flush its cache before another burst of data arrived. Similarly, log-based data structures allow the controller to always write full RAID stripes to free space, reducing the write amplification of small writes to a parity-based RAID set.

While an oversimplification--as all rules of thumb are--the storage admin's rule of thumb to use RAID 10 for random I/O workloads and parity RAID for sequential workloads does hold water. Of course, sequential workloads, other than backups, are becoming rare in today's virtualized data center.

I'd like to thank Duncan Epping for his post on this subject at the Yellow Bricks blog and all the folks who contributed to the long series of comments to that post. That discussion helped me focus my thoughts on the subject of RAID and storage performance.

