For the better part of the past decade, the primary focus of IT has been on maximizing uptime. Building an infrastructure that is resilient enough to tolerate hardware failures, software faults, and/or scheduled maintenance was once a monumental challenge. But thanks to virtualization and improvements in application resiliency, building a robust infrastructure is now relatively easy to do. As a result, the focus of IT has recently shifted instead to optimization. How do you make resilient infrastructures fast, operationally efficient, and cost effective?
A data center infrastructure is just a collection of inanimate objects. It's the applications and the workloads that give it life, and that reveal how well your environment can actually handle them. Unfortunately, workloads behave in ways you never imagined, let alone designed for. Every data center has a unique mix of workloads that impact the infrastructure, regardless of what type of architecture is being used. Spinning up new workloads is almost effortless in a modern virtualized infrastructure. But this has a downside: it is tough to predict how new workloads will impact the data center, including the other workloads already running there.
To properly design your data center, you need to understand various workload characteristics, how they change over time, and how they impact application performance. This blog is the first in a series that will help address this challenge, covering the top six things you should know about your virtualized workloads. By understanding these principles, you can cost-effectively optimize application performance while maximizing the efficiency of services in your virtualized data center.
#1: Reads and writes behave differently
Most people realize that reads and writes behave differently in a virtualized data center, particularly with respect to the storage infrastructure. However, some of the finer details of this behavior are often overlooked. Furthermore, many companies lack the proper tools to measure these differences, so their impact tends to be undervalued. In reality, as storage moves to flash, read/write behavior is more important than ever.
Writes are more of a burden on shared storage systems than reads. This is due to the additional activity a storage system must perform to protect the data, a "penalty" charged for data protection. A RAID 1 mirror, for example, causes two back-end I/Os for every write issued by the VM, while RAID 5 causes four I/Os, and RAID 6 causes six I/Os. This has a huge impact on storage performance. But it is not the only difference.
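To make the write penalty concrete, here is a minimal sketch that estimates back-end I/O load from front-end read and write rates. The function name `backend_iops` and the sample workload are assumptions for illustration. The penalty values are the ones quoted above, and the calculation ignores caching and full-stripe write optimizations that real arrays may apply.

```python
# Back-end I/Os generated per front-end write, using the figures quoted above.
WRITE_PENALTY = {"RAID1": 2, "RAID5": 4, "RAID6": 6}


def backend_iops(read_iops, write_iops, raid_level):
    """Estimate back-end I/Os per second for a given front-end load.

    Reads are counted once; each write is multiplied by the RAID write penalty.
    """
    return read_iops + write_iops * WRITE_PENALTY[raid_level]


# Hypothetical workload: 1,000 front-end IOPS, split evenly between reads and writes.
for level in ("RAID1", "RAID5", "RAID6"):
    print(level, backend_iops(500, 500, level))
```

With this hypothetical 50/50 workload, the same 1,000 front-end IOPS turns into 1,500, 2,500, or 3,500 back-end I/Os per second depending on the RAID level.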
Guest operating systems often leverage file system buffer caching inside VMs, which creates large block sizes when issuing writes. Larger block sizes mean that write I/Os carry more data than read I/Os. This has a profound impact throughout the entire infrastructure, especially on flash storage. While reads are relatively easy for flash to handle, writes with larger I/O sizes almost always see substantial performance degradation -- even worse than in disk-based storage. This can be a big surprise for people migrating to this new high-speed medium for their storage needs.
How can you get the visibility you need into read/write behavior? Unfortunately, existing tools can be lacking on this front. Here's why:
- They only provide read/write ratios that are calculated by counting the number of read commands versus the number of write commands. However, as noted above, write I/Os are often much larger than read I/Os. So a read/write ratio of 50/50 can be misleading if the ratio of actual data transmitted is more like 30/70 or 20/80. Looking at the actual amount of data paints a very different picture when observing read/write behavior!
- The ratio is a good number to know, but it is only a summary over a period of time, expressed as a percentage. What you need to understand is the distribution of reads versus writes over the course of time, and the absolute numbers (read operations and write operations) associated with the workload. This is critical to understanding performance problems.
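The gap between a command-count ratio and a payload ratio is easy to see in a few lines of Python. This is only a sketch: the `IO` record and both helper functions are hypothetical names, and the I/O sizes are made-up values chosen to mirror the 50/50 versus 20/80 example above.

```python
from dataclasses import dataclass


@dataclass
class IO:
    op: str    # "read" or "write"
    size: int  # payload size in bytes


def ratio_by_count(ios):
    """Return (read %, write %) based on the number of commands."""
    reads = sum(1 for io in ios if io.op == "read")
    return 100 * reads / len(ios), 100 * (len(ios) - reads) / len(ios)


def ratio_by_bytes(ios):
    """Return (read %, write %) based on the amount of data transferred."""
    read_bytes = sum(io.size for io in ios if io.op == "read")
    total = sum(io.size for io in ios)
    return 100 * read_bytes / total, 100 * (total - read_bytes) / total


# Hypothetical trace: 50 small reads (4 KiB) and 50 larger writes (16 KiB).
trace = [IO("read", 4096)] * 50 + [IO("write", 16384)] * 50

print("by command count:", ratio_by_count(trace))
print("by payload:      ", ratio_by_bytes(trace))
```

The command count reports an even 50/50 split, while the payload view shows that writes account for 80% of the data actually moved.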
An informed administrator needs a way to see not only read/write ratios, but also the distribution of those reads and writes as they occur across individual VMs and the entire data center. That means visibility beyond counts of I/O commands: the actual payload being transmitted, and the latencies associated with that specific activity.
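As a rough illustration of what "distribution over time" means in practice, the sketch below groups I/O events into fixed time windows and reports absolute read and write counts and bytes per window. The event format `(timestamp_s, op, size_bytes)`, the function name `bucket_io`, and the sample events are all assumptions made for this example, not the output of any particular monitoring tool.

```python
from collections import defaultdict


def bucket_io(events, interval_s):
    """Group (timestamp_s, op, size_bytes) events into fixed time windows.

    Returns a dict mapping each window's start time to absolute counters.
    """
    buckets = defaultdict(lambda: {"reads": 0, "writes": 0,
                                   "read_bytes": 0, "write_bytes": 0})
    for ts, op, size in events:
        start = int(ts // interval_s) * interval_s
        kind = "read" if op == "read" else "write"
        buckets[start][kind + "s"] += 1
        buckets[start][kind + "_bytes"] += size
    return dict(sorted(buckets.items()))


# Hypothetical events: steady small reads, then a short burst of large writes.
events = [
    (0.5, "read", 4096), (1.2, "read", 4096), (3.0, "read", 4096),
    (10.1, "write", 65536), (10.4, "write", 65536), (11.0, "write", 65536),
    (12.7, "read", 4096),
]

for start, stats in bucket_io(events, 10).items():
    print(f"t={start:>3}s", stats)
```

Summarized over the whole period, this trace is roughly 57% reads by command count. The per-window view shows that all of the writes, and most of the bytes, land in a single 10-second burst.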
#2: Access patterns affect performance
The storage access patterns of your workloads impact not only the behavior of the VM and the services it is rendering, but also all other workloads running in that infrastructure. It is this "noisy neighbor" effect that can make VMs in an otherwise well-performing environment suffer.
We've already described how reads and writes can have different impacts on the storage infrastructure, but the type of access pattern they produce is also important. Whether access is sequential or random changes the burden I/Os place on the storage infrastructure. This characteristic has typically been associated only with the mechanical latencies of spinning disks. However, its impact can be seen in environments using flash as well. A single sequential workload can dramatically increase congestion, quickly filling up and flushing caches used by other workloads. Sequential workloads also tend to use larger block sizes, and can induce erratic performance in the storage infrastructure.
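The cache-flushing effect can be illustrated with a toy simulation. The sketch below assumes a simple shared LRU cache, which is a deliberate simplification since real storage caches use more sophisticated policies. The `LRUCache` class, the `hit_rate` helper, the cache size, and the block numbers are all hypothetical.

```python
from collections import OrderedDict


class LRUCache:
    """Toy shared block cache with least-recently-used eviction."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()

    def access(self, block):
        """Return True on a hit; on a miss, insert the block and return False."""
        if block in self.blocks:
            self.blocks.move_to_end(block)   # mark as most recently used
            return True
        self.blocks[block] = None
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used block
        return False


def hit_rate(cache, blocks):
    """Fraction of accesses in `blocks` that were served from the cache."""
    hits = sum(cache.access(b) for b in blocks)
    return hits / len(blocks)


cache = LRUCache(capacity=100)
hot_set = range(80)                # working set of a latency-sensitive VM
big_scan = range(1000, 1200)       # one sequential pass by a neighboring VM

hit_rate(cache, hot_set)                            # warm the cache
print("before scan:", hit_rate(cache, hot_set))     # working set fits in cache
hit_rate(cache, big_scan)                           # neighbor streams through
print("after scan: ", hit_rate(cache, hot_set))     # working set was evicted
```

In this simplified model, the first VM's hit rate drops from 1.0 to 0.0 after a single sequential pass by its neighbor, even though its own access pattern never changed.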
Failing to understand and accommodate access patterns can leave an environment performing poorly or wildly inconsistently. This is true regardless of the type of storage architecture used. Whether it is a traditional SAN infrastructure or a distributed storage solution used in a hyperconverged environment, the challenges remain.
Stay tuned for the next post in this series, when we'll describe characteristics of your VMs that you might be overlooking.