Server-side caching solutions offer a quick fix for storage performance problems: install an SSD, load the caching software, and watch the problems caused by overwhelmed storage networks and aging legacy storage systems go away. But these solutions are not perfect. As I covered in my last column, there are specific features to look for to make sure they don't create more problems than they solve. Another key area to understand is how these caching solutions handle write I/O.
What Is Write Caching?
Most caching solutions serve only recently read data from cache memory. Read caching is popular because it is safe: every block in the cache also exists on disk. When new data is written or existing data is changed, the write goes to the hard disk tier and is acknowledged back to the application from there. The data is promoted to cache later, once it has been accessed enough times to be considered "cache worthy." This means that in the event of a cache failure, or a VM migration in a virtualized environment, the data is still on shared disk and the application continues to work, albeit more slowly.
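The read-caching flow described above can be sketched in a few lines of Python. This is a minimal illustration, not any vendor's implementation; the class name, the two-read promotion threshold, and the dictionary-backed "disk" and "cache" are all assumptions made for clarity.

```python
from collections import defaultdict

class ReadCache:
    """Minimal read-cache sketch: writes go straight to disk, and a
    block is promoted to cache only after enough reads make it
    "cache worthy." The threshold below is a hypothetical value."""

    PROMOTE_AFTER = 2  # reads required before promotion (assumed)

    def __init__(self):
        self.disk = {}                    # stands in for the HDD tier
        self.cache = {}                   # stands in for the flash cache
        self.read_counts = defaultdict(int)

    def write(self, block, data):
        # Writes bypass the cache entirely: the disk copy is always
        # authoritative, so a cache failure never loses data.
        self.disk[block] = data
        self.cache.pop(block, None)       # invalidate any stale cached copy

    def read(self, block):
        if block in self.cache:
            return self.cache[block]      # cache hit: served from flash
        data = self.disk[block]           # cache miss: served from disk
        self.read_counts[block] += 1
        if self.read_counts[block] >= self.PROMOTE_AFTER:
            self.cache[block] = data      # promoted: now "cache worthy"
        return data
```

Note that a cache failure here only costs performance, never data, because `write` always lands on `disk` first.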
There is also a technique called write-through caching, in which data is written to cache and to disk at the same time, but the write is still acknowledged only after it lands on the hard disk tier. The value of this technique is that data does not have to be written to disk and then copied into cache as a separate promotion step. It rests on the generally accurate theory that what was recently accessed is also what is most likely to be accessed next, so it eliminates the extra step of determining cache worthiness.
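As a minimal sketch of the write-through path (class and field names are assumptions, not a real product's API): the cache is populated on the way through, but the acknowledgment is not returned until the disk write completes.

```python
class WriteThroughCache:
    """Write-through sketch: every write populates the cache and the
    disk together, and the application is acknowledged only after the
    (slower) disk write has landed."""

    def __init__(self):
        self.disk = {}     # stands in for the HDD tier
        self.cache = {}    # stands in for the flash cache

    def write(self, block, data):
        self.cache[block] = data   # populate cache on the way through
        self.disk[block] = data    # ack only after this disk write lands
        return "ack"

    def read(self, block):
        # Recently written data is already cached; no promotion step needed.
        return self.cache.get(block, self.disk.get(block))
```

The safety property is the same as read caching (the disk copy always exists), but a freshly written block is immediately readable from flash.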
A true write cache writes data only to the cache memory area and acknowledges to the application at that point that the write has been secured. Because flash memory is much faster than HDDs, even on writes, the application should see a significant performance increase. This is especially true in database and virtual desktop environments. Interestingly, write caching can also make HDDs more efficient: writes can be coalesced and destaged in larger blocks, which means less total data written to disk, laid out in a way that is more efficient to read back.
The big problem with write caching is that until data is destaged, there is no copy on disk. As a result, if the cache fails, or a VM migration is handled improperly, data will be lost. The migration issue has largely been solved by virtualization-aware caching solutions that are notified of an impending migration and flush the cache before the VM moves. The bigger remaining risk is cache failure, and server-side caches are particularly exposed because all the cached data is siloed in a single server.
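The flush-before-migration behavior can be sketched as follows. The hook name and the dictionary standing in for shared storage are hypothetical; real solutions integrate with hypervisor APIs rather than a method call.

```python
class MigrationAwareCache:
    """Write-back cache sketch with a hypothetical pre-migration hook
    that destages dirty data to shared storage before the VM moves."""

    def __init__(self, shared_disk):
        self.shared_disk = shared_disk  # storage reachable by every host
        self.dirty = {}                 # server-local cache: the only copy

    def write(self, block, data):
        self.dirty[block] = data        # acked from local flash only

    def on_migration_signal(self):
        # The hypervisor warns of an impending VM move: destage every
        # dirty block so shared storage holds a complete copy, then
        # report whether the VM is safe to migrate.
        self.shared_disk.update(self.dirty)
        self.dirty.clear()
        return not self.dirty           # True: cache is clean
```

Note that this hook addresses only planned migrations; an unplanned cache or server failure still loses whatever was dirty.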
The Write Value Of Read Caching
Read caching and write-through caching already make writes more efficient. First, most environments are more read-heavy than write-heavy; 70% to 80% reads is not uncommon. Let's assume the caching solution achieves a 95% read hit rate (which is actually a very poor result).
With server-side caching, that means roughly 65% to 75% of total I/O (the cached reads) no longer traverses the storage network or requires the storage system to handle it. The storage network and array can now be largely dedicated to handling write I/O.
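The arithmetic behind those figures is worth spelling out. Taking the high end of the read mix from above (80% reads) and the assumed 95% hit rate:

```python
read_fraction = 0.80   # reads as a share of total I/O (70%-80% per above)
hit_rate = 0.95        # assumed cache hit rate from the column

offloaded = read_fraction * hit_rate   # share of all I/O served locally
remaining = 1 - offloaded              # traffic still on the network
write_fraction = 1 - read_fraction     # write share of original I/O

print(f"offloaded from the network: {offloaded:.0%}")
print(f"remaining network traffic:  {remaining:.0%}")
print(f"writes as a share of what remains: {write_fraction / remaining:.0%}")
```

With an 80/20 read/write mix, 76% of all I/O is served from the server-side cache, and writes dominate the traffic that remains on the network; the 70% read case works out to about 66.5% offloaded, which is where the column's 65% to 75% range comes from.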
Also, the performance delta between hard disk and flash is not as great for write I/O as it is for reads: flash writes data considerably more slowly than it reads it.
The good news is that read caching alone will improve overall write performance. The bad news is that those writes still happen at hard drive speeds. Flash would be faster and would further optimize the environment, especially where the write I/O percentage is higher than the figures above. There is a role for write caching, but it has to be done safely. In my next column, I will discuss some of the techniques we are seeing vendors implement to do just that.