I have written a lot recently about the different ways you can implement solid state disk (SSD) in your data center. One of the most popular methods is to use automation, either through automated tiering techniques or through caching. While both technologies make sure that the most active data is on the fastest tier, there are differences in the technologies. What will work best will depend on your data center.
Automated tiering and caching often get confused. While each vendor's technology will vary a bit automated tiering is generally seen to be a more permanent placement of data on a faster tier of storage. It also can be seen as a way to move less active data to a high capacity but more cost effective tier of storage. Caching is often seen as more temporary in nature, accelerating only the most active data and, in most cases, this approach does not move old data to a third tier of storage.
The challenge in trying to grasp these two methods is that when used with solid state their use looks similar. In the past, caching was often thought of as a very small area of memory used to accelerate disk access for a very short period of time. Often it held only the most recent minutes of accessed data. Obviously the chances of a cache miss were relatively high, which meant a performance degradation as data was retrieved from mechanical hard disk. This lead to a very narrow deployment model, either a single server or a specific application on that server.
With the falling cost of today's flash-based SSDs, a very large cache can be created and data can reside on cache for a long period of time. This of course reduces the chance of a cache miss. It also means that data can be in cache for hours, even days if the flash memory in the cache is sized large enough. Flash has allowed large caches to be deployed in a much broader fashion and across multiple servers and applications.
A big difference between cache and automated tiering is that the data in cache is always a second copy of the data that is on the hard drive. Automated tiering is an actual move of data from the hard drive. Failure of the cache rarely produces a data loss, just a performance loss since everything would need to be served from mechanical drives until the cache can be replaced.
Since the SSD tier holds potentially the only copy of data in an automated tiering system, the failure of the SSD tier can't be tolerated so these systems have to set the SSD tier in a redundant configuration by using a RAID-like data protection scheme. The overhead of that protection, RAID parity bit calculation for example, may impact performance and of course any RAID algorithm requires extra disk capacity. Having to purchase extra SSD to support a RAID-like function makes an already premium priced technology even more expensive.
In most situations, read performance should be about the same between the two options. Mostly the efficiency of read performance is going to depend on the efficiency and customizability of the caching appliance to promote data. The goal should be to make sure the right data is in cache at the right moment in time. As we discuss in our recent article "Maximizing SSD Investment With Analytics" we believe that this is the largest opportunity for improvement in this technology. Both caching and automated tiering need to become smarter about what they cache and when.
Another area to examine with automated tiering vs. caching is which one can deliver better write performance and can be clear are of distinction between automated tiering and caching. We'll cover this in our next entry.
Follow Storage Switzerland on Twitter