
02:40 PM
George Crump
Commentary

SSD Options: Tier Vs. Cache

Solid state storage can be used as a cache or as an automated tier. Both approaches keep the most active data on the fastest tier, but you need to understand how they differ.

I have written a lot recently about the different ways you can implement solid state disk (SSD) in your data center. One of the most popular methods is automation, through either automated tiering or caching. Both keep the most active data on the fastest tier, but they go about it differently, and which one works best will depend on your data center.

Automated tiering and caching are often confused. While each vendor's implementation varies a bit, automated tiering is generally seen as a more permanent placement of data on a faster tier of storage. It can also be used to move less active data to a higher-capacity, more cost-effective tier. Caching is generally more temporary: it accelerates only the most active data and, in most cases, does not move old data to a third tier of storage.

The challenge in telling these two methods apart is that, when used with solid state storage, they look similar in practice. In the past, a cache was typically a very small area of memory that accelerated disk access for a very short period of time, often holding only the last few minutes of accessed data. As a result, the chance of a cache miss was relatively high, and every miss meant a performance penalty while the data was retrieved from mechanical hard disk. This led to a very narrow deployment model: a single server, or a specific application on that server.

With the falling cost of today's flash-based SSDs, a very large cache can be created, and data can stay in that cache for a long period of time. This, of course, reduces the chance of a cache miss. If the flash in the cache is sized large enough, data can remain cached for hours or even days. Flash has allowed large caches to be deployed much more broadly, across multiple servers and applications.
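
To illustrate why a bigger cache means fewer misses, here is a minimal Python sketch that replays a synthetic access trace through a least-recently-used (LRU) cache of different sizes. The LRU policy, the hypothetical `lru_hit_ratio` helper, and the 80/20 workload are assumptions for illustration only; commercial caching products use their own, often more sophisticated, policies.

```python
import random
from collections import OrderedDict


def lru_hit_ratio(accesses, capacity):
    """Replay a list of block IDs through an LRU cache that holds `capacity` blocks.

    Returns the fraction of accesses served from cache (hits / total).
    """
    cache = OrderedDict()  # keys ordered from least to most recently used
    hits = 0
    for block in accesses:
        if block in cache:
            hits += 1
            cache.move_to_end(block)  # hit: mark as most recently used
        else:
            cache[block] = True  # miss: read from disk, then keep a copy in cache
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used block
    return hits / len(accesses) if accesses else 0.0


# Hypothetical workload: 80% of I/Os go to a 2,000-block "hot" set,
# the other 20% are spread across 100,000 colder blocks.
rng = random.Random(0)
trace = [
    rng.randrange(2_000) if rng.random() < 0.8 else rng.randrange(2_000, 102_000)
    for _ in range(50_000)
]

# Hit ratio for a small (DRAM-sized) cache versus progressively larger (flash-sized) caches
for capacity in (200, 2_000, 20_000):
    print(f"cache of {capacity:>6} blocks: hit ratio {lru_hit_ratio(trace, capacity):.3f}")
```

Because LRU has the inclusion property (for the same trace, a larger LRU cache always holds a superset of what a smaller one holds), the hit ratio can only stay the same or rise as capacity grows; how much it rises depends entirely on the workload.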

A big difference between caching and automated tiering is that the data in a cache is normally a second copy of data that also resides on the hard drive, while automated tiering actually moves data off the hard drive. As a result, failure of the cache rarely causes data loss, only a performance loss, since everything must be served from mechanical drives until the cache is replaced.
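
The copy-versus-move distinction can be modeled in a few lines. In this hypothetical Python sketch, `CachedVolume` assumes a write-through cache (every write goes to disk as well as to SSD), while `TieredVolume` moves a promoted block off disk; plain dictionaries stand in for the two devices. Neither class reflects any particular product.

```python
class CachedVolume:
    """Caching model (write-through assumed): every block always lives on HDD; SSD holds copies."""

    def __init__(self):
        self.hdd = {}
        self.ssd = {}

    def write(self, block, data):
        self.hdd[block] = data  # the authoritative copy always lands on disk
        self.ssd[block] = data  # the cache keeps a second copy for fast reads

    def read(self, block):
        if block in self.ssd:
            return self.ssd[block]  # cache hit
        return self.hdd.get(block)  # cache miss: slower, but the data is still there

    def fail_ssd(self):
        self.ssd.clear()  # losing the cache loses only copies


class TieredVolume:
    """Tiering model: promotion moves a block, so each block lives on exactly one tier."""

    def __init__(self):
        self.hdd = {}
        self.ssd = {}

    def write(self, block, data):
        self.hdd[block] = data

    def promote(self, block):
        self.ssd[block] = self.hdd.pop(block)  # move, not copy

    def read(self, block):
        if block in self.ssd:
            return self.ssd[block]
        return self.hdd.get(block)

    def fail_ssd(self):
        self.ssd.clear()  # with no protection on the SSD tier, promoted blocks are gone


cached = CachedVolume()
cached.write("blk7", b"payroll")
cached.fail_ssd()
print(cached.read("blk7"))  # b'payroll'

tiered = TieredVolume()
tiered.write("blk7", b"payroll")
tiered.promote("blk7")
tiered.fail_ssd()
print(tiered.read("blk7"))  # None
```

Losing the SSD in the caching model costs only speed; losing it in the tiering model loses the promoted block, which is why a tiered SSD layer has to be protected.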

Since the SSD tier in an automated tiering system may hold the only copy of some data, the system cannot tolerate an SSD failure. These systems therefore have to protect the SSD tier with a redundant, RAID-like data protection scheme. The overhead of that protection, such as RAID parity calculation, may affect performance, and any RAID scheme requires extra capacity. Having to buy extra SSD capacity to support a RAID-like function makes an already premium-priced technology even more expensive.
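
A back-of-the-envelope calculation shows how much extra SSD the protection requirement adds. The sketch below assumes simplified textbook layouts (mirroring, single-parity RAID 5, double-parity RAID 6), a hypothetical 10 TB usable target, and hypothetical drive counts; the `data_drives` and `raw_ssd_needed` helpers are illustrative, and the model ignores hot spares, SSD over-provisioning, and the parity-calculation performance overhead mentioned above.

```python
def data_drives(scheme, drives):
    """Number of drives' worth of capacity left for data under a simplified RAID layout."""
    if scheme == "mirror":  # RAID 1/10: every block is stored twice
        if drives < 2 or drives % 2:
            raise ValueError("mirroring needs an even number of drives (>= 2)")
        return drives // 2
    if scheme == "raid5":  # single parity: costs one drive's worth of capacity
        if drives < 3:
            raise ValueError("RAID 5 needs at least 3 drives")
        return drives - 1
    if scheme == "raid6":  # double parity: costs two drives' worth of capacity
        if drives < 4:
            raise ValueError("RAID 6 needs at least 4 drives")
        return drives - 2
    raise ValueError(f"unknown scheme: {scheme}")


def raw_ssd_needed(usable_tb, scheme, drives):
    """Raw SSD capacity (TB) to buy so that `usable_tb` remains after protection overhead."""
    return usable_tb * drives / data_drives(scheme, drives)


# Hypothetical requirement: 10 TB of usable SSD tier
for scheme, drives in (("mirror", 2), ("raid5", 5), ("raid6", 6)):
    raw = raw_ssd_needed(10, scheme, drives)
    print(f"{scheme:>6} over {drives} drives: buy {raw:.1f} TB raw for 10 TB usable")
```

Under these assumptions, 10 TB of usable SSD tier requires 20 TB raw when mirrored, 12.5 TB raw on a five-drive RAID 5, and 15 TB raw on a six-drive RAID 6. A read cache that only holds second copies can skip this overhead.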

In most situations, read performance should be about the same with either option. It will mostly depend on how efficiently the caching appliance or tiering software promotes data to SSD, and on how much that promotion can be customized. The goal should be to make sure the right data is on SSD at the right moment. As we discussed in our recent article "Maximizing SSD Investment With Analytics," we believe this is the largest opportunity for improvement in this technology: both caching and automated tiering need to become smarter about what they promote and when.
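
One simple way to pursue "the right data at the right moment" is to promote blocks by recent access frequency. The sketch below is a hypothetical policy, not any vendor's algorithm: the `HeatTracker` class, its `window` and `threshold` parameters, and the sliding-window counting are all assumptions. Those two parameters are the kind of tunable knobs that customizable promotion implies.

```python
from collections import Counter, deque


class HeatTracker:
    """Hypothetical promotion policy: count each block's accesses over the last
    `window` I/Os and nominate blocks accessed at least `threshold` times."""

    def __init__(self, window, threshold):
        self.window = window  # how much recent history "counts" (tunable)
        self.threshold = threshold  # how hot a block must be to qualify (tunable)
        self.recent = deque()  # the last `window` block IDs, oldest first
        self.counts = Counter()  # accesses per block within the window

    def record(self, block):
        self.recent.append(block)
        self.counts[block] += 1
        if len(self.recent) > self.window:
            old = self.recent.popleft()  # slide the window forward by one I/O
            self.counts[old] -= 1
            if self.counts[old] == 0:
                del self.counts[old]

    def candidates(self, ssd_slots):
        """Qualifying blocks, hottest first, capped at the available SSD slots."""
        hot = [(blk, n) for blk, n in self.counts.items() if n >= self.threshold]
        hot.sort(key=lambda item: item[1], reverse=True)  # stable: ties keep first-seen order
        return [blk for blk, _ in hot[:ssd_slots]]


tracker = HeatTracker(window=5, threshold=2)
for blk in ["a", "b", "a", "c", "a", "b"]:
    tracker.record(blk)
print(tracker.candidates(ssd_slots=10))  # ['a', 'b']
```

A shorter window reacts faster to changing workloads but may promote blocks that are only briefly busy; a higher threshold avoids churning the SSD at the cost of promoting fewer blocks.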

Another area to examine in automated tiering vs. caching is which one delivers better write performance, and this can be a clear area of distinction between the two. We'll cover it in our next entry.


George Crump is lead analyst of Storage Switzerland, an IT analyst firm focused on the storage and virtualization segments. Storage Switzerland's disclosure statement.

Comments
Pat Kilgore
User Rank: Apprentice
10/26/2011 | 3:05:44 PM
re: SSD Options: Tier Vs. Cache
Interesting article! Tiering with SSDs can be difficult, however, as identifying your active data set is an arduous task. Cache IQ's RapidCache is the perfect answer to this problem. RapidCache is a network-based appliance that uses simple policies to cache active data sets using DRAM and SSD. It also provides rich analytic insight and installs seamlessly into any environment.

To learn more, visit www.cacheiq.com