Storage

09:00 AM
Howard Marks
Commentary

Automated Tiering Needs Metadata

EMC's announcements of FASTcache and sub-LUN FAST, the feature formerly known as FAST 2.0, have got me thinking once again about how to get the best bang for the big bucks flash memory will cost you. The whole idea of automated tiering is to move the hot data to flash while leaving the less frequently accessed cold data on spinning disks. My question is: how do you determine what data is hot?

Clearly, automated tiering requires that the storage system collect some stats on access frequency. The simplest approach is to keep several days of IOP/day counts or a moving average of IOP/hr for each data block. An admin could then create a policy that, for selected volumes, moves blocks with higher access counts up to the flash tier and colder ones down to the trash tier of high-capacity SAS drives.
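To make that concrete, here is a minimal sketch of per-block access stats and a simple promote/demote policy. It is an illustration only, not how EMC or any other vendor does it: the `AccessTracker` class, the `plan_moves` function, the smoothing factor `alpha` and the idea of sizing flash in whole blocks are all assumptions made up for the example.

```python
from collections import defaultdict


class AccessTracker:
    """Hypothetical tracker: a moving average of I/Os per hour for each block."""

    def __init__(self, alpha=0.3):
        self.alpha = alpha                 # weight given to the newest hour (assumed value)
        self.avg = defaultdict(float)      # block_id -> smoothed I/Os per hour
        self.current = defaultdict(int)    # block_id -> I/Os seen in the current hour

    def record_io(self, block_id):
        self.current[block_id] += 1

    def close_hour(self):
        """Fold the current hour's counts into each block's moving average."""
        for block_id in set(self.avg) | set(self.current):
            count = self.current.get(block_id, 0)
            self.avg[block_id] = (
                self.alpha * count + (1 - self.alpha) * self.avg[block_id]
            )
        self.current.clear()


def plan_moves(tracker, in_flash, flash_capacity_blocks):
    """Return (promote, demote) lists so the hottest blocks end up in flash."""
    ranked = sorted(tracker.avg, key=lambda b: (-tracker.avg[b], b))
    hot = set(ranked[:flash_capacity_blocks])
    promote = sorted(hot - in_flash)       # hot blocks not yet in flash
    demote = sorted(in_flash - hot)        # flash residents that have cooled off
    return promote, demote


tracker = AccessTracker(alpha=0.5)
for _ in range(10):
    tracker.record_io(1)
for _ in range(2):
    tracker.record_io(2)
tracker.close_hour()

# Block 3 sits in flash but saw no I/O, so it gets demoted in favor of block 1.
promote, demote = plan_moves(tracker, in_flash={3}, flash_capacity_blocks=1)
```

A real array would run something like `plan_moves` only for the volumes the admin selected, and would have to budget the I/O cost of the moves themselves.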

Average temperatures are all well and good, but anyone who has been to camp in the mountains knows that the average temperature doesn't say enough about the weather to tell you how to dress. Even if the average temperature for a given day is 70 degrees, it can be 70 degrees all day, or 50 degrees at 7 AM and 90 degrees at 4 PM. Similarly, some workloads may be so bursty that they occasionally generate many IOPs/min, but not enough across an entire day to land in the hot 5-10 percent we can afford to put in flash. Moving the 90-degree blocks to flash may have a bigger impact on application performance and cost than moving the data that's the metaphorical equivalent of a day in Honolulu, where it's 74-85 degrees all day, every day.

Then there are the periodic loads. Things like weekly data warehouse cube-builds, end-of-month processing, class registration and the like can be predicted. When the tiering process runs on Friday night, or on the last day of the month, I might want to direct it to use the access metadata from last Saturday, when the data warehouse was being loaded, or from last month's end.
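Both points can be sketched in a few lines. The example below is a rough illustration under made-up numbers: the hourly counts for the `steady` and `bursty` blocks are invented, and `rank_blocks` and `reference_day` are hypothetical helpers, not functions from any array's management software. It shows that ranking by daily average and ranking by peak hour can disagree, and how a scheduler might pick which past day's stats to use.

```python
from datetime import date, timedelta
from statistics import mean


def rank_blocks(hourly_iops, score):
    """Order block names hottest-first, using `score` on each block's hourly counts."""
    return sorted(hourly_iops, key=lambda b: (-score(hourly_iops[b]), b))


# Invented 24-hour histories: one "Honolulu" block, one "mountain camp" block.
hourly_iops = {
    "steady": [80] * 24,            # 80 I/Os every hour
    "bursty": [5] * 23 + [1200],    # nearly idle, then one very busy hour
}

by_average = rank_blocks(hourly_iops, mean)  # the steady block ranks first
by_peak = rank_blocks(hourly_iops, max)      # the bursty block ranks first


def reference_day(run_day, busy_weekday):
    """Most recent date before run_day that fell on busy_weekday (Monday is 0)."""
    days_back = (run_day.weekday() - busy_weekday) % 7 or 7
    return run_day - timedelta(days=days_back)


# A tiering run on Friday, March 8, 2024 would look at stats from
# the previous Saturday, March 2 (Saturday is weekday 5).
stats_day = reference_day(date(2024, 3, 8), busy_weekday=5)
```

Which score is the right one depends on the application. A policy engine could reasonably offer both, plus a way to pin the stats window to a known busy period.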

Finally, we have to consider block sizes. Keeping access metadata on each addressable block in a system would quickly exhaust the array CPU and require enough space for users to notice they're getting 5-15 percent less capacity than they used to. Bigger blocks simplify tiering, but they also drag cooler data along with the hot data, reducing the flash tier's effectiveness. Vendors haven't talked much about block size in tiering yet, but I imagine most are using 64K-4MB blocks that align with RAID stripes.
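A back-of-the-envelope calculation shows why the tracking granularity matters so much. The sketch below is arithmetic only, with assumed inputs: a 10 TiB array and 32 bytes of access metadata per tracked block (say, a handful of daily counters). Neither figure comes from a vendor, and `tiering_metadata` is a hypothetical helper.

```python
def tiering_metadata(capacity_bytes, block_bytes, entry_bytes=32):
    """Return (entries, metadata_bytes, fraction_of_capacity) for one block size."""
    entries = -(-capacity_bytes // block_bytes)   # ceiling division
    metadata_bytes = entries * entry_bytes
    return entries, metadata_bytes, metadata_bytes / capacity_bytes


TIB = 2 ** 40
capacity = 10 * TIB   # assumed array size

# Tracking every 512-byte addressable block versus 1 MiB tiering blocks.
per_sector = tiering_metadata(capacity, 512)
per_mib = tiering_metadata(capacity, 2 ** 20)

for label, (entries, meta, fraction) in [("512 B", per_sector), ("1 MiB", per_mib)]:
    print(f"{label:>6}: {entries:,} entries, {meta / 2**30:,.2f} GiB, {fraction:.4%}")
```

Under these assumptions, per-sector tracking costs 32/512, or 6.25 percent, of capacity and means billions of counters to update, while 1 MiB blocks cut both by a factor of 2,048. The price is the one described above: every hot sector drags the rest of its megabyte into flash with it.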

Caching is sounding simpler all the time, isn't it?

Howard Marks is founder and chief scientist at Deepstorage LLC, a storage consultancy and independent test lab based in Santa Fe, N.M. and concentrating on storage and data center networking. In more than 25 years of consulting, Marks has designed and implemented storage ...