Automated Tiering Needs Metadata

EMC's announcements of FASTcache and sub-LUN FAST, the feature formerly known as FAST 2.0, have got me thinking once again about how to get the best bang for the big bucks flash memory will cost you. The whole idea of automated tiering is to move the hot data to flash while leaving the less frequently accessed cold data on spinning disks. My question is, how do you determine what data is hot?

Howard Marks

May 21, 2010

Clearly, automated tiering requires that the storage system collect some stats on access frequency. The simplest approach is to keep several days of IOP/day counts or a moving average IOP/hr for each data block. An admin could then create a policy that, for selected volumes, moves blocks with higher access counts up to the flash tier and moves colder ones down to the trash tier of high-capacity SAS drives.
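To make the bookkeeping concrete, here is a minimal sketch of a per-block moving average and a "top slice goes to flash" policy. It is not how any particular array does it. The class name `BlockHeat`, the smoothing factor `alpha`, and the `flash_fraction` parameter are all assumptions for illustration, and the exponentially weighted average is just one convenient way to keep a moving average without storing every hourly sample.

```python
from collections import defaultdict


class BlockHeat:
    """Tracks an exponentially weighted moving average of I/Os per hour per block."""

    def __init__(self, alpha=0.2):
        # alpha is the weight given to the newest hour (assumed value, tune to taste)
        self.alpha = alpha
        self.ewma = defaultdict(float)

    def record_hour(self, io_counts):
        """io_counts maps block id -> I/Os seen during the hour that just ended."""
        # Blocks with no I/O this hour still need to cool off, so visit them too.
        for block in set(self.ewma) | set(io_counts):
            sample = io_counts.get(block, 0)
            self.ewma[block] = self.alpha * sample + (1 - self.alpha) * self.ewma[block]

    def hottest(self, flash_fraction=0.1):
        """Return the block ids a simple policy would place on the flash tier."""
        ranked = sorted(self.ewma, key=self.ewma.get, reverse=True)
        keep = max(1, int(len(ranked) * flash_fraction))
        return set(ranked[:keep])


heat = BlockHeat(alpha=0.5)
heat.record_hour({"a": 100, "b": 10})
heat.record_hour({"a": 100, "c": 20})
print(heat.hottest(flash_fraction=0.34))
```

Everything not returned by `hottest` would be a candidate for the capacity tier. A real system would also want some hysteresis so blocks near the cutoff don't bounce between tiers on every run.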

Average temperatures are all well and good, but anyone who has been camping in the mountains knows that the average temperature doesn't say enough about the weather to know how to dress. Even if the average temperature for a given day is 70 degrees, it can be 70 degrees all day or 50 degrees at 7 AM and 90 degrees at 4 PM. Similarly, some workloads may be so bursty that they generate many IOPs/min occasionally, but not enough across an entire day to be in the hot 5-10 percent we can afford to put in flash. Moving the 90-degree blocks to flash may have a bigger impact on application performance and cost than moving the data that's the metaphorical equivalent of a day in Honolulu, where it's 74-85 degrees all day, every day.

Then there are the periodic loads. Things like weekly data warehouse cube-builds, end-of-month processing, class registration and the like can be predicted. When the tiering process runs on Friday night, or on the last day of the month, I might want to direct it to use the access metadata from last Saturday, when the data warehouse was being loaded, or from last month's end.
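A small sketch shows how the two ways of reading the thermometer disagree, and how a tiering run might pick its reference day. The hourly counts are made up, and the function names (`daily_total`, `peak_hour`, `reference_day`) are hypothetical, not part of any vendor's tooling.

```python
from datetime import date, timedelta


def daily_total(hourly):
    """Heat as a whole-day figure, equivalent to ranking by the daily average."""
    return sum(hourly)


def peak_hour(hourly):
    """Heat as the busiest single hour of the day."""
    return max(hourly)


def reference_day(run_date, days_back=6):
    """Day whose stats should drive this run; 6 days before a Friday is a Saturday."""
    return run_date - timedelta(days=days_back)


# Hypothetical hourly I/O counts for two blocks over one day.
blocks = {
    "steady": [300] * 24,              # the "Honolulu" block: same load all day
    "bursty": [0] * 22 + [3000, 3000], # idle most of the day, then two intense hours
}

by_total = max(blocks, key=lambda b: daily_total(blocks[b]))
by_peak = max(blocks, key=lambda b: peak_hour(blocks[b]))
print("hottest by daily total:", by_total)
print("hottest by peak hour:  ", by_peak)

friday_run = date(2010, 5, 21)
print("use stats from:", reference_day(friday_run).strftime("%A %Y-%m-%d"))
```

The steady block wins on the daily total (7,200 I/Os against 6,000) while the bursty block wins on peak hour, so the choice of metric decides which one gets the flash.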

Finally, we have to consider block sizes. Keeping access metadata on each addressable block in a system would quickly exhaust the array CPU and require enough space for users to notice they're getting 5-15 percent less than they used to. Bigger blocks simplify tiering, but they also drag cooler data along with the hot data, reducing the flash tier's effectiveness. Vendors haven't talked about block size in tiering much yet, but I imagine most are using blocks of 64KB to 4MB that align with RAID stripes.
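Some back-of-the-envelope arithmetic illustrates the trade-off. The sketch below assumes 32 bytes of access counters per tracked block (for instance, several days of 4-byte counts) and a 10 TiB array. Both figures are assumptions chosen for illustration, not numbers from any vendor.

```python
def metadata_overhead(capacity_bytes, extent_bytes, record_bytes=32):
    """Return (tracked extents, metadata bytes, metadata as a fraction of capacity)."""
    extents = capacity_bytes // extent_bytes
    metadata = extents * record_bytes
    return extents, metadata, metadata / capacity_bytes


TIB = 1024 ** 4
GIB = 1024 ** 3

# Compare tracking every 512-byte sector with the larger block sizes guessed at above.
for label, extent in [("512 B", 512), ("64 KiB", 64 * 1024), ("4 MiB", 4 * 1024 ** 2)]:
    extents, metadata, fraction = metadata_overhead(10 * TIB, extent)
    print(f"{label:>7}: {extents:>14,} extents, "
          f"{metadata / GIB:10.3f} GiB metadata ({fraction:.4%})")
```

Under these assumptions, tracking every 512-byte sector costs 6.25 percent of capacity, which falls in the range users would notice, while 64 KiB blocks cost under 0.05 percent. The price of the larger block is the cooler data that rides along with the hot data.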

Caching is sounding simpler all the time, isn't it?

About the Author(s)

Howard Marks

Network Computing Blogger

Howard Marks is founder and chief scientist at DeepStorage LLC, a storage consultancy and independent test lab based in Santa Fe, N.M., and concentrating on storage and data center networking. In more than 25 years of consulting, Marks has designed and implemented storage systems, networks, management systems and Internet strategies at organizations including American Express, J.P. Morgan, Borden Foods, U.S. Tobacco, BBDO Worldwide, Foxwoods Resort Casino and the State University of New York at Purchase. The testing at DeepStorage Labs is informed by that real-world experience. He has been a frequent contributor to Network Computing and InformationWeek since 1999 and a speaker at industry conferences including Comnet, PC Expo, Interop and Microsoft's TechEd since 1990. He is the author of Networking Windows and co-author of Windows NT Unleashed (Sams). He is co-host, with Ray Lucchesi, of the monthly Greybeards on Storage podcast, where the voices of experience discuss the latest issues in the storage world with industry leaders. You can find the podcast at: http://www.deepstorage.net/NEW/GBoS
