
Automated Tiering Needs Metadata

EMC's announcements of FASTcache and sub-LUN FAST, the feature formerly known as FAST 2.0, have got me thinking once again about how to get the best bang for the big bucks flash memory will cost you. The whole idea of automated tiering is to move the hot data to flash while leaving the less frequently accessed cold data on spinning disks. My question is: how do you determine what data is hot?

Clearly, automated tiering requires that the storage system collect some stats on access frequency. The simplest approach is to keep several days of I/Os-per-day counts, or a moving hourly average, for each data block. An admin could then create a policy that, for selected volumes, moves blocks with higher access counts up to the flash tier and colder blocks down to the trash tier of high-capacity SAS drives.
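
To make that concrete, here's a minimal sketch in Python of what such a policy might look like; the rolling window length, the 5 percent flash budget and the function names are my own illustrative assumptions, not any vendor's implementation.

from collections import defaultdict

WINDOW_DAYS = 7          # assumed: keep a week of daily I/O counts per block
FLASH_FRACTION = 0.05    # assumed: flash holds roughly 5 percent of the blocks

io_history = defaultdict(list)   # block_id -> list of daily I/O counts

def record_day(daily_counts):
    """Append today's I/O count for each block, keeping a rolling window."""
    for block_id, count in daily_counts.items():
        history = io_history[block_id]
        history.append(count)
        if len(history) > WINDOW_DAYS:
            history.pop(0)

def pick_promotions(total_blocks):
    """Rank blocks by average I/Os per day and return the hottest few percent."""
    averages = {b: sum(h) / len(h) for b, h in io_history.items()}
    budget = int(total_blocks * FLASH_FRACTION)
    return sorted(averages, key=averages.get, reverse=True)[:budget]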

Average temperatures are all well and good, but anyone who has been camping in the mountains knows that the average temperature doesn't tell you enough about the weather to know how to dress. Even if the average temperature for a given day is 70 degrees, it can be 70 degrees all day, or 50 degrees at 7AM and 90 degrees at 4PM. Similarly, some workloads may be so bursty that they occasionally generate many IOPs per minute, but not enough across an entire day to land in the hot 5-10 percent we can afford to put in flash. Moving the 90-degree blocks to flash may have a bigger impact on application performance and cost than moving the data that's the metaphorical equivalent of a day in Honolulu, where it's 74-85 degrees all day, every day.

Then there are the periodic loads. Things like weekly data warehouse cube builds, end-of-month processing, class registration and the like can be predicted. When the tiering process runs on Friday night, or on the last day of the month, I might want to direct it to use the access metadata from last Saturday, when the data warehouse was being loaded, or from last month's end.
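
Here's a hedged sketch of how a tiering engine might rank blocks by their busiest hour, or by a saved historical window such as last Saturday's warehouse load, rather than by a flat daily average; the data structures and window names below are illustrative assumptions, not any array's real policy engine.

# Illustrative only: per-block hourly access counts, so the tiering decision
# can look at the busiest hour (the "90-degree afternoon") or at a saved
# historical window (e.g. last Saturday's warehouse load) instead of a
# flat daily average.

hourly_counts = {}    # block_id -> list of 24 hourly I/O counts for today
saved_windows = {}    # window name, e.g. "last_saturday" -> same structure

def rank_by_peak_hour(counts):
    """Rank blocks by their single busiest hour, not the daily average."""
    return sorted(counts, key=lambda b: max(counts[b]), reverse=True)

def rank_by_window(name):
    """Rank blocks using access metadata captured during a named past window."""
    window = saved_windows[name]
    return sorted(window, key=lambda b: sum(window[b]), reverse=True)

# Before Friday night's cube build, for example, tier on last Saturday's stats:
# promotions = rank_by_window("last_saturday")[:flash_budget]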

Finally, we have to consider block sizes. Keeping access metadata on each addressable block in a system would quickly exhaust the array's CPU and require enough space for users to notice they're getting 5-15 percent less capacity than they used to. Bigger blocks simplify tiering, but they also drag cooler data along with the hot data, reducing the flash tier's effectiveness. Vendors haven't talked much about block size in tiering yet, but I imagine most are using 64KB-4MB blocks that align with RAID stripes.
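
A quick back-of-the-envelope sketch shows why; the 100TB capacity and the 16 bytes of metadata per extent are numbers I've assumed purely for illustration.

# Back-of-envelope: metadata footprint for per-extent access counters on a
# hypothetical 100 TB array at a few candidate extent sizes. The capacity
# (treated as 100 TiB) and 16 bytes per extent are illustrative assumptions.

CAPACITY_BYTES = 100 * 2**40      # 100 TB of addressable capacity
METADATA_PER_EXTENT = 16          # e.g. an access counter plus a timestamp

for extent_size in (4 * 2**10, 64 * 2**10, 1 * 2**20, 4 * 2**20):
    extents = CAPACITY_BYTES // extent_size
    overhead = extents * METADATA_PER_EXTENT
    print(f"{extent_size // 2**10:>5} KB extents: "
          f"{extents:,} extents, {overhead / 2**30:,.1f} GiB of metadata")

With those assumed numbers, 4KB extents need roughly 400 GiB of counters, while 1MB extents need under 2 GiB, which is why tiering in bigger chunks is so much easier on the array.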

Caching is sounding simpler all the time, isn't it?