Automatic Tiering - It Isn't HSM/ILM 2.0

Howard Marks

March 19, 2010

3 Min Read

Ever since NetApp's Tom Georgens said "I think the entire concept of tiering is dying" in an analyst call last month, the blogosphere has been all a-twitter about automated storage tiering; George Crump alone got three blog entries out of it. Unfortunately, many of those writing about automated tiering are thinking about storage as strictly unstructured file data, arguing that better file management with an ILM-like solution would be a better idea. Is in-array tiering the cost/performance answer for the problems ILM can't solve?

I will grant my fellow bloggers (and bloggerettes) that most organizations don't manage their unstructured data well. The average NAS in corporate America is loaded down with the home directories of users who were fired in the last century, backups of users' iPods and multiple copies of the menus from every takeout joint in a 10-block radius.  A good data classification and archiving system could migrate that data to an archive tier and off the primary storage.  The resulting savings would come from keeping far fewer copies of the archive than the 5-10 copies we keep of primary data. The OPEX savings from fewer snapshots and less frequent full backups will be bigger than the $/GB savings from moving the data from 15K to 5,400 RPM drives, or even from a FAS6080 to $3-5/GB storage on a low-end NAS.
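To make the idea concrete, here's a minimal Python sketch of the kind of age-based sweep such a classification and archiving system automates. The paths, the three-year cutoff and the reliance on last-access time are all illustrative assumptions, not how any particular product works.

import os
import shutil
import time

# Hypothetical sweep of a NAS share: move files nobody has touched in
# roughly three years to an archive tier. The paths and cutoff are made up.
PRIMARY = "/mnt/primary/home"
ARCHIVE = "/mnt/archive/home"
CUTOFF = time.time() - 3 * 365 * 86400  # ~3 years ago, in seconds

for root, _dirs, files in os.walk(PRIMARY):
    for name in files:
        src = os.path.join(root, name)
        # Classify on last-access time only; a real product would also
        # weigh owner, file type and business value, not just age.
        if os.stat(src).st_atime < CUTOFF:
            dst = os.path.join(ARCHIVE, os.path.relpath(src, PRIMARY))
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            shutil.move(src, dst)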

In theory, ILM is a great idea, and as an industry we should have made more progress toward systems that can classify and migrate data transparently, or at least translucently, to the users. The limited information found in typical file system metadata, and the fact that much of that data is completely mismanaged by IT pros, with users deciding what to save and where to save it, have made classification difficult and kept ILM out of the mainstream.

Even if our organization embraced ILM the way I would embrace Jennifer Connelly, we would still have disk I/O hotspots. The places where SSDs would really help performance, or reduce costs by replacing short-stroked FC drives, aren't user folders and the like but generally databases and similar structured data.

Take, for example, an application like Exchange. An Exchange data store is an atomic object: a system administrator can choose to put the data store on FC or SATA drives, but the whole database has to be in one logical volume in the Windows volume manager. In a typical Exchange database, 80 percent or more of the data is relatively static while a small percentage is several times busier.  Given that Microsoft re-engineered Exchange to generate fewer IOPS in Exchange 2007 and again in Exchange 2010, we could probably run Exchange across wide-striped SATA drives. The Oracle or SQL Server database that runs our ERP system is another story. That database contains the details of every order filled in the last two years, and we were lucky to talk them into letting us archive the five-year-old data. It also has indexes and tables that generate tens of thousands of IOPS.  A good DBA could, at least with Oracle, separate the table spaces and periodically run scripts to move old data to slower disks, leaving the hotter 40 percent of the database on short-stroked drives. Of course, he or she would have to run the scripts and re-assess periodically.
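For readers who haven't watched this kind of housekeeping up close, here's a rough Python sketch of the script that DBA would have to keep re-running. The cx_Oracle connection details, the partition names and the sata_ts tablespace are all invented for illustration, not taken from any real system.

import cx_Oracle  # assumes the cx_Oracle driver; every name below is invented

# The kind of periodic housekeeping a DBA would script by hand: move
# last year's order partitions onto a tablespace carved from slower disks.
conn = cx_Oracle.connect("erp_app", "secret", "erp-db")
cur = conn.cursor()

old_partitions = ["ORDERS_2008_Q1", "ORDERS_2008_Q2"]  # hypothetical names
for part in old_partitions:
    cur.execute("ALTER TABLE orders MOVE PARTITION %s TABLESPACE sata_ts" % part)
    # Any local indexes on the moved partition would need rebuilding too,
    # and the whole exercise has to be repeated as the data keeps aging.

conn.close()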

With automated tiering, the disk array could identify the hottest 10 percent of the database for SSDs, put the next 20 percent on 300GB FC disks (moving it off short-stroked 73GB drives that deliver only about 40GB of usable capacity each) and leave the remaining 70 percent of older data on SATA. It can re-balance daily, without needing detailed knowledge of the database schema and application.
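The decision logic itself is simple enough to sketch. The Python below ranks extents by recent I/O counts and assigns tiers at the 10/20/70 percent break points described above; the counters and tier names are invented for the example, and a real array would do all of this in firmware against its own metadata.

# Extent-level tiering sketch: hottest 10 percent to SSD, next 20 percent
# to FC, the rest to SATA.

def plan_tiers(io_counts_by_extent):
    """Rank extents by recent I/O and assign each one a target tier."""
    ranked = sorted(io_counts_by_extent, key=io_counts_by_extent.get, reverse=True)
    total = len(ranked)
    plan = {}
    for rank, extent in enumerate(ranked):
        if rank < total * 0.10:
            plan[extent] = "SSD"
        elif rank < total * 0.30:
            plan[extent] = "FC"
        else:
            plan[extent] = "SATA"
    return plan

# Example run: extent 7 was hammered all day, extent 3 sat idle.
counters = {0: 120, 1: 95, 2: 10, 3: 1, 4: 300, 5: 15, 6: 8, 7: 2500, 8: 40, 9: 5}
print(plan_tiers(counters))  # the array would then migrate extents overnight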

Now we need to hire fewer really smart DBAs and can put the ones we keep to work on database, as opposed to storage, tasks. We can also speed up the applications where our DBAs don't have the information about where the hot data is: Exchange, the in-house apps written by guys who left the company in 2003 and applications whose vendors don't provide detailed data dictionaries.

About the Author(s)

Howard Marks

Network Computing Blogger

Howard Marks is founder and chief scientist at Deepstorage LLC, a storage consultancy and independent test lab based in Santa Fe, N.M., concentrating on storage and data center networking. In more than 25 years of consulting, Marks has designed and implemented storage systems, networks, management systems and Internet strategies at organizations including American Express, J.P. Morgan, Borden Foods, U.S. Tobacco, BBDO Worldwide, Foxwoods Resort Casino and the State University of New York at Purchase. The testing at DeepStorage Labs is informed by that real-world experience.

He has been a frequent contributor to Network Computing and InformationWeek since 1999 and a speaker at industry conferences including Comnet, PC Expo, Interop and Microsoft's TechEd since 1990. He is the author of Networking Windows and co-author of Windows NT Unleashed (Sams).

He is co-host, with Ray Lucchesi, of the monthly Greybeards on Storage podcast, where the voices of experience discuss the latest issues in the storage world with industry leaders. You can find the podcast at: http://www.deepstorage.net/NEW/GBoS
