ILM Gets a Life
Information Lifecycle Management is making a comeback, but don't look for any products just yet. We help you plan your ILM strategy so you get the most out of your storage.
September 9, 2005
And at a storage technology conference last month in Dubai, United Arab Emirates, EMC presented its ILM solution, which includes its recently acquired VMware server virtualization product (see "Going Virtual" for a review of virtual machine technology). It was clear from EMC's back-pedaling during the Q&A session that VMware has nothing to do with ILM. A Hitachi Data Systems official at the same conference said his company's products feature ILM: "Whenever you write data to a disk, you are doing ILM," he explained.
Bottom line: None of the more than 270 hardware and software products called ILM solutions really is one. The range of offerings includes HSM (hierarchical storage management) software, document- and content-management systems, global name space management and virtualization tools, e-mail and database archiving products, backup/restore software, enterprise storage arrays, high-capacity SATA arrays and "sticky" content-addressable storage arrays (a new category of appliances intended for long-term storage).
History Lesson
The term ILM was coined by StorageTek and made possible by IBM with mainframe-based Systems Managed Storage (SMS) in the late 1970s. ILM is a strategy for optimizing the management, allocation and utilization of storage resources in mainframe environments, where software tools classify data and storage resources and record access made to specific data sets. This information is used by a policy-based data mover to migrate data intelligently among storage devices over time.
SMS became the centerpiece of an ILM strategy that improved storage management and increased the efficiency of storage allotments to 80 percent. But when computing moved from mainframes to distributed computing in the 1980s and '90s, SMS didn't go along. Without an operating environment controlled by a single vendor, it was difficult to implement ILM in the distributed world.
But without an SMS-type solution, enterprises have poorly allocated and used their storage space. We store the wrong data on the wrong platforms from a data-value versus storage-cost perspective. Large IT shops are drowning in data as storage costs accelerate to 45 percent to 75 percent of annual IT hardware budgets. And the efficiency of capacity allocation--how well you're doling out the storage space you have--in distributed storage hovers at an anemic rate of 25 percent to 35 percent, according to studies published by Fred Moore, CEO of Horison Information Strategies.
Any way you slice it, ILM is a do-it-yourself proposition. Before building an ILM capability, consider your business-process requirements, infrastructure and budget.
The simplest ILM strategy is to segregate your data into containers by department, an approach that has its trade-offs. For example, not all data is equally important. Nor does it retain its usefulness, importance or frequency of reuse over time. Plus, the contents of each container will grow at roughly the same rate as your overall data growth, so lumping data into a directory or zone may merely create more containers to manage. So this strategy won't improve the efficiency of your long-term data management. Processes such as backup will benefit little from the approach and may even become more unwieldy over time as your backup targets multiply.
All Kinds of Data
Another way to separate your data is by type. Companies have four types--structured (databases), semistructured (e-mail or groupware where a database-like structure contains a file payload), workflow (forms and other standardized content that is the stuff of content-management systems) and unstructured (user files created by productivity apps). See "All Kinds of Data," at left.
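To make the four-way split concrete, here is a minimal Python sketch that tallies a storage inventory by data type. The source labels, the `TYPE_BY_SOURCE` mapping and the sample sizes are hypothetical, chosen only for illustration; treating anything from an unrecognized source as unstructured is likewise an assumption, not a rule from any product.

```python
from collections import Counter

# Hypothetical mapping from where data originates to the four types above.
TYPE_BY_SOURCE = {
    "database": "structured",
    "email": "semistructured",
    "groupware": "semistructured",
    "content_management": "workflow",
    "productivity_app": "unstructured",
}


def data_type(source: str) -> str:
    """Return the data type for a source label.

    Assumption: data from an unrecognized source is counted as unstructured.
    """
    return TYPE_BY_SOURCE.get(source, "unstructured")


def bytes_by_type(inventory):
    """Sum sizes per data type for an iterable of (source, size) pairs."""
    totals = Counter()
    for source, size in inventory:
        totals[data_type(source)] += size
    return dict(totals)


# Made-up inventory: (source label, size in gigabytes).
sample_inventory = [
    ("database", 100),
    ("email", 50),
    ("groupware", 25),
    ("productivity_app", 200),
    ("scanner", 10),  # unrecognized source, so counted as unstructured
]
totals = bytes_by_type(sample_inventory)
```

A tally like this is only a starting point, but it shows how much of your capacity each type consumes before you pick tools for any of them.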
Separating data into types has its advantages. You can use different management tools for different types of data. Software like OuterBay Technologies' Application Data Management Suite or Princeton Softech's Archive, for example, can help you manage the contents of databases. These tools let you extract older data from databases and archive or delete it per policy. Similarly, e-mail archiving products like Mimosa Systems' NearPoint can separate file attachments from their e-mail containers so they can be subjected to data discipline.
Tools for managing data collected in content management or a workflow process also are available. Document-management system vendors are recasting themselves as enterprise content-management (ECM) providers with products aimed at managing workflow data. Among this list are EMC (which recently acquired Documentum), FileNet and about 15 others. Most ECM vendors are trying to move into database, e-mail and/or user file management as well.
No comparable management tools exist for user files, so those files need all the help they can get. Unstructured data represents between 50 percent and 65 percent of all data generated in an organization, according to surveys by the Data Management Institute. And ILM often comes down to wrangling all those recalcitrant user files. Users generally do a poor job of assigning their files descriptive titles, and without descriptive information, it's tough to create a coherent management strategy for your files.
Even with such tools, managing data by type produces mixed results. You need multiple point-management software tools, each of which has its own policy engine and policy-programming syntax and may require its own management servers and storage platforms. This means more people to administer the tools. For user files, plan on some manual intervention to make sense of poorly identified or improperly stored files. Tools from Arkivio, NuView, PSS Systems, Xenware and others show promise in bringing some order to files, but anything that requires user buy-in is going to produce spotty results.
The holy grail of ILM is data management by some sort of classification scheme. Once a scheme is established, data must be tagged with self-referencing bits (or cross-referenced to an externalized directory) to ensure it's migrated across the storage infrastructure according to class. You can construct policies to employ data class and usage characteristics, and to determine what to move and when. For example, some data may require retention on low-cost disk for many years before it's finally migrated to tape or deleted, while other data may be deleted after 90 days. Mainframe ILM got close to this level of granularity, but there's nothing like it in the distributed world.
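A class-based policy of this kind can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a description of any shipping product: the class names, the seven-year retention period and the `decide` function are all hypothetical, and a real data mover would also weigh usage characteristics such as last access.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ClassPolicy:
    retain_days: int   # how long data of this class stays on low-cost disk
    final_action: str  # what happens afterward: "migrate_to_tape" or "delete"


# Hypothetical data classes and retention periods, for illustration only.
POLICIES = {
    "long_term_record": ClassPolicy(retain_days=7 * 365, final_action="migrate_to_tape"),
    "transient": ClassPolicy(retain_days=90, final_action="delete"),
}


def decide(data_class: str, created: date, today: date) -> str:
    """Return the action a policy-based data mover would take for one item."""
    policy = POLICIES[data_class]
    age_days = (today - created).days
    if age_days < policy.retain_days:
        return "keep_on_disk"
    return policy.final_action
```

Calling `decide("transient", date(2005, 1, 1), date(2005, 9, 9))` returns `"delete"` because the item is older than 90 days, while the same dates for `"long_term_record"` return `"keep_on_disk"`. The point of the sketch is that the policy keys on the data's class, not on the device where it happens to sit.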
No schema exists thus far for data classification. The Storage Networking Industry Association (SNIA) is the latest organization to attempt a universal data-naming schema, but to no avail. SNIA proposed a "bottom-up" approach, with data classification based on the storage targets. That's analogous to automobile manufacturers deciding which drivers with which demographic characteristics will be placed into a BMW versus a Chrysler. SNIA's efforts are ongoing.
A top-down strategy, being advanced by organizations ranging from the Data Management Institute to the Compliance, Governance and Oversight Council, makes more sense. The enterprise develops its data-naming scheme after considering its business processes and priorities.
These processes give data its defining characteristics (such as importance, useful life, criticality and access requirements), much like DNA handed down by parents to their offspring. Your data will remain a mass of undifferentiated bits if you don't understand the DNA of your business processes.
So you must identify, list and define data objects and classification criteria, and then compare them for similarities. When like groups arise, you've got your data classes. Then you must find a way to apply classes consistently, by involving users or by harnessing yet-to-be-developed technologies, so that data movers can pick data and move it by class policy.
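The "compare them for similarities" step can be roughed out in Python as follows. The data objects and criteria values are invented for illustration, and grouping only on exact matches is a simplifying assumption; in practice you would likely merge near-matches by hand.

```python
from collections import defaultdict

# Hypothetical data objects and classification criteria, for illustration only.
DATA_OBJECTS = {
    "invoices": {"importance": "high", "useful_life_years": 7, "access": "rare"},
    "contracts": {"importance": "high", "useful_life_years": 7, "access": "rare"},
    "web_logs": {"importance": "low", "useful_life_years": 1, "access": "rare"},
    "order_db": {"importance": "high", "useful_life_years": 7, "access": "frequent"},
}


def group_into_classes(objects):
    """Group data objects whose classification criteria match exactly.

    Each distinct combination of criteria values becomes one candidate class.
    """
    classes = defaultdict(list)
    for name, criteria in objects.items():
        # Sort the criteria so the key does not depend on dictionary order.
        key = tuple(sorted(criteria.items()))
        classes[key].append(name)
    return dict(classes)


candidate_classes = group_into_classes(DATA_OBJECTS)
```

With the sample data, invoices and contracts land in the same candidate class, while the Web logs and the order database each form their own, giving three classes in all.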
Whether you want to manage your data by originating department and storage repository, data type or class, getting there will be challenging. Although data management is an essential task for IT, it's gotten short shrift as enterprises have migrated applications from mainframes to distributed systems. To remedy any inefficiency in your storage capacity and ensure regulatory compliance, you must start an ILM initiative today.
How To Plan and Build ILM
Ignore the vendor hype. ILM is not a product. There is no silver bullet and no ILM 1.0 product available, so stop shopping for one.
Decide what ILM means for your business. Create a vision of a managed data environment and what that means for your compliance objectives as well as for other storage-management goals. Talk with business-process owners at your organization and with your business-continuity planner, records managers and legal department for additional guidance.
Dissect your data flows. Understand the applications on which your business processes rely. Analyze these workflows to pinpoint the data they produce and use. This helps you manage your data assets. (Following up with an inspection or audit will confirm the current layout of the data across your storage infrastructure.)
Compare your findings with an asset-management system (or discuss the findings with your company's financial and IT management pros). This step is optional, but you'll discern how inefficient and expensive your storage utilization actually is.
Decide how granular you want your data management. The simplest ILM approach is to segregate and store your critical data into containers--folders, directories, zones or volumes in your storage infrastructure. You can use global name-space controls or fabric-zoning controls, or you can designate specific volume targets for a business unit or department's output.
Set data-management policies. Under current laws and regs, you need to retain data for a set period of time and coordinate its deletion with policies. Apply the appropriate access controls and protective measures per policy.
Jon William Toigo is CEO of storage consultancy Toigo Partners International, founder and chairman of the Data Management Institute and author of 13 books, including Disaster Recovery Planning: Preparing for the Unthinkable (Pearson Education, 2002) and The Holy Grail of Network Storage Management (Prentice Hall PTR, 2003). Write to him at [email protected].