De-Classifying Data Classification
Data classification means different things to different vendors
June 30, 2006
Data classification: It's been viewed as the missing piece of ILM that could make a lot of things fall into place. Unfortunately, users are discovering that the term means different things to different vendors.
To start at the top: In the past year, a handful of startups have launched software that classifies unstructured data so it can be managed across storage tiers. (See Peekaboo, StoredIQ!, Scentric Gets Classified, and Njini Granted $13M.) The newcomers include Abrevity, Arkivio, Kazeon, Index Engines, Njini, Scentric, and StoredIQ. All have aimed to fill the void for systems vendors who neglected data-classification capabilities in the first place. Network Appliance's OEM deal with newcomer Kazeon last November brought predictions that a slew of similar deals would follow. (See Kazeon Pairs With NetApp.)
Now systems vendors are looking to provide their own data classification. EMC last week unveiled its Intelligent Information Management (IIM) strategy for classifying data, and Compellent this week launched a SAN system aimed at classification. (See EMC Intros IIM and Compellent Automates Classification.)
But with more vendors getting involved in classification, it's vital that users note differences in the offerings.
It's too early to tell about EMC. Arun Taneja of the Taneja Group says it's an "initiative, much like ILM is an initiative. They're saying, 'We've already taken steps into ILM. Now we're pushing the wall further and classification is a big part of it.'"
It is clear the major part of EMC's IIM won't be coming from any acquisitions, since the vendor already has beta customers for the technology. But it's still not clear whether EMC will use its software strictly to classify data on its own systems, or whether it will support third-party systems as well. Also, it's unclear whether EMC will put data classification into its software applications, such as DiskXtender, EmailXtender, DataBaseXtender, and Visual SRM.
Compellent's classification moves data across storage tiers on the basis of a file's metadata, which includes information the system gathers on a file's age, size, owner, and last accessed date.
That's useful, but it only scratches the surface, Taneja notes. Heavy-duty classification products are needed for corporations looking to organize data for compliance purposes or for rapid-restore strategies.
"The classification in Compellent is for midsize companies who want to manage better than they do today, but are not looking for classification for compliance or e-discovery," Taneja says. "Compellent doesn't go to the far extreme. Its classification is based only on metadata, not on content of data. If you want to take all the email or all the files that have the term 'Morgan Stanley' inside the document and move it to another tier, it won't do it."
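To make Taneja's distinction concrete, here is a minimal Python sketch of what a content-based selection might look like: it reads each file's text and picks out those containing a given term, which is exactly the step a metadata-only system skips. The function name `files_containing` and the plain-text, case-insensitive matching are assumptions made for illustration; this is not how any vendor named here implements indexing, and real products would build an index rather than rescan files.

```python
from pathlib import Path


def files_containing(root, term):
    """Return files under root whose text contains term (case-insensitive).

    Hypothetical helper for illustration only: it scans every file on each
    call instead of consulting a prebuilt content index.
    """
    needle = term.lower()
    matches = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        try:
            # Treat everything as text; undecodable bytes are ignored.
            text = path.read_text(encoding="utf-8", errors="ignore")
        except OSError:
            # Unreadable files are skipped rather than aborting the scan.
            continue
        if needle in text.lower():
            matches.append(path)
    return matches
```

A call such as `files_containing("/shares/legal", "Morgan Stanley")` (the path is made up) would return the candidate files to move to another tier; a metadata-only policy has no equivalent of the `needle in text` test.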
More sophisticated tools examine a file's content to manage it more intelligently. For instance, companies may want to keep confidential or other important files on primary disk even if they haven't been accessed in months. Abrevity, Kazeon, Index Engines, Njini, Scentric, and StoredIQ all claim to index files' content as well as metadata.
The Compellent Storage Center Quick Start ILM combines the vendor's Data Progression and continuous snapshot software with high-performance Fibre Channel and lower-performing Fibre Channel or SATA drives. Pricing begins at around $50,000 for 6.4 Tbytes.
Jeff Berliner, IT director for the NYU School of Medicine, has been using Data Progression for about three months. His goal is to save money by moving data that is not frequently accessed off expensive disk onto cheaper SATA drives. He set his system to automatically move files that haven't been accessed for 12 days to lower-cost disk.
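A rule like Berliner's can be sketched in a few lines of Python. The example below is an illustration under stated assumptions, not Compellent's Data Progression: the names `FileMetadata`, `read_metadata`, and `choose_tier`, and the tier labels, are hypothetical, and it relies on the file system's last-access timestamp, which some systems update lazily or not at all.

```python
import os
import time
from dataclasses import dataclass

SECONDS_PER_DAY = 86400


@dataclass
class FileMetadata:
    """The kind of metadata the article lists: size, owner, last access."""
    path: str
    size_bytes: int
    owner_uid: int
    last_access: float  # seconds since the epoch


def read_metadata(path):
    """Gather metadata for one file from the operating system."""
    st = os.stat(path)
    return FileMetadata(path, st.st_size, st.st_uid, st.st_atime)


def choose_tier(meta, now=None, idle_days=12):
    """Pick a tier from last-access age alone; content is never read.

    Files idle for more than idle_days go to the cheaper tier. The tier
    names are placeholders chosen for this sketch.
    """
    now = time.time() if now is None else now
    idle = (now - meta.last_access) / SECONDS_PER_DAY
    return "sata" if idle > idle_days else "fibre_channel"
```

Calling `choose_tier(read_metadata(some_path))` returns the tier a file would be assigned under the 12-day rule. Because the decision never opens the file, a confidential document that sits untouched is demoted like any other, which matches the tradeoff Berliner describes.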
"We're not basing our strategy on content of files at all," Berliner says. "Not all our confidential documents need to be on the top tier of storage because they're not accessed all the time. Compellent's classification makes sure files taking up the most expensive storage are being accessed repeatedly."
NYU Medical Center's primary storage resides on Fibre Channel drives with RAID 10. Secondary data goes to SATA drives with RAID 5. Berliner says Compellent's classification features could save him around $60,000 in disk over the life of the system.
Compellent's classification system only works with Compellent SANs.
Even with system vendors getting into data classification, partnership opportunities remain for startups. Arkivio, Kazeon, and StoredIQ are certified to work with EMC's Centera CAS archiving system. HDS has included Arkivio, Kazeon, Scentric, and StoredIQ in its ISV program for its archiving platform. (See Hitachi Intros Archive.) Kazeon and StoredIQ are Google OneBox partners. (See Kazeon, Google Search.) Backup software vendor CA has an OEM deal with Arkivio. (See CA Resells Arkivio.)
So far, not even OEM deals and other partnerships have pushed data classification much beyond the talking stage. Even Kazeon, which was early to market and aligned with NetApp from the start, has only about 30 customers.
Dave Raffo, Senior Editor, Byte and Switch
Organizations mentioned in this article:
Abrevity Inc.
Arkivio Inc.
CA Inc. (NYSE: CA)
Compellent Technologies Inc.
EMC Corp. (NYSE: EMC)
Google (Nasdaq: GOOG)
Index Engines Inc.
Kazeon Inc.
Network Appliance Inc. (Nasdaq: NTAP)
Njini Inc.
Scentric Inc.
StoredIQ Corp.
Taneja Group