Data Classification Tips And Technologies
March 29, 2012
As enterprise data grows into the terabytes, classifying it based on its sensitivity has never been more important. According to a just-released InformationWeek report, 10 Steps to Effective Data Classification, an organizational classification program defines policy requirements; specific classifications (generally 'secret', 'private', 'confidential' and 'public') and their associated data types; processes and procedures; accountability metrics; and repercussions for not following the rules, writes author Erik Bataller, a senior consultant with information security consultancy Neohapsis.
In the first of a two-part series on 10 guiding principles and practical recommendations for a classification program that will help companies meet their regulatory requirements, Bataller says the starting point has to be getting buy-in from everyone from the CEO down to take classification seriously. “The best way to make that happen is to develop the classification program directly with key stakeholders,” he writes. “The business, not IT, owns organizational data, so establish a dialogue with the executives and staff responsible for relevant systems. They need to be the enforcers across their groups.”
The second step is to understand the drivers for classification. Whether or not a company is subject to regulatory requirements, it has sensitive data and needs to examine how it is mitigating risk and liability. Third, organizations should keep classification programs simple, he emphasizes. If the rules are overly difficult they will likely be ignored, and they must not significantly impede productivity, he says.
The fourth step is to think through classification levels. The highest level is data critical to the organization's core value; the second highest is data that must be kept confidential; next is data that should not be distributed outside the company but would not cause lasting harm if disseminated. The last level is data that can be viewed publicly, Bataller says.
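The four levels above form a strict ordering, which is the property a classification scheme really needs: when data of two levels is combined, the result must inherit the stricter one. A minimal sketch, with level names chosen for illustration rather than taken from the report:

```python
from enum import IntEnum

class Classification(IntEnum):
    """Hypothetical four-level scheme mirroring the article's ordering;
    higher values demand stricter handling."""
    PUBLIC = 0        # may be viewed by anyone
    INTERNAL = 1      # should stay in-house; leakage causes no lasting harm
    CONFIDENTIAL = 2  # must be kept confidential
    CRITICAL = 3      # core to the organization's value

def stricter(a: Classification, b: Classification) -> Classification:
    """Combined or derived data inherits the stricter of its sources' levels."""
    return max(a, b)
```

Because `IntEnum` members compare as integers, the ordering (and therefore policy decisions like "may this leave the company?") falls out of simple comparisons.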
“Classification efforts in particular are a prime opportunity to develop an availability model that can go hand in hand,” he says. Customer support systems, for example, often contain highly sensitive data and must be readily available on a 24/7 basis. “Documenting these criteria while classifying can increase the value and subsequent support of the program and provide a more comprehensive understanding of how systems must be designed and resourced.”
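Documenting availability criteria alongside classification can be as simple as recording both on the same inventory entry. A sketch of such a record; the field names and example values are assumptions, not from the report:

```python
from dataclasses import dataclass

@dataclass
class SystemProfile:
    """One inventory entry pairing sensitivity with availability needs.
    All field names and example values here are illustrative."""
    name: str
    classification: str  # e.g. "confidential"
    availability: str    # uptime requirement, e.g. "24x7"

# A customer support system: highly sensitive and always-on,
# as in the article's example.
support = SystemProfile("customer-support", "confidential", "24x7")
```

Capturing both attributes in one place is what lets the classification exercise also inform how systems "must be designed and resourced."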
Choosing technology and controls is the fifth step, according to Bataller, even though there isn’t any specific data classification technology. When thinking about what to use, IT should match the value, timeliness and performance requirements of the data, as well as the applications using the data, to the performance and cost characteristics of the media, notes Kurt Marko, a regular contributor to InformationWeek and an IT industry veteran.
“Auto-tiering software, which takes a relatively simplistic approach to the problem by looking at easily measured parameters like last access time [and] frequency of access, at best does a crude job at this,” Marko says, suggesting that it is better to “consciously classify different types of data, using not only high-level parameters like file/content-type, but more granular metadata/document tags.” The better the classification, the more accurately IT can bind different data types to the best storage media.
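Marko's distinction can be sketched as a small rule function: instead of tiering on last-access time alone, consult content type and document tags too. The specific rules, tags and tier names below are illustrative assumptions, not anything prescribed by the article:

```python
def choose_tier(meta: dict) -> str:
    """Map file metadata to a storage tier.

    Unlike auto-tiering driven only by access patterns, this also
    consults content type and document tags. All rule values and
    tier names are hypothetical.
    """
    if meta.get("tag") == "legal-hold":
        return "fast-disk"       # court-driven GRC needs quick retrieval
    if meta.get("content_type") == "database":
        return "fast-disk"       # performance-sensitive application data
    if meta.get("last_access_days", 0) > 365:
        return "cloud-archive"   # cold, untagged data can leave primary storage
    return "standard-disk"
```

Note how the "legal-hold" tag overrides the age-based rule: a file untouched for years still lands on fast media if litigation may require it, which is exactly the decision pure last-access tiering gets wrong.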
In terms of hardware, there is still use of tape for archive data, but users are migrating to disk for data protection, says Deni Connor, founding analyst, Storage Strategies NOW/Systems Strategies NOW. The cloud is also becoming a useful archive for data, she says, and in some SMBs, is replacing tape.
As for governance, risk management and compliance (GRC), Connor says that users need to have an on-site disk or tape archive of data that can be accessed quickly and efficiently. “GRC is often driven by the courts, so the response time is critical – either disk or LTO-5 tape will suffice,” she says. “With LTO-5 you have LTFS capability, which makes it easier and quicker to recover data from tape.”