First, let's take a closer look at dark data. In its IT Glossary, Gartner defines dark data “as the information assets organizations collect, process and store during regular business activities, but generally fail to use for other purposes (for example, analytics, business relationships and direct monetizing).” Gartner also notes, “Dark data often comprise most organizations’ universe of information assets ... Storing and securing data typically incurs more expense (and sometimes greater risk) than value.”
Let’s also consider the magnitude of the impact that no-value data has in the enterprise. In 2012, a survey presented at the Compliance, Governance and Oversight Council Summit revealed that 1% of data in an enterprise has to be preserved for litigation hold, 5% has to be managed to cover compliance requirements, and another 25% is reasonably determined to have current business value. That leaves 69% of all data with no value whatsoever.
Now, there is not necessarily a 1:1 mapping between dark data and no-value data, but practically speaking, all no-value data can be treated as a subset of dark data (the rest of dark data may still be able to serve some business purpose). As part of the data-driven push to find out whether neglected or underused data really does have some value (for example, for big data efforts and analytics insights), we don’t want to arbitrarily confine all dark data to IT storage purgatory.
Still, the clock is ticking and long-term retention of dark data that has no value continues to incur cost (if not extra risk, as well). If storage is 40% of the IT hardware budget (a statistic I saw in a recent vendor presentation) and 69% of data has no value, then more than a quarter of the IT hardware budget each year serves no purpose.
Now, of course, it is unrealistic to think that all of the no-value data problems can be solved, but even if a third or a half of the potential savings could be achieved, that would be valuable.
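The arithmetic behind these figures can be sketched in a few lines. This is only a back-of-the-envelope illustration: it assumes storage cost is spread evenly across all data, so that the share of no-value data maps directly onto the share of storage spend. The function names are made up for this example.

```python
# Shares reported in the 2012 survey cited above (percent of enterprise data).
LITIGATION_HOLD = 1
COMPLIANCE = 5
BUSINESS_VALUE = 25

# Storage share of the IT hardware budget, per the vendor presentation (percent).
STORAGE_BUDGET_SHARE = 40


def no_value_share() -> int:
    """Percent of data left over once the three valued categories are removed."""
    return 100 - (LITIGATION_HOLD + COMPLIANCE + BUSINESS_VALUE)


def wasted_budget_share(recovered_fraction: float = 1.0) -> float:
    """Percent of the hardware budget tied to no-value data.

    Assumes cost is proportional to data volume; recovered_fraction scales
    the result by how much of the potential savings is actually achieved.
    """
    return STORAGE_BUDGET_SHARE * no_value_share() / 100 * recovered_fraction


if __name__ == "__main__":
    print(f"No-value data: {no_value_share()}%")
    print(f"Hardware budget tied to it: {wasted_budget_share():.1f}%")
    print(f"Recovering a third: {wasted_budget_share(1 / 3):.1f}%")
    print(f"Recovering a half: {wasted_budget_share(1 / 2):.1f}%")
```

Under that assumption, 27.6% of the hardware budget goes to no-value data, and recovering a third or a half of it would free up roughly 9% or 14% of the budget, respectively.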
So how does one go about defensible disposal of such data, elimination of redundant copies and more cost-effective tiering to alleviate the negative impact of excessive useless data? Trying to do everything manually is too labor-intensive, even if you could find, access, and view all the data that needs to be managed. Here is where CommVault's Simpana Reference Copy promises to help.
Simpana Reference Copy
CommVault’s Simpana software is a single platform designed to protect, manage, and access enterprise data. Many view it as a backup and recovery software product, but it can do more, such as archiving. Simpana really is a “broad spectrum” information management platform that can apply to a wide range of applications. Information management deals with the content and decision-making relationships of information as it moves through its lifecycle. To do that effectively, we must start with knowledge about the information that is to be managed.
That knowledge is contained in metadata that CommVault leverages for backup and archiving processes in what it calls its Simpana ContentStore. As part of this process, Simpana captures and creates the necessary metadata (data about the data) that is needed to manage the data. It goes beyond just file metadata (file type, file size, date created, file name, etc.), which is useful in managing the data, but insufficient.
That's because file metadata is a blunt instrument for making policy decisions (such as whether data should be deleted), so in effect, if you are using file metadata alone, you are flying more than half blind. The content itself must be examined and indexed, with keywords extracted and stored in those indices. This adds a rich set of content metadata to the file metadata, and the combination can be used for effective decision-making. For example, keywords can reveal that a three-year-old email that has not been accessed since it was created should nonetheless be kept for e-discovery purposes, even though its age alone would mark it for deletion.
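The difference between the two kinds of metadata can be sketched as follows. This is a simplified illustration, not how Simpana works internally: the `Item` type, the keyword list, and the three-year threshold are all hypothetical choices made for the example.

```python
from dataclasses import dataclass, field
from datetime import date

# Hypothetical terms that would signal legal relevance; not a real product list.
HOLD_KEYWORDS = {"lawsuit", "subpoena", "settlement"}


@dataclass
class Item:
    name: str
    created: date
    last_accessed: date
    text: str
    keywords: set = field(default_factory=set)


def index_keywords(item: Item) -> None:
    """Content indexing step: record which watched terms appear in the text."""
    words = {w.strip(".,;:!?").lower() for w in item.text.split()}
    item.keywords = words & HOLD_KEYWORDS


def deletable_by_file_metadata(item: Item, today: date, max_age_days: int = 3 * 365) -> bool:
    """Blunt rule: old and never touched since creation means delete."""
    age_days = (today - item.created).days
    return age_days >= max_age_days and item.last_accessed == item.created


def deletable_with_content(item: Item, today: date) -> bool:
    """Same rule, but any indexed hold keyword vetoes the deletion."""
    return deletable_by_file_metadata(item, today) and not item.keywords


if __name__ == "__main__":
    email = Item("re-contract.eml", date(2011, 1, 10), date(2011, 1, 10),
                 "Notes on the pending lawsuit.")
    index_keywords(email)
    today = date(2014, 2, 1)
    print(deletable_by_file_metadata(email, today))  # file metadata alone
    print(deletable_with_content(email, today))      # with content keywords
```

With file metadata alone, the old, untouched email looks safe to delete. Once its content is indexed, the keyword match overrides that conclusion.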
With this metadata, CommVault Reference Copy adds intelligence to the information management process, since it enables users to build content-aware policies that can take actions automatically. The policy-based rules enable Reference Copy to classify and organize data into retention buckets. Irrelevant data can be defensibly disposed of, while other sets of data can be moved to more cost-effective tiers of storage or to the cloud.
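To make the idea of policy-based classification concrete, here is a minimal sketch of ordered rules that assign records to retention buckets and attach an action to each. It does not reflect Reference Copy's actual rule syntax or API. The bucket names, keywords, and thresholds (including the seven-year figure) are assumptions invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Record:
    name: str
    age_days: int
    keywords: frozenset
    duplicate: bool = False


@dataclass(frozen=True)
class Rule:
    bucket: str
    action: str
    matches: Callable[[Record], bool]


# Hypothetical policy: rules are checked in order and the first match wins,
# so retention obligations take precedence over disposal and tiering.
RULES = [
    Rule("legal-hold", "retain", lambda r: "lawsuit" in r.keywords),
    Rule("compliance", "retain",
         lambda r: "invoice" in r.keywords and r.age_days < 7 * 365),
    Rule("redundant", "dispose", lambda r: r.duplicate),
    Rule("cold", "move-to-cheaper-tier", lambda r: r.age_days > 365),
]
DEFAULT = Rule("active", "keep-in-place", lambda r: True)


def classify(record: Record) -> Rule:
    """Return the first rule that matches, or the default bucket."""
    for rule in RULES:
        if rule.matches(record):
            return rule
    return DEFAULT


if __name__ == "__main__":
    samples = [
        Record("dispute.eml", 2000, frozenset({"lawsuit"}), duplicate=True),
        Record("copy-of-report.doc", 30, frozenset(), duplicate=True),
        Record("old-notes.txt", 800, frozenset()),
        Record("draft.doc", 10, frozenset()),
    ]
    for rec in samples:
        rule = classify(rec)
        print(f"{rec.name}: {rule.bucket} ({rule.action})")
```

Ordering the rules matters in this sketch: a duplicate that also matches a legal-hold keyword is retained rather than disposed of, which is the kind of precedence a defensible disposal policy would need.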
Moreover, dark data that may retain some ongoing value for secondary uses, such as for analytical or big data efforts, is now more available for processing. In addition, that processing will not be encumbered with tons of irrelevant data as part of its workload.
However, to gain the benefits of CommVault's Reference Copy, an organization needs to implement a strong, formal data governance process that covers the needs of business lines as well as IT. Determining policies and the ongoing value of data could still be a challenge, but at least the technical issues can be resolved if the business issues can be solved.
Dark data does not have the glamour and cachet of big data, but enterprises need to pay more attention to managing it. Left alone, it costs too much money while generating no value. Managed effectively, any dark data that appears to have ongoing value can be made available for new uses, while data that has no ongoing value can be defensibly disposed of (in the best case) or moved to a more cost-effective tier of storage (in the worst case). The promise of CommVault’s Simpana Reference Copy is that it can facilitate management processes that involve dark data with minimal cost and complexity.
CommVault is not a client of David Hill and the Mesabi Group.