Special Coverage Series

Network Computing

Commentary

David Hill Network Computing Blogger

CommVault Tackles The Dark Data Problem

CommVault expands its Simpana platform with software that promises to help enterprises get a handle on unused and unnecessary data.

An unpleasant byproduct of the inexorable data explosion is that enterprise storage repositories are bulging with neglected, unwanted and unneeded data that has no value. Worse, the problem with what some call “dark data” is going to grow. Reference Copy, recently added to CommVault's Simpana software platform, illustrates one way to address this problem.

First, let's take a closer look at dark data. In its IT Glossary, Gartner defines dark data “as the information assets organizations collect, process and store during regular business activities, but generally fail to use for other purposes (for example, analytics, business relationships and direct monetizing).” Gartner also notes, “Dark data often comprise most organizations’ universe of information assets ... Storing and securing data typically incurs more expense (and sometimes greater risk) than value.”

Let’s also consider the magnitude of the impact that no-value data has in the enterprise. In 2012, a survey presented at the Compliance, Governance and Oversight Council Summit revealed that 1% of data in an enterprise has to be preserved for litigation hold, 5% has to be managed to cover compliance requirements, and another 25% is reasonably determined to have current business value. That leaves 69% of all data with no value whatsoever.

Dark data and no-value data do not necessarily map 1:1, but practically speaking, all no-value data can be treated as a subset of dark data (the rest of dark data may still be able to serve some business purpose). Because data-driven efforts such as big data and analytics may yet find value in neglected or currently unused data, we don’t want to arbitrarily confine all dark data to IT storage purgatory.

Still, the clock is ticking and long-term retention of dark data that has no value continues to incur cost (if not extra risk, as well). If storage is 40% of the IT hardware budget (a statistic I saw in a recent vendor presentation) and 69% of data has no value, then more than a quarter of the IT hardware budget each year serves no purpose.

Now, of course, it is unrealistic to think that all of the no-value data problems can be solved, but even if a third or a half of the potential savings could be achieved, that would be valuable.
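The arithmetic behind these figures can be sketched in a few lines of Python. The percentages are the ones cited above (the 2012 survey shares and the 40% storage figure from a vendor presentation). The function names are illustrative only, and the sketch assumes storage spending scales in proportion to the volume of data stored, which is a simplification.

```python
def no_value_share(litigation_hold=1, compliance=5, business_value=25):
    """Percentage of enterprise data left after the three valued categories."""
    return 100 - litigation_hold - compliance - business_value


def wasted_budget_share(storage_share_pct=40, no_value_pct=69):
    """Percentage of the IT hardware budget spent storing no-value data.

    Assumes storage cost is proportional to data volume (a simplification).
    """
    return storage_share_pct * no_value_pct / 100


if __name__ == "__main__":
    no_value = no_value_share()
    wasted = wasted_budget_share(no_value_pct=no_value)
    print(f"No-value data: {no_value}% of all data")
    print(f"Hardware budget spent on it: {wasted:.1f}%")
    # Partial savings scenarios mentioned in the text.
    for label, fraction in (("a third", 1 / 3), ("a half", 1 / 2)):
        print(f"Recovering {label}: {wasted * fraction:.1f}% of the hardware budget")
```

Under these assumptions, about 27.6% of the hardware budget goes to no-value data, so recovering a third to a half of that would free up roughly 9% to 14% of the budget.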

So how does one go about defensible disposal of such data, elimination of redundant copies and more cost-effective tiering to alleviate the negative impact of excessive useless data? Trying to do everything manually is too labor intensive even if you could find, access, and view all the data that needs to be managed. Here is where CommVault's Simpana Reference Copy promises to help.

Simpana Reference Copy

CommVault’s Simpana software is a single platform designed to protect, manage, and access enterprise data. Many view it as a backup and recovery software product, but it can do more, such as archiving. Simpana really is a “broad spectrum” information management platform that can apply to a wide range of applications. Information management deals with the content and decision-making relationships of information as it moves through its lifecycle. To do that effectively, we must start with knowledge about the information that is to be managed.

That knowledge is contained in metadata that CommVault leverages for backup and archiving processes in what it calls its Simpana ContentStore. As part of this process, Simpana captures and creates the necessary metadata (data about the data) that is needed to manage the data. It goes beyond just file metadata (file type, file size, date created, file name, etc.), which is useful in managing the data, but insufficient.

That's because file metadata is a blunt instrument for making policy decisions (such as whether data should be deleted), so if you are using file metadata alone, you are in effect flying more than half blind. The content itself must be examined and indexed, with keywords extracted. This adds a rich set of content metadata to the file metadata, and together they support effective decision-making. For example, keywords can reveal that a three-year-old email that has not been accessed since it was created is relevant to e-discovery and so should not be deleted, despite its age.

With this metadata, CommVault Reference Copy adds intelligence to the information management process, since it enables users to build content-aware policies that can take actions automatically. The policy-based rules enable Reference Copy to classify and organize data into retention buckets. Irrelevant data can be defensibly disposed of, while other sets of data can be moved to more cost-effective tiers of storage or to the cloud.
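To make the idea of a content-aware policy concrete, here is a minimal Python sketch of rules that sort items into retention buckets. It is not CommVault's API or Reference Copy's actual rule format; the `Item` class, the `classify` function, the keyword list, the bucket names and the age thresholds are all hypothetical, chosen only to mirror the old-email example discussed earlier.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta

# Hypothetical keyword list; a real one would come from legal and business owners.
LEGAL_HOLD_KEYWORDS = {"litigation", "subpoena"}


@dataclass
class Item:
    name: str
    created: date
    last_accessed: date
    keywords: set = field(default_factory=set)  # content metadata from indexing


def classify(item, today, dispose_after_days=3 * 365, tier_after_days=365):
    """Assign an item to a retention bucket using file and content metadata."""
    # Content metadata is checked first so it can override age-based rules.
    if item.keywords & LEGAL_HOLD_KEYWORDS:
        return "retain"
    idle = today - item.last_accessed
    if idle >= timedelta(days=dispose_after_days):
        return "dispose"
    if idle >= timedelta(days=tier_after_days):
        return "cheaper-tier"
    return "keep-in-place"


if __name__ == "__main__":
    today = date(2014, 2, 1)
    old = date(2011, 1, 10)
    plain = Item("newsletter.eml", created=old, last_accessed=old)
    flagged = Item("dispute.eml", created=old, last_accessed=old,
                   keywords={"litigation", "pricing"})
    print(plain.name, "->", classify(plain, today))
    print(flagged.name, "->", classify(flagged, today))
```

The two sample emails have identical file metadata, so a policy based on age alone would treat them the same. Only the keyword check separates the one that must be retained from the one that can be disposed of.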

Moreover, dark data that may retain some ongoing value for secondary uses, such as for analytical or big data efforts, is now more available for processing. In addition, that processing will not be encumbered with tons of irrelevant data as part of its workload.

However, to gain the benefits of CommVault's Reference Copy, an organization needs to implement a strong, formal data governance process that involves the lines of business as well as IT. Determining policies and the ongoing value of data could still be a challenge, but at least the technical issues can be resolved if the business issues can be solved.

Mesabi Musings

Dark data does not have the glamour and cachet of big data, but enterprises need to pay more attention to managing it. Left alone, it costs too much money while generating no value. Managed effectively, whatever dark data appears to have ongoing value can be made available for new uses. Data that has no ongoing value can be (in the best case) defensibly disposed of or (in the worst case) moved to a more cost-effective tier of storage. The promise of CommVault’s Simpana Reference Copy is that it can facilitate management processes that involve dark data with minimal cost and complexity.

CommVault is not a client of David Hill and the Mesabi Group.


