Network Computing is part of the Informa Tech Division of Informa PLC

Big Data Security: The Balance Is Shifting in Favor of IT

Unstructured data is not only cumbersome to manage; it is also a potential cybersecurity threat for enterprises struggling to get a handle on what's referred to as "big data."

Enterprises are processing, storing and mining increasing amounts of data in unstructured formats--from email, documents, videos and audio files, to instant messages, digital images and Web logs.

This data can contain intellectual property or sensitive, regulated information on a corporation's employees and customers, says Todd Thiemann, co-chair of the Cloud Security Alliance's Solution Provider Advisory Council and senior director of product marketing at San Jose, Calif.-based Vormetric, an encryption, key management and data security firm.

"It's a variety of different data sources coming from Web traffic, large data stores and what have you. You're dealing with larger volumes of data and velocity--the ability to make decisions more quickly off of that data," he says. "You can look at this in a couple of different dimensions; one of them is using security data--all the log data that might accumulate across the enterprise to make rapid decisions. But that's not what I'm talking about in the context of big data."

As enterprises look at frameworks like Hadoop or NoSQL databases like MongoDB or Apache Cassandra, the data stores they build often include sensitive data. IT needs to consider how it's going to secure that data and control access to it.

Thiemann recently spoke on the subject at the Gartner Security and Risk Management Summit in a presentation titled "Big Breaches, Big Compliance, Big Data, Big Encryption."

"Typically, there's a tradeoff between availability of data for knowledge workers and the confidentiality of that data. If you want more availability, you usually have less security and confidentiality," he says. "Or if it's sensitive stuff and you need more confidentiality around it, availability is going to suffer and your knowledge workers won't be able to access it as rapidly.

"With some of the innovations we've seen of late, that balance has been shifting. So IT is able to maintain an improved balance with knowledge workers accessing the data while maintaining the confidentiality around it."

One reason is hardware such as Intel's microprocessors, which now implement in silicon the cryptographic instructions that previously ran in software.

"The Advanced Encryption Standard is one of the industry standards for an encryption algorithm. Intel has put it in their microprocessors, AES-NI is what they call it," he continues. "The performance overhead for encryption, which previously was quite significant, has come down to near zero. There are benchmarks out there in the not-too-distant past that show in a high workload environment you're talking 28% overhead for encryption."

Vormetric, working with Intel and IBM, benchmarked an online transaction processing workload with encryption and found that for typical workloads the encryption overhead was below 2%.
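To put those two overhead figures side by side, here is a rough back-of-the-envelope sketch. It assumes "overhead" means extra processing time per transaction relative to an unencrypted baseline, which the benchmarks may define differently, and the baseline of 1,000 transactions per second is a hypothetical number chosen only for illustration.

```python
def throughput_with_overhead(baseline_tps: float, overhead: float) -> float:
    """Estimate throughput once encryption is enabled.

    Assumption: `overhead` is the fractional extra time per transaction
    (0.28 for 28%), so throughput scales by 1 / (1 + overhead).
    """
    if overhead < 0:
        raise ValueError("overhead must be non-negative")
    return baseline_tps / (1.0 + overhead)


BASELINE_TPS = 1000.0  # hypothetical unencrypted baseline, not from the benchmark

older_estimate = throughput_with_overhead(BASELINE_TPS, 0.28)  # older 28% figure
aes_ni_estimate = throughput_with_overhead(BASELINE_TPS, 0.02)  # sub-2% upper bound

print(f"28% overhead: about {older_estimate:.0f} tps")
print(f" 2% overhead: about {aes_ni_estimate:.0f} tps")
```

Under these assumptions, the older figure costs roughly a fifth of the baseline throughput, while the sub-2% figure leaves it nearly intact.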

"There's this perception in the marketplace that if I encrypt my data there's a big performance impact and that's prohibitive, and the availability might not be there for my knowledge workers," says Thiemann. "AES-NI also applies in the cloud where it allows you to stuff more data up there, crunch it faster and get business results while taking advantage of all the cloud goodness in terms of lowering cost."

Big data typically refers to what's classified as "semi-structured" and "unstructured" data that is difficult or impossible to incorporate in a conventional relational database, says Charles King, principal analyst at Pund-IT.

More importantly, these sources make up a larger portion of the data that organizations create and save--some estimate as much as 80% of a typical company's information resources--and they are growing at a significantly higher rate than common database information.

"Since so much of organizations' IT investment is going toward managing and maintaining big data, it's important to leverage those resources as effectively as possible," he says. "Further care should be taken if that information is collected into specialized repositories. Plus, of course, the findings from big data mining and analysis should be carefully sequestered and secured."

As for how IT should approach securing big data, Thiemann says it is important to understand what the sensitive data is in your enterprise and where that data resides. The next question is what steps you will take to secure it. Compliance is necessary, but it is only a baseline; meeting compliance requirements typically does not amount to the security your data needs.

While the definition of "sensitive" data will vary among organizations, the definition in general has expanded.

"There's the baseline that it's cardholder data for credit card information, or maybe it's health records or a national Social Security number," says Thiemann. "What we've seen over time in light of some recent data breaches is that definition is expanding."
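As an illustration of the first step Thiemann describes, finding out what sensitive data you hold, the sketch below scans a piece of unstructured text for the two baseline categories he mentions. It is a minimal sketch, not a description of any vendor's product: the function names, the regular expressions and the sample text are all assumptions made for this example, and real discovery tools cover far more formats and produce fewer false positives.

```python
import re

# Hypothetical patterns for illustration only.
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")  # 13-19 digits, optional separators
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")      # U.S. SSN layout, e.g. 123-45-6789


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum used by card numbers."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_sensitive(text: str) -> list[tuple[str, str]]:
    """Return (kind, value) pairs for SSN-like and card-like strings in text."""
    findings = [("ssn_like", m.group()) for m in SSN_RE.finditer(text)]
    for m in CARD_RE.finditer(text):
        digits = re.sub(r"\D", "", m.group())
        if luhn_valid(digits):  # drop digit runs that cannot be card numbers
            findings.append(("card_like", digits))
    return findings


# Made-up sample: 4111 1111 1111 1111 is a widely used test card number.
sample = (
    "Customer 4111 1111 1111 1111 called; SSN 078-05-1120 on file; "
    "order ref 1234 5678 9012 3456."
)
results = find_sensitive(sample)
print(results)
```

The Luhn check filters out the order reference, which has the right length but an invalid checksum. In practice a scanner like this would report locations and masked values rather than the raw matches, so the scan results do not themselves become a store of sensitive data.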

And what of the insider threat? To what extent is internal data leakage a concern when it comes to big data? Though most organizations tend to focus on external threats, security pros will often tell you that many--if not most--data breaches originate on the inside, when employees attempt to access information they should not, whether by mistake or with intent.

"Key management, access logs and other solutions are designed to prevent such breaches," says King. "At this point, we're still in the early days of big data, but as these resources become more common, I expect security vendors to develop solutions to address specific big data problems and scenarios."