Big Data Analytics, Cloud Services, And IT Security

Enterprise IT security teams struggle to filter gigabytes of log data, but a new generation of cloud-based log aggregation and analysis services promises to help.

Kurt Marko

December 3, 2013

IT is struggling with the digital version of a problem that's plagued video surveillance admins for years: how to make sense of a constant, fast-moving stream of information and sift out the few important events. The combination of highly scalable cloud services and big data analytics can help tackle this IT security monitoring problem.

Indeed, cloud services are pioneering improvements in the packaging and management of log data of all types, not just security logs, as evidenced by Amazon's recent introduction of AWS CloudTrail, a service providing a consolidated record of all activity against an AWS account.

Since it's not always clear who or what is using a particular set of AWS resources, CloudTrail keeps a record of all API calls to an account, including those made via the AWS management console, SDKs, command-line tools or higher-level services such as CloudFormation, Elastic Beanstalk or OpsWorks. CloudTrail can even be configured to aggregate logs from multiple accounts. It writes the logs to an S3 bucket in JSON format, which makes them easy to parse and process, and does so in near real time, typically within 15 minutes of the event. So what does this have to do with security? Plenty.

As the AWS blog post describing CloudTrail points out, such detailed API logging facilitates improvements to several important IT processes, including:

● Compliance enforcement by providing the information needed to demonstrate that AWS resources were managed according to rules and regulatory standards.

● Resource tracking through the entire lifecycle from creation through deletion.

● Operational troubleshooting to identify the most recent changes made to all resources in an AWS environment.

● Security analytics to identify which user activities failed due to inadequate permissions.

As Amazon notes, having consolidated and accurate log data can provide timely answers to questions such as: "What actions did a given user take over a given time period? For a given resource, which user has taken actions on it over a given time period? What is the source IP address of a given activity? Which activities failed due to inadequate permissions?"
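To make those questions concrete, here is a minimal Python sketch of how they might be answered directly from a CloudTrail log file. It assumes the documented log layout (a top-level "Records" array whose entries carry fields such as eventTime, eventName, sourceIPAddress, userIdentity and errorCode). The sample events, user names and IP addresses are invented for illustration, and the set of error codes treated as permission failures is an assumption, since the exact codes vary by AWS service.

```python
import json
from collections import Counter

# Hypothetical sample in the CloudTrail log-file layout: a top-level
# "Records" array of event objects. All values are invented.
SAMPLE_LOG = """
{"Records": [
  {"eventTime": "2013-12-01T10:15:00Z", "eventName": "RunInstances",
   "sourceIPAddress": "203.0.113.10",
   "userIdentity": {"userName": "alice"}},
  {"eventTime": "2013-12-01T10:20:00Z", "eventName": "DeleteBucket",
   "sourceIPAddress": "198.51.100.7",
   "userIdentity": {"userName": "bob"},
   "errorCode": "AccessDenied"},
  {"eventTime": "2013-12-01T11:05:00Z", "eventName": "TerminateInstances",
   "sourceIPAddress": "198.51.100.7",
   "userIdentity": {"userName": "bob"},
   "errorCode": "Client.UnauthorizedOperation"}
]}
"""

# Assumption: these codes indicate inadequate permissions. Other services
# may report different codes, so treat this set as a starting point.
PERMISSION_ERRORS = {"AccessDenied", "Client.UnauthorizedOperation"}


def load_records(text):
    """Parse one CloudTrail log file and return its list of events."""
    return json.loads(text)["Records"]


def user_of(record):
    """Return the IAM user name, or 'unknown' if the event has none."""
    return record.get("userIdentity", {}).get("userName", "unknown")


def actions_by_user(records, user, start, end):
    """What actions did a given user take over a given time period?

    Timestamps are ISO 8601 UTC strings in one fixed format, so plain
    string comparison orders them correctly.
    """
    return [r["eventName"] for r in records
            if user_of(r) == user and start <= r["eventTime"] <= end]


def denied_events(records):
    """Which activities failed due to inadequate permissions?"""
    return [r for r in records if r.get("errorCode") in PERMISSION_ERRORS]


def denied_by_source_ip(records):
    """Count permission failures per source IP address."""
    return Counter(r["sourceIPAddress"] for r in denied_events(records))


if __name__ == "__main__":
    records = load_records(SAMPLE_LOG)
    print(actions_by_user(records, "bob",
                          "2013-12-01T00:00:00Z", "2013-12-01T23:59:59Z"))
    print(denied_by_source_ip(records))
```

A script like this works for one file, but it has to be rewritten for every new question and every new log source, which is the gap the hosted analytics services aim to fill.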

Sounds nice in theory, but how? Someone still has to write the queries and generate reports. That's where a new generation of big data analytics cloud services comes in. Companies such as Logentries, Loggly, Papertrail, Splunk and Sumo Logic offer such services, and several have already added support for CloudTrail.

For example, the Splunk App for AWS includes a set of pre-built dashboards and reports that improve understanding of user behavior, document compliance with security and regulatory standards, including the PCI Data Security Standard, and generate usage-based billing summaries. Similarly, Sumo Logic's Application for CloudTrail can be used for security analytics by filtering data in real time to study user behaviors and alert on unusual patterns.

Sumo Logic also can correlate CloudTrail data with that from other systems, including OS security logs, host-based IDSs, or non-AWS application logs. In fact, according to the firm's CMO, Sanjay Sarathy, its technology processes virtually any data set with a timestamp, using a data reduction scheme that consolidates thousands of raw entries into a tiny fraction of relevant records.
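The article does not describe how Sumo Logic's data reduction works, so the following Python sketch shows only one generic way to get a similar effect: mask the variable parts of each log line (IP addresses, hex values, numbers) so that lines differing only in those details collapse into a single counted pattern. The function names, mask tokens and sample lines are all hypothetical, and the sketch assumes timestamps have already been stripped from each line.

```python
import re
from collections import Counter

# Order matters: mask IP addresses before bare numbers, otherwise the
# octets would be replaced one by one.
_MASKS = [
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "<ip>"),
    (re.compile(r"\b0x[0-9a-fA-F]+\b"), "<hex>"),
    (re.compile(r"\b\d+\b"), "<num>"),
]


def signature(line):
    """Replace variable tokens so similar lines share one signature."""
    for pattern, token in _MASKS:
        line = pattern.sub(token, line)
    return line


def reduce_logs(lines):
    """Collapse raw log lines into a count per signature."""
    return Counter(signature(line) for line in lines)


if __name__ == "__main__":
    raw = [
        "Failed login for user 1042 from 10.0.0.5",
        "Failed login for user 77 from 192.168.1.9",
        "Disk usage at 91 percent",
    ]
    for pattern, count in reduce_logs(raw).most_common():
        print(count, pattern)
```

On real logs, thousands of near-identical entries shrink to a short list of patterns with counts, and a sudden new pattern or a jump in a count becomes easy to spot.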

Likewise, Logentries is beta testing a product called OpsStream that collects data from several AWS services, including EC2, EBS, RDS, and CloudWatch, into a single Logentries account. That account also incorporates system and application logs from other systems, providing a single dashboard that summarizes the health of applications using multiple AWS and on-premises resources. OpsStream also will soon collect data from CloudTrail and AWS Trusted Advisor.

But applying big data techniques to security logs means doing much more than basic statistical analysis to flag outliers. The real benefits come through applying pattern recognition and self-learning to automatically build baseline norms for a particular data set and refine the detection algorithms based on how IT admins handle each alert.

Indeed, Sumo Logic's Sarathy claims the company's pattern recognition engine is one of the key pieces of intellectual property inside its product. He says most existing data analysis products require manual input of specific search terms -- an ad hoc process that's not suited to the overwhelming volume of newly generated data. Sumo Logic's approach automatically builds baseline patterns and uses them to flag potentially anomalous events, which admins can then classify as genuine threats, suspicious but not serious, or innocuous false alarms.

This feedback is used to tweak the system's anomaly detection algorithm, gradually improving accuracy and reducing the number of false positives. "The focus is on proactive action, not reactive response," Sarathy told Network Computing.
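The general idea of a learned baseline with an admin feedback loop can be illustrated with a deliberately simple Python sketch. This is not Sumo Logic's algorithm, which the article does not detail. It swaps in a basic z-score test over a running history (for example, events per hour for one user), and the class name, verdict labels and parameter values are all hypothetical.

```python
from statistics import mean, stdev


class BaselineDetector:
    """Flag values far from a learned baseline; tune via admin verdicts."""

    def __init__(self, threshold=3.0, step=0.25, warmup=5):
        self.history = []           # values accepted as normal so far
        self.threshold = threshold  # z-score above which we raise an alert
        self.step = step            # how much one verdict moves the threshold
        self.warmup = warmup        # observations needed before alerting

    def observe(self, value):
        """Return True if value looks anomalous against the baseline."""
        anomalous = False
        if len(self.history) >= self.warmup:
            mu = mean(self.history)
            sigma = stdev(self.history) or 1.0  # avoid dividing by zero
            anomalous = abs(value - mu) / sigma > self.threshold
        if not anomalous:
            # Only normal-looking values extend the baseline.
            self.history.append(value)
        return anomalous

    def feedback(self, verdict):
        """Adjust sensitivity from an admin's classification of an alert."""
        if verdict == "false_alarm":
            self.threshold += self.step          # alert less readily
        elif verdict == "threat":
            self.threshold = max(1.0, self.threshold - self.step)
        # "suspicious" leaves the threshold unchanged.


if __name__ == "__main__":
    detector = BaselineDetector()
    for count in [10, 12, 11, 9, 10, 11, 40]:
        if detector.observe(count):
            print("alert on", count)
            detector.feedback("false_alarm")
```

Each "false_alarm" verdict makes the detector slightly less sensitive and each "threat" verdict slightly more so, which is the kind of gradual tuning the feedback loop is meant to provide.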

As I wrote in an InformationWeek report on the application of NSA-developed big data technology to IT, real-time filtering and analysis of colossal log files was a key ingredient in the NSA's PRISM eavesdropping program. According to Ely Kahn, a co-developer of the NSA's Accumulo software, traditional security information and event management (SIEM) and forensics software can’t process the volume, diversity and complexity of data in the ways required to identify new threats or predict incidents.

He noted that a financial institution might generate terabytes of security data from hundreds of platforms per day and want to store years’ worth in order to draw long-term correlations. From this petabyte-scale pool, the institution needs to perform low-latency searches, getting results within a couple of seconds. Sqrrl, Kahn's spin-out company commercializing the open-sourced NSA software, sees it as a problem tailor-made for big data analytics.

Sqrrl's Hadoop-based technology targets large customers with on-premises installations, but it is equally well suited to cloud deployment and could serve as the back end for future cloud services. Sqrrl could become an arms merchant for companies like Sumo Logic or its future competitors.

Corporate security managers stand to benefit from leveraging big data technology, but also from its instantiation as cloud services with the accompanying benefits of hyperscale, multitenant number-crunching, and SaaS deployment convenience.
