DIY Big Data Security Analytics: OpenSOC

An open source framework provides a starting point for building your own analytics system.

Natalie Timms

February 16, 2016

Big data analytics provides scalable, high-performance analysis of large data sets. It allows for the examination of large volumes of data to discover patterns, behaviors, and correlations that can be used to drive decision making. Typically focused on business applications, big data analytics is now being used for security event monitoring and threat detection.

Today, several vendors offer big data-centric products that consume network and device telemetry, such as asset information and logs, to give users a correlated view of their network activity and to identify threats. Cisco Systems has one such system: Managed Threat Defense, which utilizes an open source technology known as OpenSOC -- short for open security operations center.

OpenSOC, originally developed by Cisco, defines a DIY framework for building a real-time, big data-centric analysis and storage system using parallel computational tools on a scalable Hadoop architecture. Building an analytics solution in-house is non-trivial and requires knowledge of data science and complex systems; the OpenSOC framework is a starting point for understanding how to build your own. The technology behind OpenSOC consists of:

Telemetry capture layer: Apache Flume

Flume agents aggregate telemetry data from dynamic and static sources through customized parsers (e.g., syslog, NetFlow, CSV files). Each unit of data is an event that moves from a Source to a Sink via a Channel within an agent. Sinks identify the next step in the processing path -- for example, a Kafka topic.
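
As a minimal sketch, the properties file below defines a single Flume agent that listens for UDP syslog and hands events to Kafka. It assumes the Kafka sink that ships with Flume 1.6; the agent name, port, topic, and broker address are illustrative, not OpenSOC defaults.

```
# Agent "a1": syslog source -> memory channel -> Kafka sink
a1.sources = syslog-src
a1.channels = mem-ch
a1.sinks = kafka-sink

# Source: receive syslog datagrams on UDP 5140
a1.sources.syslog-src.type = syslogudp
a1.sources.syslog-src.host = 0.0.0.0
a1.sources.syslog-src.port = 5140
a1.sources.syslog-src.channels = mem-ch

# Channel: buffer events in memory between source and sink
a1.channels.mem-ch.type = memory
a1.channels.mem-ch.capacity = 10000

# Sink: publish each event to a Kafka topic
a1.sinks.kafka-sink.type = org.apache.flume.sink.kafka.KafkaSink
a1.sinks.kafka-sink.topic = syslog-raw
a1.sinks.kafka-sink.brokerList = broker1:9092
a1.sinks.kafka-sink.channel = mem-ch
```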

Data bus: Apache Kafka

Kafka is a distributed messaging system partitioned into user-defined topics specific to the message types received from Producers. Output from a Flume sink is published to a topic, providing an ordered, normalized sequence of messages that Brokers replicate and continuously append to a commit log across a Kafka server cluster. Consumers, such as Storm, subscribe to topics and process the published messages.
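
As an illustration of the Producer side, the Java sketch below publishes one parsed syslog event to a topic using Kafka's standard producer client. The broker address, topic name, message key, and payload are assumptions for the example, not OpenSOC specifics.

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class EventPublisher {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");  // Kafka broker (illustrative)
        props.put("key.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");

        try (Producer<String, String> producer = new KafkaProducer<>(props)) {
            // Keying by source host keeps each host's events ordered
            // within a single partition of the topic.
            producer.send(new ProducerRecord<>("syslog-raw", "fw-edge-01",
                "{\"ts\":1455600000,\"msg\":\"example syslog line\"}"));
        }
    }
}
```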

Stream processor: Apache Storm

Storm provides the ability to process streaming data in real time. It consumes messages from Kafka topics via Spouts, then processes those messages using functions defined in Bolts to produce events. The functions performed on each stream type are wired together in a Topology. In OpenSOC, Bolts can be used to apply analytics such as machine learning, or to generate enriched events by adding intelligence information.
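
A hedged sketch of such a Topology, using the backtype.storm API of the Storm 0.9/0.10 era that OpenSOC was built against: a Kafka Spout feeds a toy enrichment Bolt. The ZooKeeper address, topic name, and the EnrichBolt stub are illustrative stand-ins for OpenSOC's real components.

```java
import backtype.storm.Config;
import backtype.storm.StormSubmitter;
import backtype.storm.spout.SchemeAsMultiScheme;
import backtype.storm.topology.BasicOutputCollector;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.TopologyBuilder;
import backtype.storm.topology.base.BaseBasicBolt;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;
import storm.kafka.KafkaSpout;
import storm.kafka.SpoutConfig;
import storm.kafka.StringScheme;
import storm.kafka.ZkHosts;

public class TelemetryTopology {

    // Toy enrichment Bolt: tags each raw event with an extra field.
    // OpenSOC's real Bolts add GeoIP, asset, and threat intel data.
    public static class EnrichBolt extends BaseBasicBolt {
        @Override
        public void execute(Tuple input, BasicOutputCollector collector) {
            String rawEvent = input.getString(0);
            collector.emit(new Values(rawEvent, "geo=US"));  // stubbed lookup
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("event", "enrichment"));
        }
    }

    public static void main(String[] args) throws Exception {
        // Spout reads the "syslog-raw" topic, decoding messages as strings
        SpoutConfig spoutConf = new SpoutConfig(
            new ZkHosts("zk1:2181"), "syslog-raw", "/kafka-spout", "opensoc-demo");
        spoutConf.scheme = new SchemeAsMultiScheme(new StringScheme());

        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("kafka-spout", new KafkaSpout(spoutConf), 2);
        builder.setBolt("enrich", new EnrichBolt(), 4).shuffleGrouping("kafka-spout");

        StormSubmitter.submitTopology("telemetry", new Config(), builder.createTopology());
    }
}
```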

Real-time index and search: Elasticsearch

Events are moved from Storm to Elasticsearch, which indexes and stores them, allowing for real-time correlation and analytics methods such as anomaly detection.
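
For a sense of what querying that index looks like, the Java sketch below uses the Elasticsearch 2.x TransportClient to pull events for a suspect source address. The index name and field name are assumptions, not OpenSOC's actual schema.

```java
import java.net.InetAddress;

import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.Client;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.transport.InetSocketTransportAddress;
import org.elasticsearch.index.query.QueryBuilders;

public class EventSearch {
    public static void main(String[] args) throws Exception {
        // Connect to the cluster's transport port (9300), not the 9200 REST port
        Client client = TransportClient.builder().build()
            .addTransportAddress(new InetSocketTransportAddress(
                InetAddress.getByName("es-host"), 9300));

        // Find indexed events from a suspect source address
        SearchResponse resp = client.prepareSearch("opensoc-events")
            .setQuery(QueryBuilders.termQuery("ip_src_addr", "10.0.0.5"))
            .setSize(20)
            .get();

        System.out.println("matching events: " + resp.getHits().getTotalHits());
        client.close();
    }
}
```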

Long-term data store: Apache Hive

Storm feeds into Hive, which provides data summarization and querying using an SQL-like language. For example, compressed metadata can be stored in indexed tables in ORC format, and raw data can be stored in tabular form. Data stored in Hive may also be queried using MapReduce jobs.
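
A quick sketch of that SQL-like access through the standard HiveServer2 JDBC driver; the table name, columns, and date filter are hypothetical, chosen only to show the query shape.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveSummary {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");

        // HiveServer2 endpoint; host and database are illustrative
        Connection conn = DriverManager.getConnection(
            "jdbc:hive2://hive-host:10000/default", "analyst", "");
        Statement stmt = conn.createStatement();

        // Top talkers for one day of stored telemetry (hypothetical table)
        ResultSet rs = stmt.executeQuery(
            "SELECT src_ip, COUNT(*) AS events FROM telemetry " +
            "WHERE event_date = '2016-02-15' " +
            "GROUP BY src_ip ORDER BY events DESC LIMIT 10");

        while (rs.next()) {
            System.out.println(rs.getString("src_ip") + "\t" + rs.getLong("events"));
        }
        conn.close();
    }
}
```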

Long-term packet store: Apache HBase

HBase is a scalable and distributed database that supports structured data storage for large data sets such as PCAP tables.
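
Retrieval from HBase is key-based, so pulling stored packets back is a direct lookup rather than a scan of the whole data set. The sketch below uses the standard HBase 1.x client; the table name, column family, and row-key layout are illustrative, not OpenSOC's actual PCAP schema.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class PcapLookup {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();  // reads hbase-site.xml

        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("pcap"))) {

            // Hypothetical row key: flow 5-tuple plus a timestamp
            Get get = new Get(Bytes.toBytes("10.0.0.5|10.0.0.9|6|52100|443|1455600000"));
            Result result = table.get(get);

            byte[] packet = result.getValue(Bytes.toBytes("t"), Bytes.toBytes("pcap"));
            System.out.println("retrieved "
                + (packet == null ? 0 : packet.length) + " bytes of packet data");
        }
    }
}
```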

Visualization platform: Kibana

Kibana is an open source data visualization platform that provides powerful graphics and the ability to build custom dashboards.

Although ideal for security threat analysis, the OpenSOC framework can be tailored to ingest, analyze, and view any type of telemetry for a variety of other business functions. For companies, data scientists, and anyone considering building their own big data solution, OpenSOC is worth a look.

Learn more about infrastructure security in the Security Track at Interop Las Vegas this spring. Don't miss out! Register now for Interop, May 2-6, and receive $200 off.

About the Author

Natalie Timms

Natalie Timms is the former program manager with the CCIE certification team at Cisco, where she managed exam curricula and content for the CCIE Security track and was responsible for introducing Version 4.0 of the exam. Natalie has been involved with computer networking for more than 20 years, much of which was spent with Cisco in various roles: field sales specialist, product manager, and software engineer. She has contributed at the IETF standards level, has written many technical white papers, and is a Cisco Press author. Natalie is a US patent holder and holds a CCIE Security certification as well as a BSc in Computing Science and Statistics from Macquarie University in Sydney, Australia. She moved to the US in 1995 and, after meeting her husband in the local Cisco office, has called Seattle home.
