Big data analytics provides scalable, high-performance analysis of large data sets. It allows for the examination of large volumes of data to discover patterns, behaviors and correlations that can be used to drive decision making. Traditionally focused on business applications, big data analytics is now also being used for security event monitoring and threat detection.
Today there are several vendors that offer big data-centric products that consume network and device telemetry such as asset information and logs to provide users with a correlated view of their network activities and to identify threats. Cisco Systems has one such system: Managed Threat Defense, which utilizes an open source technology known as OpenSOC -- short for open security operations center.
OpenSOC, originally developed by Cisco, defines a DIY framework for building a real-time, big data-centric analysis and storage system using parallel computational tools on a scalable Hadoop architecture. Building an analytics solution in-house is non-trivial and requires knowledge of data science and complex systems. The OpenSOC framework is a starting point for understanding how to build your own solution. The technology behind OpenSOC consists of:
Telemetry capture layer: Apache Flume
Flume agents aggregate telemetry data from dynamic and static sources through the implementation of customized parsers (e.g., Syslog, Netflow, CSV files). Each unit of data is an event that moves from a Source to a Sink via a Channel within an agent. Sinks identify the next step in the processing path, for example a Kafka topic.
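To make the Source, Channel and Sink roles concrete, here is a minimal Python sketch of the flow. It does not use Flume itself: the simplified syslog pattern, the event layout (headers plus body) and the names parse_syslog and run_agent are assumptions made for illustration only.

```python
import queue
import re

# Hypothetical pattern for a simplified syslog-style line: "<PRI>HOST APP: MESSAGE"
SYSLOG_RE = re.compile(r"^<(?P<pri>\d+)>(?P<host>\S+) (?P<app>[^:]+): (?P<msg>.*)$")

def parse_syslog(line):
    """Parser: turn one raw line into an event (headers plus body), or None."""
    m = SYSLOG_RE.match(line)
    if m is None:
        return None
    pri = int(m.group("pri"))
    return {
        "headers": {
            "host": m.group("host"),
            "app": m.group("app"),
            "facility": pri // 8,   # syslog PRI = facility * 8 + severity
            "severity": pri % 8,
        },
        "body": m.group("msg"),
    }

def run_agent(raw_lines, sink):
    """Move events from source to sink through an in-memory channel."""
    channel = queue.Queue()
    for line in raw_lines:           # source side: read and parse
        event = parse_syslog(line)
        if event is not None:        # unparseable lines are dropped in this sketch
            channel.put(event)
    while not channel.empty():       # sink side: drain the channel
        sink(channel.get())

# The sink here just collects events; in OpenSOC it would hand them to a Kafka topic.
delivered = []
run_agent(["<34>fw01 sshd: Failed password for root", "garbage line"], delivered.append)
```

The channel is what decouples the rate at which telemetry arrives from the rate at which the next stage can accept it.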
Data bus: Apache Kafka
Kafka is a distributed messaging system organized into user-defined topics, each specific to a type of message published by Producers. The output of a Flume sink is published to a topic, which provides an ordered sequence of normalized messages; Brokers in the Kafka server cluster replicate these messages and continuously append them to a commit log. Consumers, such as Storm, subscribe to topics and process the published messages.
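The commit-log idea can be illustrated with a small in-memory Python sketch. This is not the Kafka client API: the Topic and Consumer classes below are hypothetical stand-ins, and replication and partitioning are left out entirely.

```python
class Topic:
    """A named, append-only sequence of messages."""

    def __init__(self, name):
        self.name = name
        self.log = []

    def publish(self, message):
        """Append a message and return its offset (its position in the log)."""
        self.log.append(message)
        return len(self.log) - 1


class Consumer:
    """Reads a topic in order while tracking its own offset."""

    def __init__(self, topic):
        self.topic = topic
        self.offset = 0

    def poll(self, max_messages=10):
        """Return the next unread messages and advance this consumer's offset."""
        batch = self.topic.log[self.offset:self.offset + max_messages]
        self.offset += len(batch)
        return batch


netflow = Topic("netflow")
for record in ["flow-a", "flow-b", "flow-c"]:
    netflow.publish(record)

storm = Consumer(netflow)
archiver = Consumer(netflow)
first_batch = storm.poll(max_messages=2)   # storm has read two messages
everything = archiver.poll()               # archiver still sees all three
```

Because reading never removes anything from the log, each consumer progresses at its own pace, which is what lets several downstream systems subscribe to the same topic.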
Stream processor: Apache Storm
Storm provides the ability to process streaming data in real time. Storm consumes messages from Kafka topics via Spouts, then processes these messages using functions defined in Bolts to produce events. The functions performed on each stream type are defined in a Topology. In OpenSOC, Bolts can be used to apply analytics such as machine learning or to generate enriched events by adding intelligence information.
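The Spout, Bolt and Topology relationship can be sketched with plain Python generators. This is not the Storm API: the CSV message format, the threat-intelligence set and the function names are all assumptions chosen for illustration.

```python
def spout(messages):
    """Spout: emit raw messages into the stream, one at a time."""
    for raw in messages:
        yield raw

def parse_bolt(stream):
    """Bolt: turn a hypothetical 'src,dst,bytes' line into a structured event."""
    for raw in stream:
        src, dst, nbytes = raw.split(",")
        yield {"src": src, "dst": dst, "bytes": int(nbytes)}

def enrich_bolt(stream, threat_intel):
    """Bolt: add intelligence information to each event."""
    for event in stream:
        enriched = dict(event)
        enriched["known_bad"] = event["dst"] in threat_intel
        yield enriched

def run_topology(messages, threat_intel):
    """Topology: wire the spout to the bolts in processing order."""
    return list(enrich_bolt(parse_bolt(spout(messages)), threat_intel))

events = run_topology(
    ["10.0.0.5,203.0.113.9,1200", "10.0.0.6,198.51.100.7,80"],
    threat_intel={"203.0.113.9"},
)
```

In a real deployment each stage would run in parallel across many workers; the sketch only shows how the topology fixes the order in which functions are applied to a stream.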
Real-time index and search: Elasticsearch
Events are moved from Storm to Elasticsearch, which indexes and stores them, enabling real-time correlation and analytics methods such as anomaly detection.
Long-term data store: Apache Hive
Storm feeds into Hive to provide data summarization and querying using an SQL-like language. For example, Hive can store compressed metadata in indexed tables in ORC format, or raw data in tabular form. Data stored in Hive may also be queried using a MapReduce job.
Long-term packet store: Apache HBase
HBase is a scalable and distributed database that supports structured data storage for large data sets such as PCAP tables.
Visualization platform: Kibana
Kibana is an open source data visualization platform that provides powerful graphics and the ability to build custom dashboards.
Although ideal for security threat analysis, the OpenSOC framework can be tailored to ingest, analyze and view any type of telemetry for a variety of other business functions. For companies, data scientists, and anyone considering building their own big data solution, OpenSOC is worth a look.