Collecting information about network events has long been essential to providing a record of activities related to requirements such as accounting, billing, compliance, SLAs, and forensics. Systems and applications provide data in standardized forms such as syslog, as well as vendor-specific formats. These outputs are then analyzed to provide a starting point for business-related planning, security breach identification and remediation, and many other projects.
Deriving correlated, meaningful, high-fidelity outputs from large volumes of log and event data has created the need for data analytics. In this blog, I'll provide a basic overview of common network analytics methods and the types of output they yield.
Deterministic analysis uses known formulas and behaviors applied as rule-based criteria to evaluate inputs such as packet headers and data. For example, protocols behave in an RFC-defined fashion; they use well-known ports and follow a known set of flows. When establishing a TCP connection, we look for the three-way handshake. When applying stateful firewalling, certain characteristics are expected, such as TCP flags and sequence numbers.
Deterministic analysis assumes that all inputs and parameters are known and anything that does not fall within the boundaries of compliance is a potential issue. There is no randomness introduced in the evaluation criteria; this type of analysis is proactive in nature, allowing for the detection of known attack patterns.
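To make the idea of a deterministic rule concrete, here is a minimal Python sketch that checks whether three captured segments form a valid TCP three-way handshake. The `Segment` type and the `is_valid_handshake` function are hypothetical names invented for this illustration; a real firewall or IDS tracks far more state than this.

```python
from typing import NamedTuple, Sequence

SEQ_MODULUS = 2 ** 32  # TCP sequence numbers are 32-bit and wrap around


class Segment(NamedTuple):
    """Hypothetical, simplified view of a TCP segment header."""
    src: str
    dst: str
    flags: frozenset
    seq: int
    ack: int


def is_valid_handshake(segments: Sequence[Segment]) -> bool:
    """Return True if the first three segments are SYN, SYN-ACK, ACK
    with consistent endpoints and sequence/acknowledgment numbers."""
    if len(segments) < 3:
        return False
    syn, syn_ack, ack = segments[:3]
    return (
        # Rule 1: the flags must appear in the expected order.
        syn.flags == {"SYN"}
        and syn_ack.flags == {"SYN", "ACK"}
        and ack.flags == {"ACK"}
        # Rule 2: the reply must come from the original destination.
        and (syn_ack.src, syn_ack.dst) == (syn.dst, syn.src)
        and (ack.src, ack.dst) == (syn.src, syn.dst)
        # Rule 3: each side acknowledges the other's initial sequence number + 1.
        and syn_ack.ack == (syn.seq + 1) % SEQ_MODULUS
        and ack.seq == (syn.seq + 1) % SEQ_MODULUS
        and ack.ack == (syn_ack.seq + 1) % SEQ_MODULUS
    )
```

Every rule is fixed in advance and has a yes-or-no answer, so the same input always produces the same verdict. Anything that fails a rule is flagged for further inspection.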
Heuristic analysis, like deterministic analysis, uses a rule-based approach to detect behaviors; however, these rules take the form of functions that are applied to inputs in a reactive manner. Heuristics are often implemented as a rule-based engine that can perform functions on data in real time. A heuristic engine goes beyond looking for known patterns by sandboxing a file and executing embedded instructions. It can also examine constructs such as processes and structures in memory, metadata, and the payload of packets.
The advantage of heuristic analysis is detection of variants of known attacks as well as potential zero-day attacks. Heuristics are often combined with other techniques, such as signature detection and reputation analysis, to increase the fidelity of results.
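As a rough illustration of rules expressed as functions, the Python sketch below scores a payload against a handful of weighted checks. The rule names, weights, and entropy cutoff are all assumptions made up for this example, not values taken from any real product, and an actual heuristic engine would also rely on sandboxed execution and memory inspection, which this sketch does not attempt.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Entropy in bits per byte; values near 8 suggest packed or encrypted content."""
    if not data:
        return 0.0
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in Counter(data).values())


# Each hypothetical rule is (name, function applied to the input, weight).
RULES = [
    ("high_entropy", lambda d: shannon_entropy(d) > 7.0, 2),
    ("shell_reference", lambda d: b"cmd.exe" in d or b"/bin/sh" in d, 3),
    ("long_nop_run", lambda d: b"\x90" * 16 in d, 3),
]


def heuristic_score(data: bytes):
    """Apply every rule and return (total weight of matching rules, matching rule names)."""
    score = 0
    hits = []
    for name, check, weight in RULES:
        if check(data):
            score += weight
            hits.append(name)
    return score, hits
```

Because the rules describe behaviors rather than exact byte signatures, a modified variant of a known sample can still accumulate a high score. The trade-off is that the weights and cutoff need tuning to keep false positives manageable, which is one reason heuristics are usually paired with other techniques.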
Statistical analysis is often used in anomaly detection with the goal of identifying some traffic parameters that vary significantly from the normal behavior or “baseline.” Statistical analysis uses formulas applied to data; results are compared to the baseline data.
There are two main classes of statistical procedures for data analysis and anomaly detection. The first class is based on applying thresholds to individual data points. There is some expected level of variation between telemetry information and the baseline; however, any deviation beyond certain thresholds is defined as anomalous. The second class involves measuring the changes in distribution by windowing the data and counting the number of events or data points to determine anomalies.
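Both classes can be sketched in a few lines of Python. In this simplified example, the first function flags individual samples that fall more than `k` standard deviations from the baseline mean, and the second groups event timestamps into fixed windows and flags any window whose count exceeds a limit. The function names, the default `k=3.0`, and the fixed `max_events` limit are assumptions chosen for illustration; production systems typically use more robust estimators and compare windows against a learned distribution.

```python
from statistics import mean, stdev


def threshold_anomalies(baseline, samples, k=3.0):
    """Class 1: flag samples more than k standard deviations from the baseline mean."""
    mu = mean(baseline)
    sigma = stdev(baseline)  # sample standard deviation; needs at least two baseline points
    return [x for x in samples if abs(x - mu) > k * sigma]


def window_counts(timestamps, window):
    """Group event timestamps (in seconds) into fixed-size windows and count events in each."""
    counts = {}
    for t in timestamps:
        bucket = int(t // window)
        counts[bucket] = counts.get(bucket, 0) + 1
    return counts


def windowed_anomalies(timestamps, window, max_events):
    """Class 2: return the indexes of windows that contain more than max_events events."""
    counts = window_counts(timestamps, window)
    return sorted(bucket for bucket, n in counts.items() if n > max_events)
```

For example, with a baseline of `[100, 102, 98, 101, 99]` the mean is 100 and the sample standard deviation is about 1.58, so with `k=3.0` a reading of 120 is flagged while a reading of 97 is not.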
Big data analytics refers to the process of collecting, correlating, organizing, and analyzing large sets of data to discover patterns and other useful information. The sheer volume of data and the different formats of the data (structured and unstructured) collected across multiple telemetry sources are what characterize “big data.”
Data that resides in a fixed field within a record or file is called structured data. This includes data contained in relational databases and spreadsheets and is often managed using SQL-like queries. Unstructured data such as webpages, PDF files, PowerPoint presentations and emails is not easily classified. Oftentimes, metadata is used to provide structure to these elements. For example, emails have the sender, recipient, date, time and other fixed fields added to the unstructured data of the email message content and any attachments.
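The email example can be shown with Python's standard `email` module, which parses the fixed header fields and leaves the message body as free text. The `email_to_record` function and the sample message are hypothetical and exist only to illustrate the split between structured metadata and unstructured content.

```python
from email import message_from_string


def email_to_record(raw_message: str) -> dict:
    """Split a raw email into structured metadata fields and an unstructured body."""
    msg = message_from_string(raw_message)
    return {
        # Structured: fixed fields that can be indexed and queried.
        "sender": msg["From"],
        "recipient": msg["To"],
        "date": msg["Date"],
        "subject": msg["Subject"],
        # Unstructured: free text (this sketch assumes a simple, non-multipart message).
        "body": msg.get_payload(),
    }


sample = (
    "From: alice@example.com\n"
    "To: bob@example.com\n"
    "Date: Mon, 01 Jan 2024 09:30:00 +0000\n"
    "Subject: Quarterly report\n"
    "\n"
    "Hi Bob, the numbers look fine.\n"
)
record = email_to_record(sample)
```

Once the headers are extracted, the record can be stored and queried like any other structured data, while the body still needs text analysis to yield anything useful.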
The large volume of data is then analyzed using specialized systems and applications for predictive analytics, data mining, forecasting, and optimization.
The amount of data that needs to be analyzed, and the variations in that data in terms of source and format, will drive the type of analytics you want to deploy. Also, it's not uncommon for organizations to use a combination of analytical methods that provide proactive, reactive, behavioral, and contextual actionable outcomes.