Packetpig Promises Free Big Data Security Analytics

Open source tool introduced at Black Hat Europe handles terabytes of packet captures to help businesses understand if and how they've been attacked.

Mathew Schwartz

March 15, 2012

When information security defenses fail, what happens next?

Ideally, chief security officers (CSOs) could simply order all network traffic from around the time of the attack to be replayed and analyzed in depth by their incident response team. Until now, however, that hasn't been possible.

"The idea of full packet capture has been likened to network TiVo, and largely discounted," said Michael Baker, CTO at Packetloop, in an interview at this week's Black Hat Europe conference in Amsterdam. That's because although capturing and storing packet data is simple, no one has had the big data tools required to effectively analyze terabytes' worth of packet captures (a.k.a. pcaps).

So Baker and two other members of a new Australian company, Packetloop, built such a tool. Baker released the open-source tool, dubbed Packetpig, on Wednesday, just minutes before delivering a related presentation, "Finding Needles In Haystacks (The Size Of Countries)," at the Amsterdam conference.

"I built the product to answer two questions: Am I overly targeted, and am I vulnerable?" said Baker. "This is what CSOs would love to know: How many attacks am I getting? How do I compare with other CSOs? If you have an attacks-per-hour ratio, that's the sort of thing you could give away, because it's an attack thing, it's not about how secure you are." Furthermore, many CSOs don't want perfect security, said Baker. They just want to be enough better than their peers that attackers will prefer an easier target.

Traditionally, Baker said, one challenge with analyzing large data sets has been that as pcaps age, tools tend to aggregate the data: a minute-by-minute view of packets gets averaged into hours, and the hours are later rolled up into days or even months. "The key point is to not lose fidelity," he said. Preserving that fidelity, however, requires full packet capture, followed by analysis of every packet in the desired time frame.
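To make the fidelity problem concrete, here is a toy illustration (not Packetpig code) of what happens when per-minute event counts are rolled up into hourly totals. The function names and the traffic numbers are hypothetical: a one-minute burst of 120 events stands out clearly at minute resolution but shrinks to an average of roughly four events per minute once the hour is summarized.

```python
from collections import Counter
from typing import Dict


def rollup(counts_per_minute: Dict[int, int], bucket_minutes: int) -> Dict[int, int]:
    """Sum per-minute event counts into coarser buckets (60 = hourly)."""
    buckets: Counter = Counter()
    for minute, count in counts_per_minute.items():
        buckets[minute // bucket_minutes] += count
    return dict(buckets)


def peak_rate_per_minute(counts: Dict[int, int], bucket_minutes: int = 1) -> float:
    """Highest average events-per-minute rate seen in any bucket."""
    return max(c / bucket_minutes for c in counts.values())


# Hypothetical hour of traffic: 2 events per minute of background noise,
# plus a one-minute burst of 120 events at minute 30.
per_minute = {m: 2 for m in range(60)}
per_minute[30] = 120

hourly = rollup(per_minute, 60)
print(peak_rate_per_minute(per_minute))   # 120.0
print(peak_rate_per_minute(hourly, 60))   # roughly 3.97: the burst is averaged away
```

Once only the hourly total survives, no later analysis can tell a short, sharp attack from slightly elevated background noise, which is why keeping the raw packets matters.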

Enter Packetpig, which is designed to analyze IP headers, protocols, conversations, and flows, as well as to handle threat analysis, geolocation, operating system fingerprinting, and file dissection across large data sets.

Packetpig is built on Pig, a platform for creating MapReduce jobs that is programmed in a language called Pig Latin. MapReduce, a concept Google outlined in a 2004 research paper, spreads problems involving large amounts of data across multiple nodes. Specifically, Packetpig is a series of data-analysis jobs that run on Hadoop, an open-source implementation of MapReduce that also handles replicating the data across those nodes. The nodes could be anything from spare servers or compute time scrounged by the information security group to Amazon's cloud, with the packet captures stored in Amazon Simple Storage Service (Amazon S3).
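For readers new to MapReduce, the sketch below simulates its three stages in plain Python on hypothetical flow records: a map step emits key/value pairs, a shuffle step groups values by key, and a reduce step combines each group (here, total bytes sent per source IP). It is an illustration of the programming model only. It is not Packetpig's code, and the record layout and function names are assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical flow record: (source_ip, destination_ip, bytes_sent)
Record = Tuple[str, str, int]


def map_phase(records: Iterable[Record]) -> Iterator[Tuple[str, int]]:
    """Emit one (source_ip, bytes_sent) pair per record."""
    for src, _dst, nbytes in records:
        yield src, nbytes


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group values by key, as the framework does between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Combine each key's values; here, total bytes per source IP."""
    return {key: sum(values) for key, values in groups.items()}


# Each chunk stands in for the slice of data one node would process.
chunks = [
    [("10.0.0.1", "192.0.2.7", 400), ("10.0.0.2", "192.0.2.7", 100)],
    [("10.0.0.1", "198.51.100.3", 250)],
]
mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
totals = reduce_phase(shuffle(mapped))
print(totals)  # {'10.0.0.1': 650, '10.0.0.2': 100}
```

In Pig Latin the same computation would be written declaratively with `GROUP ... BY` and `SUM`, and Hadoop would run the map and reduce stages on separate machines rather than in one loop.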

Packetpig offers "loaders" that extract pcap data from Argus, Bro, Flowgrep, NetworkMiner, Sguil, and Snort. "They're generally used like electron microscopes, as isolated tools [to analyze] very small packet captures," said Baker. But visualizing all pcaps over a long period of time helps an incident response team spot historical attacks that it might have missed while they were unfolding.
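Whatever a loader does downstream, its first job is to turn a capture file into individual packets. Packetpig's own loaders are not reproduced here. As a hedged illustration of that first step, the sketch below reads the classic libpcap file format: a 24-byte global header whose magic number reveals the byte order, followed by records that each carry a 16-byte header (seconds, microseconds, captured length, original length) and the packet bytes. The function name `read_pcap` is hypothetical, and newer pcapng files and nanosecond-resolution captures are deliberately not handled.

```python
import struct
from typing import Iterator, Tuple

PCAP_MAGIC = 0xA1B2C3D4  # classic libpcap magic, microsecond timestamps


def read_pcap(data: bytes) -> Iterator[Tuple[float, bytes]]:
    """Yield (timestamp, raw_packet_bytes) from a classic libpcap capture."""
    # Read the magic number little-endian; if it comes out byte-swapped,
    # the file was written big-endian.
    (magic,) = struct.unpack("<I", data[:4])
    if magic == PCAP_MAGIC:
        endian = "<"
    elif magic == 0xD4C3B2A1:
        endian = ">"
    else:
        raise ValueError("not a classic microsecond pcap file")
    offset = 24  # skip the global header
    while offset + 16 <= len(data):
        ts_sec, ts_usec, incl_len, _orig_len = struct.unpack(
            endian + "IIII", data[offset:offset + 16])
        offset += 16
        yield ts_sec + ts_usec / 1e6, data[offset:offset + incl_len]
        offset += incl_len


# Build a tiny one-packet capture in memory to exercise the reader.
header = struct.pack("<IHHiIII", PCAP_MAGIC, 2, 4, 0, 0, 65535, 1)
record = struct.pack("<IIII", 1331800000, 500000, 4, 4) + b"\xde\xad\xbe\xef"
packets = list(read_pcap(header + record))
print(packets)  # [(1331800000.5, b'\xde\xad\xbe\xef')]
```

Because the reader is a generator, it can stream through a large capture one record at a time, which is the same property a distributed loader relies on when each node works through its own slice of the data.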

"This is helpful for finding zero-day threats in old data," he said. "Remember, Snort is a signature-based system, mostly, so as I update my signatures, I can go back and find zero days in data." In addition, the bigger the data set (meaning the more pcaps analyzed over a long period of time), the easier it is to calculate background noise, which makes unusual activity easier to spot. "Let's say you find something interesting, how do you go back and dump it, saying, 'Show me everything from that box for the past three months that got sent to China, or Iran'?"

For example, after an attack against a Sydney-based company that Baker worked with, he studied a 2.5 TB set of pcaps gathered over the two-week period during which the attack occurred. The data comprised 3 billion packets and yielded 420,000 security events from 1,890 distinct attack sources (that is, unique IP addresses).
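Baker's point about background noise can be illustrated with one simple, commonly used technique: flag any hour whose event count sits several standard deviations above the mean of the series. This is a hypothetical sketch, not a description of how Packetpig scores activity, and the hourly counts are invented. With only seven hours of history, a spike of 40 events inflates the baseline enough to hide itself; with a week of history, the same spike stands out.

```python
from statistics import mean, stdev
from typing import List


def flag_anomalies(hourly_counts: List[int], threshold: float = 3.0) -> List[int]:
    """Return the indexes of hours whose event count lies more than
    `threshold` sample standard deviations above the series mean."""
    if len(hourly_counts) < 2:
        return []  # a standard deviation needs at least two points
    mu = mean(hourly_counts)
    sigma = stdev(hourly_counts)
    if sigma == 0:
        return []  # perfectly flat traffic: nothing stands out
    return [i for i, c in enumerate(hourly_counts) if (c - mu) / sigma > threshold]


# Hypothetical hourly event counts: steady noise of 10 to 12 events per hour,
# followed by one hour with 40 events.
short_window = [10, 12] * 3 + [40]   # 7 hours of history
long_window = [10, 12] * 84 + [40]   # 169 hours, about one week

print(flag_anomalies(short_window))  # [] (the spike distorts a tiny baseline)
print(flag_anomalies(long_window))   # [168]
```

A production system would more likely build the baseline from history that excludes the window under test, and would account for daily and weekly cycles, but the effect of a larger sample on the estimate is the same.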

In his Black Hat Europe presentation, Baker plotted the Packetpig results on a map of the globe using Google's WebGL Globe, with lines rising from each location to show attack severity (by color, from green to red) and frequency (by height). "When we try to visualize big data sets, it's important to let the brain explore," said Baker.
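Preparing data for that kind of view is mostly aggregation. The sketch below, which assumes the events have already been geolocated, counts events per location and flattens them into a list of latitude, longitude, and magnitude values, with magnitude (bar height) scaled relative to the busiest location. That flat layout mirrors the one used by the open source WebGL Globe examples as we understand them, so check the project's own documentation before relying on it. The green-to-red `severity_color` helper is a hypothetical stand-in for the globe's own coloring, and all names and coordinates are illustrative.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def globe_series(events: Iterable[Tuple[float, float]]) -> List[float]:
    """Flatten per-location event counts into [lat, lng, magnitude, ...],
    scaling magnitude to 0..1 relative to the busiest location."""
    counts = Counter(events)
    if not counts:
        return []
    peak = max(counts.values())
    series: List[float] = []
    for (lat, lng), n in counts.items():
        series.extend([lat, lng, n / peak])
    return series


def severity_color(severity: float) -> Tuple[int, int, int]:
    """Blend linearly from green (severity 0.0) to red (severity 1.0) as RGB."""
    s = min(max(severity, 0.0), 1.0)
    return (round(255 * s), round(255 * (1 - s)), 0)


# Hypothetical geolocated events: two near Cape Town, one near Sydney.
events = [(-33.9, 18.4), (-33.9, 18.4), (-33.9, 151.2)]
series = globe_series(events)
print(series)  # [-33.9, 18.4, 1.0, -33.9, 151.2, 0.5]
```

Normalizing to the busiest location keeps bar heights comparable from one run to the next, regardless of how many raw events a particular capture contains.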

Two hotspots were immediately obvious: South Africa and Australia. To gain more insight into the attackers, Baker decided to triangulate his data against another source. He and his team wrote a script that crawled the torrent search site The Pirate Bay, which tracks the IP addresses of all seeders and leechers associated with a specific torrent file, and captured two weeks' worth of that information for the top 100 torrents.

He made an interesting discovery: "Seventeen IP addresses matched the attacks and the torrents, which I was very surprised by. We never thought we'd get that," he said.
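At its core, that kind of cross-referencing is a set intersection between two lists of IP addresses. The sketch below is hypothetical (Baker's script was not published with the article) and uses documentation-range addresses. It normalizes each address with Python's standard `ipaddress` module before comparing, so that differently written forms of the same IPv6 address still match, and it drops malformed entries. A shared address is only a lead, not proof of a shared user, since NAT and dynamic address assignment can put different people behind the same IP.

```python
import ipaddress
from typing import Iterable, Set


def normalize(ips: Iterable[str]) -> Set[str]:
    """Canonicalize IP address strings and silently drop malformed entries."""
    result: Set[str] = set()
    for raw in ips:
        try:
            result.add(str(ipaddress.ip_address(raw.strip())))
        except ValueError:
            continue
    return result


def overlap(attack_sources: Iterable[str], torrent_peers: Iterable[str]) -> Set[str]:
    """IP addresses seen both as attack sources and as torrent peers."""
    return normalize(attack_sources) & normalize(torrent_peers)


# Hypothetical inputs using documentation address ranges.
attackers = ["203.0.113.5", "198.51.100.20 ", "2001:db8::1"]
peers = ["203.0.113.5", "2001:0db8:0000::1", "not-an-ip"]
print(sorted(overlap(attackers, peers)))  # ['2001:db8::1', '203.0.113.5']
```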

Another surprise was the attackers' choice of movies, which included 7 Weeks To 100 Pushups as well as the 2011 remake of Footloose. Those two downloads, in fact, pointed to a single user, who'd downloaded parts of each movie via two different IP addresses. Likewise, a single download of The Adventures of Tintin matched another two IP addresses.

None of the attackers, however, had anonymized their traffic, which they could have done by routing it through the Tor network, for example. "Not one single Tor address was linked to an attack against that data set, which is pretty interesting; I would have probably expected at least one," said Baker.

So what did Baker learn from the analysis? Primarily, that he didn't need to take action. He reached that conclusion quite quickly, after determining that both of the attacks he reviewed appeared to have come from PCs infected with a misbehaving browser toolbar called iWon.

"A lot of times in security analysis, you just want to get over it quickly, in the sense of I want to understand everything about that attack, and move on," he said. But if he'd uncovered evidence of a larger-scale attack, such as an advanced persistent threat, he could have initiated an incident-response program to clean up the attack and taken steps to prevent a recurrence.

Packetpig does have limits, including performance problems as the volume of data to be analyzed grows, which Baker attributed to architectural limits in Pig. Within the next couple of months, however, his company plans to introduce a cloud-based service, and later an on-premises tool, for analyzing pcaps, for example by letting users run them through cloud-based intrusion detection and prevention systems. That tool, Packetloop, which is now in beta (and not written in Pig), will offer better performance, he said, as well as better reports, more statistical analysis capabilities, and machine learning capabilities to give incident-response teams greater insight into suspicious network traffic.