Special Coverage Series

Network Computing

Can Big Data Be Poisoned by Botnets?

The Chameleon botnet is generating billions of ad impressions in a click-fraud scheme. It's a red flag for companies making business decisions based on analysis of Web data.

A small botnet working covertly from 120,000 American households is generating so much fake Web traffic that it's having a significant impact on the online display advertising industry's overall revenues for the first time, according to the U.K.-based Web analytics company that discovered it.

Web traffic analysis from Spider.io shows that the Chameleon botnet uses its base of bots to blast page requests to more than 200 advertising-supported Web sites.

Each hit raises the total volume of traffic to target sites, which make an average of 69 cents per thousand impressions served. The botnet generates more than 9 billion fake ad impressions per month, accounting for "at least" 65% of the traffic on targeted sites, at a cost to advertisers of $6.2 million per month, according to Spider.io.
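The monthly cost figure follows directly from the other two numbers reported above. The short Python sketch below reproduces the arithmetic; the function and constant names are illustrative and do not come from Spider.io.

```python
# Figures as reported in the article (attributed to Spider.io).
IMPRESSIONS_PER_MONTH = 9_000_000_000  # "more than 9 billion" fake impressions
CPM_DOLLARS = 0.69                     # average revenue per thousand impressions


def monthly_cost(impressions: int, cpm_dollars: float) -> float:
    """Return the advertiser cost in dollars for a given impression count."""
    return impressions / 1000 * cpm_dollars


cost = monthly_cost(IMPRESSIONS_PER_MONTH, CPM_DOLLARS)
print(f"Estimated monthly cost: ${cost / 1e6:.2f} million")
```

At exactly 9 billion impressions the estimate comes to about $6.21 million, consistent with the $6.2 million per month cited.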

That is vastly higher than the amount of click fraud on most sites, which averages about 0.16%, according to a March 2012 report from digital marketing consultancy comScore.

Why is all this important to people who run data centers? Because Chameleon is a red flag to any company that analyzes Web traffic to identify the interests and activities of its customers, and to make business decisions based on those analyses.

If anomalous results like these seep into Web-traffic data and, potentially, other streams of information such as the M2M feeds that make up the "Internet of things," they could undermine the integrity and credibility of any conclusions drawn from that data--especially for companies that pride themselves on the quality of their data, the sophistication of their analytics and the reliability of the projections they use to plan their future businesses.

Chameleon also represents a step up in sophistication of click-fraud schemes, both in its ability to camouflage itself and in the amount of money it generates.

For instance, a separate botnet, the Bamital network, included as many as 1.8 million PCs that delivered an average of 3 million fake clicks a day on specifically designated ads on search sites. But it was responsible for only about $1 million worth of fraudulent ad clicks, according to the U.K.'s Guardian newspaper.

By contrast, advertisers paid ghost sites $6 million per month as part of the Chameleon scheme, and the Chameleon botnet is just one-fifteenth the size of Bamital. This is a major step up in the effectiveness and sophistication of fraud based on high volumes of fake traffic.

Chameleon also takes steps to make its fake traffic look real. Display ads are placed by algorithms that ad network owners build to find the ideal audience, and those owners do try to detect, where possible, anomalies in a website's traffic that could indicate click fraud by botnets or other means.

Chameleon bots don't just send page requests; they also generate click traces that make it look as if a user is actually clicking on links. Those fake mouse clicks produce a 0.02% click-through rate, and the bots leave mouse traces on 11% of all the fake ad impressions they generate.

Chameleon isn't that smart, though. The 120,000 bots all execute JavaScript, hit the same set of websites, generate uniform click coordinates on the ads they "click," and all report their host machines are running Internet Explorer 9.0 on top of Windows 7.
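That uniformity is the kind of signal an analyst could look for in traffic logs. The Python sketch below is a deliberately crude heuristic, not Spider.io's detection method: it flags sites where a single user-agent string dominates the traffic. The `Hit` record, the 0.65 threshold, the minimum hit count, and the sample data are all assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One page request: hypothetical log record for this sketch."""
    ip: str
    user_agent: str
    site: str


def dominant_share(values):
    """Return (most common value, its share of all values)."""
    value, count = Counter(values).most_common(1)[0]
    return value, count / len(values)


def suspicious_sites(hits, threshold=0.65, min_hits=100):
    """Flag sites where one user agent makes up >= threshold of the hits."""
    by_site = {}
    for hit in hits:
        by_site.setdefault(hit.site, []).append(hit.user_agent)
    flagged = {}
    for site, agents in by_site.items():
        if len(agents) < min_hits:
            continue  # too little traffic to judge
        agent, share = dominant_share(agents)
        if share >= threshold:
            flagged[site] = (agent, share)
    return flagged


# Synthetic sample data; addresses come from reserved documentation ranges.
IE9_WIN7 = "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)"
OTHER = "ExampleBrowser/1.0"  # placeholder user agent

hits = [Hit(f"192.0.2.{i % 250}", IE9_WIN7, "ghost.example") for i in range(180)]
hits += [Hit(f"198.51.100.{i}", OTHER, "ghost.example") for i in range(20)]
hits += [
    Hit(f"203.0.113.{i}", IE9_WIN7 if i % 2 else OTHER, "normal.example")
    for i in range(200)
]

flagged = suspicious_sites(hits)
print(flagged)
```

A real check would need more than one signal, since a legitimate site can also have a dominant browser. Combining the user-agent share with the other traits listed above, such as identical click coordinates, would reduce false positives.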

Spider.io posted a list of 5,000 IP addresses that can be pasted into blacklists to block the worst of the Chameleon bots.
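For analytics pipelines, a list like that can also be used to drop known-bad traffic before it reaches a report. The Python sketch below assumes the list is plain text with one address per line, which may not match the format Spider.io actually published; the function names and sample addresses are hypothetical.

```python
import ipaddress


def load_blocklist(lines):
    """Parse one IP address per line, skipping blank lines and '#' comments."""
    blocked = set()
    for line in lines:
        entry = line.strip()
        if not entry or entry.startswith("#"):
            continue
        blocked.add(ipaddress.ip_address(entry))
    return blocked


def is_blocked(ip, blocked):
    """Return True if the given address string is on the blocklist."""
    return ipaddress.ip_address(ip) in blocked


# Sample entries from reserved documentation ranges, not real bot addresses.
sample = ["# sample blocklist", "192.0.2.10", "", " 198.51.100.7 "]
blocked = load_blocklist(sample)

requests = ["192.0.2.10", "192.0.2.11", "198.51.100.7"]
clean = [ip for ip in requests if not is_blocked(ip, blocked)]
print(clean)
```

Parsing entries with the standard `ipaddress` module rather than comparing raw strings means a malformed line raises an error instead of silently never matching.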

The 202 sites that benefited from Chameleon visits are mostly so-called "ghost sites," whose URLs look like ordinary consumer sites but that contain minimal content and are owned by an ad network called AlphaBird, which may or may not have known about the volume of fraud, according to PaidContent. PaidContent is a media blog owned by GigaOM.

Botnets have plagued data center operators for years as a source of spam and DDoS attacks. This latest twist has implications for data analysis. Garbage in, garbage out--remember? The first thing programmers have to learn is how to keep the garbage out of their data and their results. Chameleon shows garbage is getting a lot sneakier about getting into data that looks perfectly good.


