Cloud Infrastructure


How One Amazon Cloud Customer Gains Operations Visibility

Wattpad uses a combination of three monitoring systems to get the insight it needs to keep its busy, AWS-powered website humming.

Wattpad is a website that brings together authors and readers. First-time authors, or for that matter, many-time authors, can post their stories to the site and gather critical feedback that may help in the revision process. Readers can follow favorite authors and access more than 10 million stories at no charge using a web browser or mobile device.

Wattpad has been endorsed by noted Canadian author Margaret Atwood and has grown to 10 million visitors a month. Each second it fields 6,000 requests for content or requests to post comments. That means there can be few kinks in the way the site operates. Wattpad's owners want visitors to find what they're looking for -- many have favorite authors -- and get their requested stories to their devices quickly.

Wattpad has run on Amazon Web Services infrastructure from its inception, but its operators have found it hard to gain as much visibility as they'd like into their website applications. Amazon's CloudWatch provides basic feedback, which can be amped up by adding for-a-fee reporting metrics. Wattpad has used CloudWatch but has added three independent monitoring systems to it: New Relic, Datadog, and Boundary.


In the second half of 2013, Charles Chan, head of engineering at Wattpad, wanted to know what was wrong when the site's search engine, Elasticsearch, showed signs of slowing down. Wattpad visitors may be absent for a few days or weeks, and when they return they want a quick compilation of any new postings by their favorite authors. Elasticsearch is a young search engine based on Lucene open-source code, designed to gather frequently updated documents. Its first stable release, 1.0, came out on February 12, 2014, so Chan carefully monitored the earlier versions used through 2013.

In the latter half of 2013, Wattpad's monitoring system showed many user requests flowing into Elasticsearch, but significantly less information than expected coming out. The search engine slowdown would have been hard to detect with AWS's CloudWatch monitoring service, which lacks a metric that reflects Elasticsearch's operation. But independent monitoring service Boundary, which detects traffic levels between the nodes of a system running on Amazon, helped Chan identify the problem.

(Image: Boundary)

Boundary's dashboard alerted Chan that Elasticsearch output was slowing down: it showed a disproportionately small amount of data coming out of the Elasticsearch node compared with the traffic going in. Chan corrected the problem by reconfiguring Elasticsearch to better fit Wattpad's usage patterns.
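The kind of check described here, comparing what flows out of a node with what flows in, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Boundary's actual method: the function name `find_ratio_anomalies`, the `baseline_window` and `tolerance` parameters, and the sample byte counts are all hypothetical.

```python
from statistics import median

def find_ratio_anomalies(samples, baseline_window=4, tolerance=0.5):
    """Return indices of samples whose out/in byte ratio strays from the baseline.

    samples: list of (bytes_in, bytes_out) pairs, one per interval, for one node.
    The baseline is the median ratio of the first `baseline_window` samples
    (an assumption made for this sketch, not a documented Boundary behavior).
    """
    ratios = [bytes_out / bytes_in if bytes_in else 0.0
              for bytes_in, bytes_out in samples]
    baseline = median(ratios[:baseline_window])
    anomalies = []
    for i, ratio in enumerate(ratios[baseline_window:], start=baseline_window):
        # Flag intervals where the ratio moves more than `tolerance` away
        # from the baseline, in either direction.
        if abs(ratio - baseline) > tolerance * baseline:
            anomalies.append(i)
    return anomalies

# Hypothetical traffic: requests keep rising while output stays flat.
samples = [(100, 400), (120, 480), (110, 440), (130, 520),
           (125, 500), (400, 420), (450, 430)]
print(find_ratio_anomalies(samples))
```

With these made-up numbers, the last two intervals are flagged because incoming traffic roughly quadruples while outgoing traffic does not follow.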

Chan said Wattpad has used Amazon's EC2 from its start, but it added Boundary six months ago to its New Relic application performance monitoring and its Datadog system, which compiles monitoring data into a unified display. He doesn't rely on Amazon's CloudWatch much anymore. "It doesn't give the same level of insights" as Boundary and New Relic, he said.

New Relic APM is useful in spotting application slowdowns, such as a hung application waiting for results from a satellite database system, and other potential trouble points. Boundary brings something else to the party: an ability to see what network traffic is feeding into those applications and the traffic coming out. Unlike traditional systems management, which tells you whether your servers and network switches are operating normally, Boundary watches the network segments between the nodes.

"Not all issues will be manifested at the application level," said Chan in an interview. He uses its overview of network bandwidth use and network traffic to spot potential trouble points. Wattpad conducted testing in the lead-up to its busy holiday season by firing off artificial demand against its website and observing, through Boundary, where traffic flowed smoothly and where it began to back up.

One trouble spot was the open-source Memcached caching system, which supplies frequently used data to servers. When fielding a request from an application, Memcached was returning more data than the application could use, chewing up network bandwidth. "Unnecessary data was being sent" because the cache had been configured to return far more on the response side than the incoming requests called for, said Chan. His staff was able to correct the problem before it led to any peak-traffic slowdowns.
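The article does not say how the cache was misconfigured, but the general idea of a cache returning more than the application uses can be put into numbers. The following Python sketch is purely illustrative: the `overfetch_ratio` helper, the JSON encoding, and the `story` record are assumptions, not details of Wattpad's setup.

```python
import json

def overfetch_ratio(cached_value, fields_used):
    """Compare the size of a full cached record with the part the app needs.

    cached_value: dict as it would be returned from the cache (hypothetical).
    fields_used: keys the calling application actually reads.
    Returns full_size / needed_size; 1.0 means nothing is wasted.
    """
    full_size = len(json.dumps(cached_value).encode("utf-8"))
    needed = {key: cached_value[key] for key in fields_used}
    needed_size = len(json.dumps(needed).encode("utf-8"))
    return full_size / needed_size

# Hypothetical record: the caller only wants the id and title,
# but the cache hands back the whole story body as well.
story = {"id": 1, "title": "A", "body": "x" * 1000}
print(round(overfetch_ratio(story, ["id", "title"]), 1))
```

A ratio well above 1.0 for a hot key suggests the response side is carrying data the caller discards, which is the kind of imbalance a traffic monitor would surface as bandwidth use.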

Chan said Boundary was "easy to set up in a matter of hours." It places its own sensing agent on hardware devices, which automates the reporting of traffic to the central Boundary system. Any new agent reporting in prompts Boundary to add another device to its network topology map. The lines on the map illustrate which node is talking to which.
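A topology map of this kind can be thought of as a graph that grows as agents report in. The Python sketch below is a minimal model of that idea, not Boundary's implementation: the `TopologyMap` class, its `report` and `talks_to` methods, and the node names are all hypothetical.

```python
from collections import defaultdict

class TopologyMap:
    """Toy network map built from per-agent flow reports (illustrative only)."""

    def __init__(self):
        self.nodes = set()
        self.edges = defaultdict(int)  # (source, destination) -> bytes seen

    def report(self, agent, flows):
        """Record one agent report.

        flows: iterable of (peer, bytes_sent) pairs observed by the agent.
        A previously unseen agent or peer is added to the map automatically.
        """
        self.nodes.add(agent)
        for peer, bytes_sent in flows:
            self.nodes.add(peer)
            self.edges[(agent, peer)] += bytes_sent

    def talks_to(self, node):
        """Return the sorted list of peers that `node` sends traffic to."""
        return sorted(dst for (src, dst) in self.edges if src == node)

topo = TopologyMap()
topo.report("web-1", [("cache-1", 500), ("search-1", 200)])
topo.report("search-1", [("web-1", 900)])
print(topo.talks_to("web-1"))
```

Each edge corresponds to a line on the map, and the byte counts attached to the edges are what make an imbalance between two nodes visible.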

Chan has no complaints about Amazon as a cloud service provider and said CloudWatch served Wattpad's purpose initially. But now that Wattpad has reached 10 million visitors a month, he needs a more complete view of what's going on in his cloud infrastructure.

Asked where he'd be without his monitoring combination of New Relic, Datadog, and Boundary, he said, "We'd have to do some network sniffing on our own" to try to determine network traffic. "We wouldn't have much visibility without the network traffic element. With those three, they'll be able to carry us quite far."


Charles Babcock is an editor-at-large for InformationWeek, having joined the publication in 2003. He is the former editor-in-chief of Digital News, former software editor of Computerworld and former technology editor of Interactive Week. He is a graduate of Syracuse ...

Comments
Gaurav "GP"P799,
User Rank: Apprentice
3/16/2014 | 5:29:46 PM
Open technologies for Application Performance Management
Charles - thanks for your post on application performance management. This is a really important topic for users looking to ensure a reliable, secure and cost-effective service. The ability to collect data in real time and rapidly process and act on it is critical for detecting errors, slowdowns and performance bottlenecks. In addition to the COTS solutions you have described, there is a rich and vibrant open source-enabled ecosystem that includes tools like Apache Flume, Logstash, Kibana and Elasticsearch, among others. Some of the links below might be useful:

http://openopsiq.com/2014/02/09/application-performance-management-apm/

http://openopsiq.com/2014/02/09/centralized-and-structured-log-file-analysis-with-open-source-and-free-software-tools/

http://openopsiq.com/2014/01/20/real-time-event-capture-and-analytics-for-ops-insight-at-netflix/
cobiacomm01,
User Rank: Apprentice
2/24/2014 | 2:24:20 PM
Re: For a price, we're gaining visibility
Excellent post, Charlie. Visibility and manual optimization are the first piece of the puzzle.
Michaelrj,
User Rank: Apprentice
2/21/2014 | 11:53:03 AM
Elasticsearch
Wouldn't ES's new monitoring product Marvel, which only costs $500, have done the job as well?
Charlie Babcock,
User Rank: Apprentice
2/20/2014 | 6:04:49 PM
You say you do everything?
I don't know that Compuware gives you the insight into your enterprise Java application that an on-premises CA Wily APM does. There's Java app-knowledgeable diagnostics involved. I'm not aware of one product that can do everything.
ANON1242674878657,
User Rank: Apprentice
2/20/2014 | 4:29:55 PM
How One Amazon Cloud Customer Gains Operations Visibility
Wow, it takes three separate products to give them visibility? Wouldn't it be great to have one APM solution that gives you visibility in one view and more? Compuware APM has that.
Charlie Babcock,
User Rank: Apprentice
2/20/2014 | 4:16:21 PM
For a price, we're gaining visibility
New Relic, AppDynamics, ManageEngine and Riverbed OpNet are services that can shed light on application performance. Other external monitoring services, such as Compuware APM, are needed to illustrate how fast your Web applications are responding to users around the world. For a price, we're gaining visibility. A typical Boundary customer pays $60,000 a year for its network traffic monitoring,