W O R K S H O P

Finding the Best Approach to App Monitoring

July 12, 1999
By Bruce Boardman

The holy grail of network and systems management is the ability to predict failures before they hit. Although vendors have a spotty record delivering on promises to foretell system failure, they have made some progress.

Users typically receive notice of network failure via their applications. Monitoring applications thus makes sense; what better place to look for problems than where they rear their ugly heads? Although this isn't a new concept--response-time monitors for mainframes are well-established--the number of application-monitoring products has soared in the past year or so, in part due to the acceptance of Windows NT and the reality of IP as a protocol of choice.

Application-monitoring products create a baseline that represents normal application behavior, polling or watching key elements, such as an ERP (Enterprise Resource Planning) application or Web and database servers. Some products also enable service-level settings and compare a range of desktop, network and server performance metrics in real time or according to a set schedule.
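As a rough illustration of what "creating a baseline" can mean, the sketch below summarizes a set of response times and flags later measurements that stray too far from them. It does not describe how any product named here works; the function names, the sample values and the three-standard-deviation tolerance are all assumptions chosen for the example.

```python
from statistics import mean, stdev

def build_baseline(samples):
    """Summarize normal behavior as (mean, standard deviation) of response times."""
    return mean(samples), stdev(samples)

def exceeds_baseline(value, baseline, tolerance=3.0):
    """True when a measurement is more than `tolerance` standard
    deviations above the baseline mean (tolerance is an assumed default)."""
    avg, spread = baseline
    return value > avg + tolerance * spread

# Hypothetical response times, in seconds, gathered during normal operation.
normal = [0.42, 0.40, 0.45, 0.43, 0.41, 0.44]
baseline = build_baseline(normal)

print(exceeds_baseline(0.43, baseline))  # a typical transaction
print(exceeds_baseline(1.20, baseline))  # a much slower transaction
```

A service-level setting could be expressed the same way, with a fixed threshold in place of the statistical one.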

There are three basic monitoring locations: the desktop, the network and the application server. Monitoring has traditionally been done at the network and the server using SNMP and agents. Lightweight agents that sit on users' machines or distributed servers are the newest addition. Monitoring with these distributed agents is classified as active (the agent creates synthetic transactions that are sent through the network) or passive (the agent watches what is happening at the desktop).

There are differing opinions about where and how to monitor for best effect. But by and large these approaches are far from perfect. We'll flesh out the details of the passive and active approaches and discuss the pros and cons of each method here.

The first--and perhaps most mature--approach to application monitoring is server-based. Metrics are gathered from a server, and the client experience is then projected based on server performance. This approach fails to take the network or client desktop environment into account. But it has the benefit of centralizing measurement at the application server, a key point in the transaction that is highly susceptible to performance variations.

Monitoring applications at the server offers some of the most detailed, application-specific information for diagnosis, tuning and prediction. Envive Corp., Oracle Corp. and PeopleSoft applications leverage transaction specifics, as well as the underlying database and back-end processing, enabling detailed tuning.

A little less difficult--and less granular in terms of application monitoring--are the more traditional OS performance products: BMC Software's Patrol, Computer Associates International's Performance Monitor and Hewlett-Packard Co.'s PerfView. While not focused specifically on the application, all three register OS metrics about memory usage, cache hits, and disk and network I/O, among other measurements. Their role in application monitoring takes off from a valid assumption: If the operating environment is functioning well, then so are the applications. One benefit is that these products are easier to deploy and understand than the specialized server approaches.

Both the server- and OS-based methods hold the network and client constant in order to project how well the application is performing; this projection often is not accurate. Monitoring at the server is also complex; a number of intervening servers affect a transaction, as does the network infrastructure.

Probing for Performance
Monitoring transactions at the network is primarily accomplished by probes that examine traffic on the wire and predict application performance based on their analysis. Besides requiring no deployment on servers or clients, this passive approach has the added value of correlating network and application performance.

To cover the entire network, a probe is required on every segment--a headache for switched networks with numerous segments. But you can bypass this limitation somewhat by strategically placing probes on the network backbone, server farm segments and other critical, aggregating segments. This doesn't extend monitoring to the client, but provides a central monitoring point for all important transactions.

Capturing and decoding tons of packets--into possibly thousands of sessions--is very involved. In our experience, heavily loaded probes keep this from being a viable approach; parsing and reporting on an overwhelming amount of data quickly buries any hope of receiving a timely report.

We placed an Apptitude probe on a 100-Mbps segment feeding a T3 Internet connection. It kept up with the network traffic and recorded an average of 200,000+ sessions an hour. But when it came to processing those sessions, the centralized analysis application crunched only about 140,000 sessions per hour--a deficit that grew with time. Adding hardware on the server platform wasn't the answer; the analysis application wasn't taxing the hardware or OS.
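The arithmetic behind that growing deficit is simple enough to sketch. The two rates below come from the test described above (200,000 is a floor, since the probe recorded "200,000+" sessions an hour); the function name is made up for this example.

```python
def backlog_after(hours, captured_per_hour=200_000, analyzed_per_hour=140_000):
    """Sessions still waiting for analysis after `hours` of steady traffic."""
    deficit_per_hour = max(captured_per_hour - analyzed_per_hour, 0)
    return deficit_per_hour * hours

# Backlog after one hour, one business day and one full day of capture.
for hours in (1, 8, 24):
    print(hours, backlog_after(hours))
```

At these rates the analysis application falls at least 60,000 sessions further behind every hour, so a report covering a full day of traffic cannot be finished within that day.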

This wasn't an isolated incident; we tested three probe applications and all had difficulty crunching the amount of data created by a busy segment or network. The solution is to filter the captured and analyzed data, which will reduce the computational load; just be sure you carefully select what to watch and what to toss.
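One simple form of filtering is to keep only the sessions that touch the applications you care about. The sketch below assumes sessions have already been decoded into records with a server port; the record layout, the addresses and the choice of watched ports (80 for Web traffic, 1521 for an Oracle listener at its default port) are illustrative assumptions, not the behavior of any probe we tested.

```python
from typing import NamedTuple

class Session(NamedTuple):
    """Hypothetical decoded session record."""
    client: str
    server: str
    server_port: int

# Assumed watch list: HTTP and the default Oracle listener port.
WATCHED_PORTS = {80, 1521}

def filter_sessions(sessions, watched_ports=WATCHED_PORTS):
    """Keep only sessions whose server port is on the watch list."""
    return [s for s in sessions if s.server_port in watched_ports]

captured = [
    Session("192.0.2.10", "192.0.2.200", 80),
    Session("192.0.2.11", "192.0.2.201", 1521),
    Session("192.0.2.12", "192.0.2.202", 25),   # mail traffic, tossed
]
kept = filter_sessions(captured)
print(len(captured), len(kept))
```

Whatever the filter tosses is invisible to the analysis, which is why the watch list deserves careful thought.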

If you're not absolutely certain which method to choose, consider deploying lightweight agents on your remote desktops. From the client's point of view, application performance is acceptable only if all the intervening infrastructure, network, servers and desktop applications are working correctly. These lightweight agents measure the end result of a transaction and/or an application's availability, and then report the results to a centralized server. The results are then combined and analyzed for baselines and trends.

Supporters of the passive approach accurately point out that the results are from the client's perspective; the mix of transactions, time of day, user think time and complexity of measured results are real--not IT department predictions or guesses. Another benefit is that passive technology can look at any transaction type, regardless of complexity, whereas synthetic transactions must know the specifics of each transaction they will be monitoring.

One problem with this approach is that the client is the only constant. The number of packets per transaction will vary, so you will not have an exact measurement for network performance. But this doesn't mean you can't measure network latency. Vital Signs' VitalSuite times the SYN packets at session initialization and then uses that differential as a constant assumption for network time in overall transaction time.
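The idea of using connection setup as a stand-in for network time can be approximated from ordinary application code. The sketch below is not VitalSuite's implementation: it times a TCP connect call (which completes once the handshake started by the SYN packet finishes) and then treats that figure as a constant per round trip when splitting up a transaction time. The function names and the `round_trips` parameter are assumptions for illustration.

```python
import socket
import time

def tcp_connect_time(host, port, timeout=5.0):
    """Seconds taken to establish a TCP connection to host:port.
    Pass an IP address so that name resolution is not included."""
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        elapsed = time.perf_counter() - start
    return elapsed

def split_transaction_time(total_seconds, connect_seconds, round_trips=1):
    """Split a transaction time into (network, everything else), assuming
    each round trip costs the same as the measured connection setup."""
    network = min(connect_seconds * round_trips, total_seconds)
    return network, total_seconds - network

# Hypothetical: a 1.0-second transaction, 0.1-second setup, three round trips.
print(split_transaction_time(1.0, 0.1, round_trips=3))
```

Because the number of packets per transaction varies, the split is an estimate built on a constant assumption, not an exact measurement.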

On the flip side are products that use synthetic transactions, which are either canned or user-modifiable running from a distributed agent on a predetermined schedule. The transaction can be as generic as pings, DNS lookups or SMTP checks, or as specific as SQL statements or user-definable transactions. The transaction is consistent, so deviation from the baseline warns that performance is degrading. Unlike the passive method, synthetic transaction-based applications can be checked without clients being on the network. This is a significant advantage--especially when the alternative is waiting until Monday morning.
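In outline, a synthetic transaction is a repeatable operation that is timed and compared with its baseline. The sketch below assumes the transaction is any Python callable and that "degraded" means failing outright or running more than 1.5 times the baseline; both the names and the tolerance are assumptions, not a description of any product discussed here.

```python
import time

def run_synthetic(transaction, baseline_seconds, tolerance=1.5):
    """Run one synthetic transaction and compare its duration to a baseline.
    `transaction` is any callable; an exception counts as a failure."""
    start = time.perf_counter()
    try:
        transaction()
        ok = True
    except Exception:
        ok = False
    elapsed = time.perf_counter() - start
    degraded = (not ok) or elapsed > baseline_seconds * tolerance
    return {"ok": ok, "elapsed": elapsed, "degraded": degraded}

# A stand-in transaction; a real one might be a DNS lookup or an SQL statement.
result = run_synthetic(lambda: time.sleep(0.05), baseline_seconds=0.01)
print(result["ok"], result["degraded"])
```

A generic check could pass something like `lambda: socket.gethostbyname("www.example.com")` as the transaction, and a scheduler such as cron could invoke it at fixed intervals, including nights and weekends when no users are on the network.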

Another variation of the synthetic transaction approach involves controlling both ends of the exchange: the client transaction and the application server. Ganymede Software's Pegasus follows this approach; the trade-off is a less realistic transaction that is designed to represent one of the generic transaction types. The advantage is that network performance can be singled out--or ruled out--as a failing component.

Unfortunately, with the synthetic approach, active transactions place additional load on the available bandwidth. This is rarely an issue on LANs, but it can be cause for concern on WAN links. If only a single device at each remote office is sending synthetic transactions to a central site over a WAN link, the added load is minor.
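The load that synthetic transactions add to a link can be estimated with back-of-the-envelope arithmetic. Every number in the sketch below is hypothetical (a 10,000-byte transaction sent once a minute over a 56-Kbps link), and the function name is made up for the example.

```python
def link_utilization(agents, bytes_per_transaction, interval_seconds, link_bps):
    """Fraction of link capacity consumed by synthetic transactions,
    assuming every agent sends one transaction per interval."""
    bits_per_second = agents * bytes_per_transaction * 8 / interval_seconds
    return bits_per_second / link_bps

# One agent per remote office versus an agent on each of 50 desktops.
single = link_utilization(1, 10_000, 60, 56_000)
many = link_utilization(50, 10_000, 60, 56_000)
print(f"{single:.1%} {many:.1%}")
```

Under these assumptions a single agent uses a few percent of the link, while 50 agents running the same schedule would ask for more than the link can carry.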

The best approach depends on the mix of applications to be monitored and what this monitoring needs to accomplish. If helpdesk client support is important and transactions are highly complex, a passive client approach is optimal. If network performance is important, then a synthetic transaction between two agents is going to be the most predictable.

Network Diagnosis
The ability to correlate the client transactional experience with network and server performance metrics is critical. FirstSense Software claims to be able to associate SNMP-gathered network infrastructure and RMON statistics with synthetic transactional baselines.

This type of correlation often takes a somewhat shallow form, such as a consistent date stamp across batched reports or a single console that displays network and transactional performance metrics. A better approach is to provide baseline network and application transaction reports that are associated with exception reports. Of course, this presupposes that the application monitoring function understands the network topology well enough to recognize that a failing piece of infrastructure lies within the transactional path. Proactive Networks' Pronto Watch claims this high-level functionality, which we will put to the test; look for the results of our application monitoring tests in our August 9 issue.

Send your comments on this article to Bruce Boardman at bboardman@nwc.com.


