Network Management

The End All of Network Performance Management

We tested six end-to-end management products and evaluated their root-cause analysis, data collection and reporting features.

December 9, 2002


The distinction between these performance products and fault- or exception-based network management products (aka managers of managers, reviewed in "Hot MoMs") is blurring. Fault-based products combine performance data with the events and alerts that make up their fault data. They gather that data from many sources, whereas performance products gather events by polling a device, recognizing that the device is over threshold and then collecting the threshold violation event. Most performance products won't collect events or alarms from external sources.

Furthermore, fault products generally don't offer the perspective provided by high-level health views on application-, systems- and network-performance products, mainly because of limited performance data analysis. Performance products filter, abstract and condense underlying performance statistics, creating a synapse between business and IT. This is the point at which the collected performance data becomes useful information. The best products go beyond putting a number on overall health; they attempt to determine the root cause of a problem.

Monitoring the Managers

We tested Argent Software's The Argent Guardian 6.0a, Compuware Corp.'s Vantage 8, Concord Communications' eHealth Suite 5.0.2, NetScout Systems' nGenius Performance Manager 1.4, NetQoS' SuperAgent 3.0 and ProactiveNet 4.1.2 from ProactiveNet. We rated these packages' reporting, root-cause analysis, data collection, implementation, administration, pricing and "gotchas."

All the products include good canned reports, flexible report writers and browser-delivered formats. But each product partitions reports and delegates control over content and format differently. One key capability that separated the winners from the losers in our tests was root-cause analysis. Our Editor's Choice, ProactiveNet, used this functionality to get to the heart of our IT problems.

As for implementation and administration, none of the products will instantly tell you what's going on with your network. We had to spend a lot of time determining what to collect and how to group data.

To get pricing, we gave each vendor a Visio diagram of a network, complete with details about consoles, servers, routers, switches and interfaces, and asked for an estimate.

Finally, though no software is bug-free, we couldn't ignore even occasional screw-ups. Simply put, we dinged the products that dinged us.

ProactiveNet is a solid performer, is easy to set up and administer and has the best root-cause analysis features. Concord's eHealth Suite is a close second, with superior report delivery. NetScout also has a fantastic report interface but did not provide any of its agent technology for our tests. Instead, NetScout relied on its very strong probe data collection. However, this product has the steepest price--35 percent higher than the next highest competitor. Compuware's Vantage 8.0 has the widest data collection and analysis, but its many modules are loosely integrated, increasing administrative effort and error. The Argent Guardian focuses on systems management, but includes SNMP monitoring of network devices at no additional cost. Unfortunately, the SNMP reports require hefty setup work to be useful. NetQoS's SuperAgent appliance is easy to implement. It did well in our tests but did not give us as deep a look at the client side of transactions, and it supports only TCP.

Hot dogs are tasty, as long as you don't know what's in them. But when a product like ProactiveNet starts hot dogging--saying it can determine the root cause of a problem, yammering about secret ingredients like "intelligent thresholds" and "smart filters," and promising you don't have to set up a bunch of performance thresholds--you have to see it with your own eyes. We looked, and we'll still bite.

To gather data, ProactiveNet uses Web transactions and a polling engine the vendor calls an agent, though it's actually a server-side process. The polling engine collects, analyzes and reports on a wide array of performance metrics over time.

To some degree, all the products do this. But ProactiveNet automatically creates performance thresholds and correlates devices into groups. As a device is added to ProactiveNet, it's categorized as an application server, Web server, mail server, router or switch, and then monitored based on further definitions that could, for example, include MIB II, Host MIB, Enterprise MIB or Oracle SQL query data. The list of monitors is extensive. These thresholds determine what is normal or average over time and can be changed, though ProactiveNet recommends waiting until you've collected data for a few months and know how the thresholds are affected. We left them as is. The software also filters out unrelated data. Using the groupings and time-based correlation, ProactiveNet lists root-cause suggestions in order of decreasing likelihood.
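To make the idea of a threshold that tracks "normal" concrete, here is a minimal sketch, assuming a simple mean-plus-standard-deviation baseline. We did not see how ProactiveNet computes its intelligent thresholds, so this is not the vendor's method; the function names, the multiplier `k` and the sample data are all hypothetical.

```python
from statistics import mean, stdev

def baseline_threshold(history, k=3.0):
    """Return an upper threshold of mean + k standard deviations.

    `history` is a list of past samples for one metric; `k` is a
    hypothetical sensitivity setting, not a ProactiveNet parameter.
    """
    if len(history) < 2:
        raise ValueError("need at least two samples to form a baseline")
    return mean(history) + k * stdev(history)

def is_abnormal(history, sample, k=3.0):
    """True when `sample` exceeds the baseline learned from `history`."""
    return sample > baseline_threshold(history, k)

# Illustrative CPU-utilization samples (percent), invented for this sketch.
cpu_history = [20, 22, 19, 21, 23, 20, 22, 21]
print(is_abnormal(cpu_history, 24))   # a small bump stays under the baseline
print(is_abnormal(cpu_history, 60))   # a large spike crosses it
```

The longer the history, the more stable the baseline, which fits the vendor's advice to collect a few months of data before adjusting anything.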

ProactiveNet's second, separate set of static thresholds generates user-defined alarms. All the products we tested can set these types of alarms. Crossing a static threshold's value creates an event (such as "Threshold has been violated") and an alarm. We easily changed this product's canned thresholds to fit our testing environment.
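A static threshold check of this kind can be sketched in a few lines. This is a generic illustration rather than any vendor's implementation; the `Alarm` record, the metric names and the limits are assumptions made up for the example.

```python
from dataclasses import dataclass

@dataclass
class Alarm:
    device: str
    metric: str
    value: float
    limit: float
    message: str = "Threshold has been violated"

def check_static_thresholds(device, samples, limits):
    """Compare each sampled metric with its fixed limit.

    `samples` and `limits` map metric names to numbers. A metric with
    no configured limit is skipped. Returns one Alarm per violation.
    """
    alarms = []
    for metric, value in samples.items():
        limit = limits.get(metric)
        if limit is not None and value > limit:
            alarms.append(Alarm(device, metric, value, limit))
    return alarms

# Hypothetical limits and one polling cycle's worth of samples.
limits = {"cpu_pct": 90.0, "mem_pct": 85.0}
samples = {"cpu_pct": 97.0, "mem_pct": 60.0, "disk_pct": 99.0}
alarms = check_static_thresholds("web01", samples, limits)
for alarm in alarms:
    print(alarm.device, alarm.metric, alarm.message)
```

Unlike a learned baseline, the limit here never moves, so someone has to pick a sensible value for every metric up front.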

We collected Web transaction data and ran it across a simulated WAN connection while we periodically overloaded a Web server. ProactiveNet accurately pointed to the slowdown on the server as the most likely problem, and then pointed to the increased utilization on a shared switch prior to the server. Without an agent on the server, ProactiveNet incorrectly identified the switch link's utilization as the problem. If you can't implement the server-side agents, be skeptical of the root-cause diagnosis.
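The combination of grouping and time-based correlation can be illustrated with a toy ranking. This is only a sketch under our own assumptions (candidates must share the symptom's group, must start within a fixed window before it, and are ordered by how close in time they began); it is not ProactiveNet's algorithm, and the device names and timestamps are invented.

```python
def rank_root_causes(symptom, anomalies, window=300):
    """Rank anomalies as candidate causes of `symptom`.

    Each record is a dict with 'device', 'group' and 'start' (seconds).
    A candidate must share the symptom's group and begin no more than
    `window` seconds before the symptom.
    """
    candidates = []
    for a in anomalies:
        lag = symptom["start"] - a["start"]
        if a["group"] == symptom["group"] and 0 <= lag <= window:
            candidates.append((lag, a["device"]))
    # Smallest lag first: the anomaly closest in time is listed as most likely.
    return [device for lag, device in sorted(candidates)]

# Hypothetical events: a slow client transaction and four device anomalies.
symptom = {"device": "client", "group": "web-app", "start": 1000}
anomalies = [
    {"device": "switch01", "group": "web-app", "start": 940},
    {"device": "web01", "group": "web-app", "start": 990},
    {"device": "mail01", "group": "mail", "start": 995},     # unrelated group
    {"device": "router01", "group": "web-app", "start": 500}, # too long ago
]
ranked = rank_root_causes(symptom, anomalies)
print(ranked)
```

The sketch also shows why a missing server agent matters: remove the `web01` record and the switch becomes the top suggestion by default.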

At the top level, ProactiveNet reports on alarms and performance health. The alarm display clearly indicates problems without overwhelming users with excess data. The initial page shows a matrix of devices and monitored services. A green dot indicates the device is fine with the service listed; if something is amiss, a yellow or red icon appears and links to specifics, such as graphs and checks.

End-to-End Management Pricing

ProactiveNet lacks flexible client-side scripting, offering just two flavors. Both support Web transactions and network service checking of UDP and TCP ports. The first version, a standalone agent, runs the port checks and transactions and reports to the ProactiveNet server. The second acts as a proxy, gathering statistics like a midlevel distributed manager, focusing the data collection and reducing the upstream reporting to the ProactiveNet server. Ideal for gathering statistics on the other side of a constricted WAN circuit, this proxy function decreases the ProactiveNet server's traffic and reduces the bandwidth needed to support several distributed agents.

ProactiveNet's administrative console lets you create simple Web page downloads or in-depth, multipage transactions. Client 32 and other non-Web transactions are not supported. The process records keystrokes and screens to a script. Once the agents are installed on remote machines, you can add, delete and manage the transactions from a single interface. ProactiveNet's process is much simpler than the hoops Compuware Vantage's agents make you jump through.

ProactiveNet charges for training, professional services and 24/7 support; however, the product is easy enough to learn without training, and we found the phone support during business hours more than adequate.

ProactiveNet 4.1.2, starts at $12,500, ProactiveNet, (877) 277-6686, (408) 935-6800. www.proactivenet.com

Long the gold standard, eHealth Suite is still one of the best performance management solutions on the market. The number of data sources from which eHealth Suite can draw is huge: network devices, servers, applications, transactions, probes and circuits. The biggest difference between eHealth Suite and ProactiveNet is not the price--though eHealth Suite is more expensive in our pricing scenario--but the extensive implementation and administration eHealth Suite entails. For example, eHealth's root-cause tools require thresholds be set manually, based on the user's data transmissions. Also, we had to apply our network-topology knowledge to determine what devices in the network path could be causing a network slowdown.

Concord pioneered assessing network-performance specifics, and eHealth's reporting is unmatched. The product's Health Reports show more than a happy or sad face for each monitored element. Each report includes various health indices that combine characteristics for each device type. A router health index, for example, includes buffer misses, buffer utilization, CPU and faults. The LAN interface index includes errors, discards and collisions. The higher an index's value, the closer it is to a threshold.
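One way to picture an index that rises as components approach their thresholds is to average each component's value-to-threshold ratio. The sketch below does exactly that; the equal weighting, the threshold values and the readings are our assumptions for illustration, not Concord's published formula.

```python
def health_index(readings, thresholds):
    """Average of value/threshold ratios, expressed as a percentage.

    0 means every component is idle; 100 means every component sits
    exactly at its threshold. Equal weighting is an assumption.
    """
    ratios = [readings[name] / thresholds[name] for name in thresholds]
    return 100.0 * sum(ratios) / len(ratios)

# Hypothetical router components, following the list in the text above.
router_thresholds = {"buffer_misses": 50, "buffer_util_pct": 80,
                     "cpu_pct": 90, "faults": 10}
router_readings = {"buffer_misses": 5, "buffer_util_pct": 40,
                   "cpu_pct": 45, "faults": 1}
print(round(health_index(router_readings, router_thresholds), 1))
```

A single number like this is easy to trend and compare across devices, though it hides which component is responsible until you drill into the parts.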

Like ProactiveNet, eHealth Suite requires that you learn how changing thresholds will affect events and reports. eHealth Suite clearly documents what the various thresholds are measuring. All eHealth's thresholds come preset and do not change based on normal usage, as ProactiveNet's intelligent thresholds did.

Concord, along with Compuware, has welded together various products to create very wide selections from which data can be gathered. The company has been at this game so long, it has probably encountered everything you need to have monitored. As such, eHealth Suite gathers data from probes, switches, routers, remote-access devices, frame relay circuits and many more devices. eHealth Suite is better integrated than Compuware's Vantage Suite, so it's easier to deploy server-side and client-side agents. The suite's AdvantagEdge server-monitoring module is a separate product, but the Web-based interface is integrated into eHealth's reporting portal.

The AdvantagEdge module includes the server-side SystemEdge agents, which gather performance metrics via SNMP and rely on both host MIBs and Concord's own extensions. SystemEdge actively checks for services on both local and remote servers and alerts a central server, but even with the Host MIB responding to an SNMP walk, the eHealth Suite server didn't recognize some of our servers under test. Concord had not seen this problem before.

We used the SystemEdge agents to monitor HTTP, POP3, SMTP and other network services. Like ProactiveNet, the agent's performance metrics include name resolution, connection time and total transaction time; the agents also monitored SNMP OIDs (object identifiers), processes, CPU and memory. Crossing a threshold set off an alarm.
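The three timing metrics named above (name resolution, connection time and total transaction time) are straightforward to measure for a single HTTP request. The following is a minimal sketch using only Python's standard library; it is not how SystemEdge or ProactiveNet take their measurements, and the function name and the plain HTTP/1.0 GET are assumptions chosen to keep the example short.

```python
import socket
import time

def time_http_transaction(host, port=80, path="/", timeout=5.0):
    """Time one HTTP GET, split into the three phases named above.

    Returns seconds for name resolution, TCP connect and the whole
    transaction (resolution through last byte received).
    """
    t0 = time.perf_counter()
    family, socktype, proto, _, addr = socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM)[0]
    t_resolved = time.perf_counter()

    with socket.socket(family, socktype, proto) as sock:
        sock.settimeout(timeout)
        sock.connect(addr)
        t_connected = time.perf_counter()

        request = f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n"
        sock.sendall(request.encode("ascii"))
        while sock.recv(4096):  # read until the server closes the connection
            pass
    t_done = time.perf_counter()

    return {
        "name_resolution": t_resolved - t0,
        "connection": t_connected - t_resolved,
        "total": t_done - t0,
    }
```

Calling `time_http_transaction("intranet.example", 80)` (a placeholder host) returns a dictionary of the three timings, each of which could then be compared against a threshold.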

eHealth Suite also leverages the Cisco Service Assurance Agent and Cisco NetFlow monitor, which use Cisco routers to run remote checks for latency and availability. It may not be wise to add processes to a busy router, but if one's hanging around the edge of your network, you can use it to get some easy, accurate performance information on network services.

eHealth Suite remains a very solid performer and a good enterprise choice for tracking end-to-end network application performance.

eHealth Suite 5.0.2, starts at $15,000 (typical licenses range from $100,000 to $150,000, depending on infrastructure size), Concord Communications, (888) 832-4340, (508) 460-4646. www.concord.com

Compuware Vantage 8

Compuware has pulled together its varied network and systems management tools and renamed the entire package Vantage. Although it provides all the information an administrator could want, Vantage's fractured nature makes the product harder to use than the competition.

Vantage comprises seven pieces: ClientVantage, ServerVantage, NetworkVantage, Application Expert, Application Vantage, Predictor and VantageView. The last is the product's glue, a well-designed browser-based performance reporting interface that holds the other pieces together. Reports, listed as "channels," include "My Reports," a self-configurable view. We spent a lot of time configuring VantageView and got a lot of performance data in return.

Vantage serves up end-to-end performance data very well. By combining its systems, network and client data collections, this product can correlate precisely how long a transaction takes to complete, and where any delays occur. When a transaction is fired off, for example, a client-side agent watches threads, CPU utilization and other elements of the client environment. A probe collects the data in transit and provides a packet trace of the transaction. Finally, VantageView displays these performance metrics together.

VantageView's canned client, network and server reports offer high-level views and thoroughly link to the lower-level data. Client-side reports, for example, show trended availability and latency, and link to the underlying transaction detail, which links to the specific thread analysis for each transaction.

Network reports include daily traffic totals with overlays of transaction data that show top protocols, servers and workstations by volume. You can compare one day's traffic to another by flipping back and forth between days (a side-by-side chart display would have made this comparison better). Canned server reports display server health via CPU and memory statistics.

VantageView includes a new component, Visualizer, that lets you graphically map live data to a bitmap. Compuware smartly lets you use Visio diagrams with this feature. You have to buy Visio, but you can view those diagrams, with status, within a browser. Sweet!

Among Vantage's other components, ClientVantage is notable for its extremely clear service-level reporting. ClientVantage reports look at transaction response time in unbelievable detail. For example, starting with a month's summary of data, a response time can be drilled into, showing the day in hour increments. You can drill even further into specific client-resource environments to determine the point at which the SLA (service-level agreement) was violated. No other product in the test had this level of detail.
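The drill-down from a summary to the hour in which an SLA was broken amounts to regrouping the same response-time samples at a finer grain. Here is a small sketch of that idea; the sample data, the two-second SLA and the function names are hypothetical, and the code makes no claim about how ClientVantage stores or aggregates its data.

```python
from collections import defaultdict
from datetime import date, datetime
from statistics import mean

def hourly_averages(samples, day):
    """Average response time per hour for one calendar day.

    `samples` is a list of (datetime, seconds) pairs.
    """
    by_hour = defaultdict(list)
    for stamp, seconds in samples:
        if stamp.date() == day:
            by_hour[stamp.hour].append(seconds)
    return {hour: mean(values) for hour, values in sorted(by_hour.items())}

def first_sla_violation(samples, day, sla_seconds):
    """Return the first hour whose average exceeds the SLA, or None."""
    for hour, avg in hourly_averages(samples, day).items():
        if avg > sla_seconds:
            return hour
    return None

# Invented response times (seconds) for two days of one transaction.
samples = [
    (datetime(2002, 11, 5, 9, 5), 1.0),
    (datetime(2002, 11, 5, 9, 40), 1.4),
    (datetime(2002, 11, 5, 10, 10), 2.5),
    (datetime(2002, 11, 5, 10, 50), 3.5),
    (datetime(2002, 11, 5, 11, 0), 1.0),
    (datetime(2002, 11, 6, 9, 0), 1.1),
]
print(first_sla_violation(samples, date(2002, 11, 5), sla_seconds=2.0))
```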

But to determine what's normal and what's an exception, you must know where to look for the current data, then create a report that compares it to past traffic.

Administering groups of ClientVantage, ServerVantage and Application Expert agents is infinitely flexible; Vantage can conform to any departmental or user grouping needed. This allows performance and fault agent functions to be configured and arranged to provide differing alerts, transactions and performance metrics.

On the other hand, creating robotic transactions using ClientVantage and Application Expert is likely to bloody your knuckles. Compuware's authoring tool, QARun, must run on a separate box, and, while it supports any type of transaction, you have to figure out how it works first. Not a good use of time. Also, managing the robotic scripts is not seamless enough for the state of this technology. For example, QARun creates transaction scripts in an Access database by default. Though the product supports MS SQL and Oracle, distributing the database (as our test required) requires considerably more work on either platform. ProactiveNet and Concord's eHealth Suite have more centralized, integrated support.

To define client-side transactions to the ClientVantage server, you must export the script manually to a mapped drive on the server, then collect data or insert checkpoints. That's just too clunky and error-prone for mass deployment.

Each Vantage component includes two to five days of on-site "implementation assurance" to ensure quick deployment. This service, performed by skilled Compuware consultants, consists of installation verification, configuration, initial deployment and training. However, even with all the support, Vantage is better suited to application-development testing than to alleviating network managers' real-world hassles.

Vantage 8, starts at $19,000, Compuware Corp., (800) COMPUWARE, (248) 737-7300. www.compuware.com

Argent Software The Argent Guardian 6.0a

As a systems management application that doesn't need agents on servers to gather data, The Argent Guardian is different from the other products. It has strong rule-based exception management and can monitor Microsoft Windows NT and Unix servers without an agent footprint. Because the product includes polling of SNMP MIBs without additional cost, it's also a remarkable value: tens of thousands of dollars less than NetQoS, the nearest competitor.

Pricing Scenario

This product collects data via API calls on NT and via SSH or telnet on Unix--a handy feature because you don't have to manage agent distribution. But you still will need to beg for access if you don't control the servers because the process logs into each server when gathering the performance metrics.

This package comes with a huge selection of predefined data collection rules, grouped into categories such as Active Directory, Event Log, Performance SNMP, Log and SLA. Within these groups, there are subgroupings. For example, the SNMP category features rules that apply to Cisco, Compaq, Hewlett-Packard, Linux and many other platforms and devices. It's impressive to see 20 predefined Cisco rules that cover CPU, memory, Layer 2, Layer 3 and other performance metrics. An SLA rule measures availability; Argent says this is better than a ping because a busy system might answer a ping but be running applications that are too busy to respond. Good point.

The Argent Guardian also has a huge set of alarms, which can notify you via all the usual methods, such as beeper, pager or SNMP. Some unique inclusions are based on application and database metrics, such as SQL, Exchange, start and stop services and even system shutdowns. There's nothing like having power!

Argent Predictor, the product's trending engine, requires a defined set of data to collect. There are canned reports, but we wanted to trend our infrastructure and Web server together to get a correlated view. As with all the other products, that meant choosing the data set carefully to avoid creating empty reports when a monitored device doesn't support the metric specified in the data-collection scenario.

There is a basic network report that gathers total bytes per interface, and a handy integrated MIB browser allows quick point-and-click selection of a specific SNMP OID to monitor. But a shortage of detail limits the data's usefulness to hard-core network management types.

Further, network devices cannot have their baseline created unless licensed. So, while we get real-time SNMP checks of network devices free, trending data costs extra.

Argent administration uses Microsoft MMC consoles and a giant Web publishing download called UBI (Universal Browser Interface). In 10 minutes, I had the MMC reports available, read-only, to my local neighborhood browser. It works, and it's easy. The product doesn't allow for partitioning of data by user or application group, the way NetScout, Concord and Compuware's products do. Also, authentication relies on Windows 2000 or NT and at first denied us access to machines on untrusted domains. To avoid this problem, we created a user on the local domain that matched the user on the Argent server's domain.

Argent uses calendars. A main calendar defines day-to-day periods, such as work hours, after-hours, weekends, and holidays, but also lets you define and monitor special periods, such as the holiday shopping season. Simply define that season, then deploy the calendar. This sophisticated function is common on mainframe-scheduling applications, but rarely seen in client-server technology.
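The calendar idea is easy to picture as a function that labels any timestamp with its day-to-day period and any special period it falls in. The sketch below is our own illustration, not Argent's configuration format; the work hours, the holiday list and the shopping-season dates are all assumed values.

```python
from datetime import date, datetime

# Hypothetical calendar definitions for this sketch.
HOLIDAYS = {date(2002, 12, 25)}
SPECIAL = {"holiday shopping season": (date(2002, 11, 29), date(2002, 12, 24))}

def classify_period(stamp):
    """Return (day-to-day period, special period or None) for a timestamp."""
    day = stamp.date()
    if day in HOLIDAYS:
        period = "holiday"
    elif stamp.weekday() >= 5:          # Saturday is 5, Sunday is 6
        period = "weekend"
    elif 8 <= stamp.hour < 18:          # assumed work hours
        period = "work hours"
    else:
        period = "after-hours"
    special = next((name for name, (start, end) in SPECIAL.items()
                    if start <= day <= end), None)
    return period, special

print(classify_period(datetime(2002, 12, 9, 10, 0)))
```

A monitoring rule could then look up different thresholds or notification targets for each label instead of hard-coding times of day.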

The product also supports regional agents--essentially, remote engines that check the health of the main monitoring engine and do some of the polling. The vendor recommends one regional agent per 100 devices. Whereas the other vendors in this test charge for additional distributed polling engines, Argent charges only for an additional license; the distributed functionality is included.

Argent's pricing was the lowest of all the products tested, in part due to the free SNMP polling. However, trending SNMP devices requires a license, and trending is necessary to determine what is normal and what is an exception. Still, Argent indicated that its $58,700 price for our scenario was accurate, regardless of the licenses required to capture historical SNMP data. As always with pricing, your mileage will vary, but this seems like the deal of the year.

The Argent Guardian 6.0a, $15,000, Argent Software, (860) 674-1700. www.argent.com

NetScout Systems nGenius Performance Manager 1.4

Low-level probe data and its high cost work against NetScout, but nGenius does have some benefits. The product has a superior reporting paradigm, and implementing its proprietary data sources doesn't require any cycles or installation from existing production servers, switches or routers. That's worth something.

And NetScout has high-level performance views covered. A browser-launched Java applet reports performance in an attractively designed newspaper paradigm. Reports are published as articles, and users are subscribed to newsstands that partition data into information. This eye candy does a good job of covering everyone from executives to operations staff to engineers with very little work.

The nGenius probes use a nice trick to track network services, such as DNS and SMTP. A virtual interface is defined, and well-known TCP and UDP ports are statically assigned. Then, when traffic matches those ports, it's mapped to a particular transaction. This release of nGenius cannot use dynamic ports, but NetScout says a future release of nGenius will be able to do so.

There is nothing wrong with nGenius' CLI (command-line interface) for configuring the probes, but we expected those data-collection instances to be reflected in the management application. As it is, each probe must be checked individually for the proper configuration.
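The static port mapping the probes rely on can be sketched as a simple lookup table. This is a generic illustration of the technique, not NetScout's probe configuration; the table contents and the function name are assumptions.

```python
# Statically assigned well-known ports (a small, hypothetical sample).
WELL_KNOWN_PORTS = {
    ("udp", 53): "DNS",
    ("tcp", 53): "DNS",
    ("tcp", 25): "SMTP",
    ("tcp", 80): "HTTP",
}

def classify_flow(protocol, src_port, dst_port):
    """Map a flow to a service by its statically assigned port.

    Either end of the flow may hold the well-known port, so both are
    checked. Traffic on unlisted (for example dynamic) ports is unknown.
    """
    for port in (dst_port, src_port):
        service = WELL_KNOWN_PORTS.get((protocol.lower(), port))
        if service:
            return service
    return "unknown"

print(classify_flow("udp", 33012, 53))
print(classify_flow("tcp", 49152, 49153))
```

The second call shows the limitation noted above: an application that negotiates its ports at run time never matches a static table.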

We also learned that the Java engine doesn't yet support Windows XP, as none of the three versions of the Java Virtual Machine launched from within Microsoft Internet Explorer 6.0.2 or Netscape 7.0 would run. NetScout says XP is still to be certified as a base client OS.

nGenius Performance Manager 1.4, $50,000, NetScout Systems, (800) 357-7666, (978) 614-4000. www.netscout.com

NetQoS SuperAgent 3.0

It's all about focus for NetQoS's SuperAgent: This package requires you to focus on what you are going to monitor. SuperAgent also monitors only TCP applications, admittedly the primary application protocol, but not the only one, so the focus may be too tight for some.

Out of the box, NetQoS SuperAgent is a blank slate--a 1U Dell appliance whose installation amounts to configuring an IP address in Windows 2000 and determining where on the network to monitor. SuperAgent's high level comes from what you tell it to monitor. The biggest job is figuring out what traffic to capture.

This is in contrast to the other probe-like products that gather data off the wire and then try to make sense of it by categorizing the traffic. For example, once data sources were defined, the other products we tested began some default reporting that we could adjust. The SuperAgent approach requires more implementation work, but also reduces the noise level of unwanted statistics.

SuperAgent helps with configuration by listing a huge number of existing well-known and registered applications, monitoring any user-definable port and displaying TCP applications that it has seen for inclusion. This last point is useful, as it reduces the amount of work to define TCP ports that are to be reported on.

SuperAgent's Alarm display is nicely compact, simply divided into three categories: standard, custom and additional alarms. The standard alarms measure network round-trip time, throughput and loss rate. All are displayed in eight-hour, daily, weekly and monthly buckets. Each bucket links to a summary display for the time period, sorted by type, and each summary links to the specific error conditions creating the alarm. It makes for a very spiffy display. The 11 custom alarms cover connection time, refused sessions, open sessions and timed-out sessions, among others. You create the additional alarms. For example, during implementation, we created packet-focused alarms tracking fragments and discards. Links drilled down to reveal more detail and the specific cause of the alarm.
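Grouping measurements into eight-hour buckets is a matter of rounding each timestamp's hour down to 0, 8 or 16. The sketch below shows that for round-trip-time samples; it is a generic example with invented data, and averaging within a bucket is our assumption rather than a documented SuperAgent behavior.

```python
from collections import defaultdict
from datetime import date, datetime

def eight_hour_buckets(samples):
    """Group (datetime, value) samples into 00-08, 08-16 and 16-24 buckets.

    Returns {(date, bucket_start_hour): average value}.
    """
    grouped = defaultdict(list)
    for stamp, value in samples:
        key = (stamp.date(), (stamp.hour // 8) * 8)
        grouped[key].append(value)
    return {key: sum(vals) / len(vals) for key, vals in grouped.items()}

# Invented round-trip times in milliseconds.
rtt_samples = [
    (datetime(2002, 12, 9, 1, 0), 40),
    (datetime(2002, 12, 9, 7, 59), 60),
    (datetime(2002, 12, 9, 8, 0), 100),
    (datetime(2002, 12, 9, 23, 0), 20),
]
buckets = eight_hour_buckets(rtt_samples)
for key in sorted(buckets):
    print(key, buckets[key])
```

Daily, weekly and monthly views follow the same pattern with a coarser key.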

We upgraded SuperAgent during the test--an act that required a console connection to download the files and run the setup. Had we not been on the console of the machine, we would not have been able to complete the upgrade, as the remote control hung at the end of the process. NetQoS's upgrade read-me document says a reboot may be necessary if the appliance hangs, so you're out of luck if this happens when you're across town. After the reboot, everything worked fine.

SuperAgent 3.0, $29,500, NetQoS, (877) 835-9575, (512) 407-9443. www.netqos.com

Bruce Boardman is executive editor of Network Computing, testing and writing about network management and systems. He has 12 years' IT experience managing networks and distributed computing for a financial service provider. Write to him at [email protected].

Trying to get a handle on your entire business takes easily understandable information about the network, systems and applications you operate. Our review of end-to-end performance management tools from Argent Software, Compuware, Concord Communications, NetQoS, NetScout and ProactiveNet focuses on the packages' usefulness in painting a clear picture of your IT systems' overall health. We considered their reporting abilities, root-cause analysis abilities, data collection, implementation, administration, pricing and "gotchas."

We awarded ProactiveNet's eponymous performance management suite Editor's Choice, primarily for its outstanding root-cause analysis features. More thoroughly than the competition, this product uses automatically created performance thresholds, logical groupings and its top-notch filtering ability to list the most likely sources of network trouble.

We ran a Web server with an ASP application and pointed two communities of users at it. The first group used Mercury Interactive LoadRunner 7.51, letting us control the amount of load on the server. We varied the number from a few concurrent sessions to more than 100--more than enough to bring our test ASP application to a complete standstill and peg the Web server's memory and CPU.

The second community was much smaller--just a few machines, routed through The Cloud 2.1, a WAN simulator from Shunra Software. In this way we could control the remote clients' throughput and response times. These clients executed robotic and actual transactions, which all the products monitor.

Both user communities were routed to a shared switch, upon which a span port was created. We used a multiport NetOptics tap to connect all the probes to the span port.
