We tested five packages that provide fault and performance control over 1,000 nodes for $10,000 or less.

August 19, 2004

We invited BMC Software, Castle Rock Computing, Computer Associates International, Hewlett-Packard, IBM, Ipswitch, Linmore, NuLink, RedPoint, SolarWinds.Net, Visualware and Wild Packets to participate in our tests. NuLink never responded; products from BMC, IBM, Linmore and RedPoint couldn't squeeze under the $10,000 ceiling; and Visualware's and Wild Packets' offerings were not ready in time for testing.

That left Castle Rock, CA, HP, Ipswitch and SolarWinds.Net. We gathered the software in our Syracuse University Real-World Labs®, took over the network and put these products to work.

Foundation for Growth

Our testing didn't require vendors to scale or support third-party databases or distributed processing, functions that would be de rigueur for high-end network management. However, most of the products offer some features that would let them grow with your network.

We limited discovery to a set of important nodes--primarily routers and switches--though we did include servers and a few clients as part of a test application (a multitiered Web app) we monitored. All the products easily kept tabs on these 1,000 interfaces--in some cases they began by discovering more than we wanted, sometimes even when we didn't ask them to. In addition, all the entries except Ipswitch's WhatsUp Gold have distributed-polling capabilities that let them poll a plethora of devices while going easy on WAN bandwidth.

HP and CA both support IP, IPX, DMI and Layer 2 discoveries. We didn't evaluate IPX discovery, though we didn't turn it off and did see IPX results. Both products support the combination of IP and IPX discoveries, with a preference for IP when both are available on a network.

We didn't take into consideration high-availability or scalability functions in the architecture of the products we tested, as our focus was on managing the 1,000 interfaces. Scalability, while important (and available with some of the products), isn't as important as getting an immediate network-management bang. Our $10,000 limit also put high availability on the back burner. Still, it's easy to see how having redundant servers, distributed pollers, support for external third-party databases and the ability to administer parts of the system without rebooting would be valuable as your network grows.

Orion's Lucky Star

SolarWinds.Net's Orion Network Performance Monitor is our Editor's Choice. The software quickly got down to managing our network with good out-of-the-box functionality. It had enough depth and flexibility to give us a good handle on how the network was performing, without being overly complicated. We aren't exactly jumping up and down about it, but given the modest dollars and time Orion required, the product is a pretty sure bet.

Graphic: Daily Average Cost Calculator
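The Daily Average Cost Calculator expresses each product's cost per day. Here is a minimal sketch of that arithmetic. It assumes the figure is simply a total cost divided by the days in the ownership period. The chart's actual inputs, such as maintenance or hardware, aren't spelled out here.

```python
DAYS_PER_YEAR = 365  # ignores leap days for simplicity


def daily_cost(total_cost: float, years: float) -> float:
    """Average cost per day of ownership over the given number of years."""
    if years <= 0:
        raise ValueError("years must be positive")
    return total_cost / (years * DAYS_PER_YEAR)


# OpenView NNM's $10,000 purchase price spread over five years
print(f"${daily_cost(10_000, 5):.2f} per day")
```

Counting the purchase price alone, $10,000 over five years works out to about $5.48 a day. The chart's figures may fold in other costs.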

Our Best Value award goes to WhatsUp Gold from Ipswitch, which also has solid features and administration that will reap immediate rewards. Costing less than my kids' daily lunch money (a measly $1.19 per day over five years), it's not going to grow with you, but it won't let you down either. SNMPc from Castle Rock had the second-best price and great SNMP MIB support without being overly difficult to administer, though it went a little nuts when discovering the network and offered only average fault and performance management.

HP's OpenView Network Node Manager and Computer Associates' Unicenter Performance Management and Unicenter Advanced Network Operations were the priciest products we tested (OpenView NNM just made our limit at an even $10,000). Both had strong fault and performance features but were also the hardest to learn to use and took the most work to maintain. However, we were amazed at what each can do for the dollars. If yours is a small network now but you want something that's going to grow with your organization, both of these, along with Orion, are worth considering.

We took an immediate shine to Orion. Even though it didn't have an overwhelming feature set, what it has works well, without a hassle. We like it so much we'd recommend it to our favorite aunt. Heck, even to our mom.

Orion exposes everything it monitors, making understandable information out of raw data. All the products we tested gather SNMP performance stats and display faults based on SNMP traps and the polling of performance data, but Orion's uncluttered and flexible display shows network status, diagnostic direction and trends clearly and without requiring customization. For example, it was the only product to report on CPU, memory and disk usage of discovered Microsoft Windows and Sun servers without us having to jump through SNMP hoops.

SolarWinds offers what it calls a "95th percentile calculation." A line on the utilization graph marks the value below which 95 percent of the collected data points fall. The percentage is adjustable: Change it to 70 percent, for example, and the line moves to the value below which 70 percent of the data points fall. This is a useful tool for spotting aberrant threshold violations or gauging general usage. For example, if one big peak pushes an interface's average utilization above the threshold while 95 percent of the samples stay below it, the problem may be transient.
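To make the percentile idea concrete, here is a small Python sketch. It uses the nearest-rank method and a made-up set of utilization samples. The nearest-rank method is an assumption, since SolarWinds' exact calculation isn't documented in this review.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Twenty utilization samples (percent): steady at 10, with one spike to 100
samples = [10] * 19 + [100]
threshold = 12
average = sum(samples) / len(samples)  # 14.5, above the threshold
p95 = percentile(samples, 95)          # 10, below the threshold
if average > threshold and p95 <= threshold:
    print("average pushed over threshold by a short spike; likely transient")
```

The single spike drags the average over the line, but the 95th percentile shows that nearly all of the traffic sits well below it.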

Out of the box, Orion offers a dozen default thresholds, and new ones are easily created. Each can be set with percentages or static values for both the threshold and the reset value. By default, a threshold applies to every interface, but individual interfaces can be excluded as needed.
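Orion's separate threshold and reset values amount to hysteresis. An alert fires when a value crosses the threshold and doesn't clear until the value falls back to the reset level. The sketch below is a hypothetical illustration of that behavior, not Orion's code. The `Threshold` class and its fields are invented for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Threshold:
    trigger: float         # raise an alert when the value climbs above this
    reset: float           # clear the alert only once the value falls to this
    percent: bool = False  # True: trigger/reset are percentages of link speed
    active: bool = False   # whether an alert is currently outstanding

    def limits(self, speed_bps: float):
        # Convert percentage settings into absolute bits per second
        if self.percent:
            return self.trigger * speed_bps / 100, self.reset * speed_bps / 100
        return self.trigger, self.reset

    def update(self, value: float, speed_bps: float = 0) -> Optional[str]:
        trigger, reset = self.limits(speed_bps)
        if not self.active and value > trigger:
            self.active = True
            return "alert"
        if self.active and value <= reset:
            self.active = False
            return "reset"
        return None


t = Threshold(trigger=80, reset=70, percent=True)
speed = 100_000_000  # a 100-Mbit/s interface
print([t.update(bps, speed) for bps in (75e6, 85e6, 78e6, 72e6, 65e6)])
# [None, 'alert', None, None, 'reset']
```

Readings of 78 and 72 Mbit/s don't re-fire or clear the alert, which is the point of keeping the reset value below the threshold.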

Default alerts are also easily modified from a static number to a percentage. The list of monitored metrics is grouped at the highest level by network nodes, interfaces and volumes. Each group has subgroups, such as status, polling, errors, statistics and traffic. These groups are populated with more than 100 individual metrics, including "last boot for network nodes," "interface errors received today" and "volume percentage used." The layout keeps what can be an overwhelming number of knobs accessible and organized.

Fault management is one of Orion's weakest areas, but it does display events and alarms clearly, and you'll get value without heavy lifting. However, you won't get advanced fault features, such as the advanced trap and log-file handling found in CA's and HP's products. Nor are there canned correlations, like those found in HP's OpenView NNM, to suppress downstream events or deduplicate recurring events.

The event console can be displayed as a summary or detailed list, and events can be linked to the devices to which they apply. The summary display groups similar events, and a single click will clear an entire group--a useful feature after losing a router, as it makes getting rid of all the interface alerts fast and easy. Because the interface is completely configurable, you can move these lists and summaries into any configuration.
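The summary-and-clear behavior can be modeled as grouping events by type. The `EventConsole` class below is a hypothetical stand-in, not Orion's API. It shows a lost router leaving one node event and several interface events, with the interface group cleared in one call.

```python
from collections import Counter


class EventConsole:
    """Hypothetical event list with a grouped summary and one-click group clearing."""

    def __init__(self):
        self.events = []  # list of (node, event_type, message) tuples

    def add(self, node, event_type, message):
        self.events.append((node, event_type, message))

    def summary(self):
        # Group similar events by type and count them
        return Counter(event_type for _, event_type, _ in self.events)

    def clear_group(self, event_type):
        # Remove every event of one type at once; return how many were cleared
        before = len(self.events)
        self.events = [e for e in self.events if e[1] != event_type]
        return before - len(self.events)


console = EventConsole()
console.add("rtr3", "Node Down", "rtr3 is not responding")
for port in ("Fa0/1", "Fa0/2", "Serial0/0"):
    console.add("rtr3", "Interface Down", f"{port} is down")
print(console.summary())
print(console.clear_group("Interface Down"))  # 3
```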

Getting rid of all the nodes' up or down events made it possible to quickly set things back to zero, but that's also a dangerous command: It could make shift turnover easy while hiding the true status of the network. Of course, if shift turnover means your leaving and coming back, then it's no big deal.

Graphic: Vendors at a Glance

When an alert is displayed, all of the relevant interface utilization and error statistics are, by default, displayed as well. Settings can be customized with a simple point-and-click from within the Web interface as long as you're logged on as administrator. Events can't be assigned, acknowledged or cleared by operators, though they can be cleared from the Win32 admin GUI. So, the software doesn't provide a workflow process that would let the Web interface be used by operators; this is limiting if you've got to roll the management product out to a number of folks.

The process of creating an alert is as simple as checking the appropriate metric and setting the threshold value. By default, all alerts are applied to all appropriate devices. The devices, as we expected, can display status, but what was nice is that the status can be displayed in different ways; that's in keeping with the flexibility we found throughout the Network Performance Monitor.

For example, a display titled "overview" showed two groups of icons, one for nodes and one for interfaces. The icons actively display the status of each interface, which is a useful, quick update. We've found this feature in many network management applications, including Ipswitch's WhatsUp Gold. But what is unique to Orion is that the node and interface stats can be tied to different metrics, including percent utilization, errors, discards today or in the last hour, current signal-to-noise ratio, averages of collected data, and bits transferred on the interface. You can set these values or allow users to change them as needed.
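Tying an icon's color to a user-selected metric might look like the following sketch. The metric names and the warning and critical levels are hypothetical examples, not Orion defaults.

```python
# Hypothetical (warning, critical) levels for a few selectable metrics
LEVELS = {
    "percent_utilization": (70, 90),
    "errors_last_hour": (10, 100),
    "discards_today": (50, 500),
}


def icon_color(stats: dict, metric: str) -> str:
    """Color an interface icon by whichever metric the user has selected."""
    warn, crit = LEVELS[metric]
    value = stats[metric]
    if value >= crit:
        return "red"
    if value >= warn:
        return "yellow"
    return "green"


iface = {"percent_utilization": 45, "errors_last_hour": 12, "discards_today": 3}
for metric in LEVELS:
    print(metric, icon_color(iface, metric))
```

The same interface shows green when judged by utilization but yellow when judged by recent errors. That is why letting users pick the metric matters.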

Orion can't parse event streams or correlate events, as HP's and CA's products do. It does, however, provide a handy set of macros for inserting variables into event text. These include node name, time, interface, address, SNMP community, status, response time, speed and errors.
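Macro substitution of this kind is essentially template filling. The sketch below uses Python's `string.Template` with hypothetical macro names. Orion's actual macro syntax isn't shown in this review.

```python
from string import Template

# Hypothetical macro names for illustration; not Orion's documented syntax
ALERT_TEXT = Template(
    "${NodeName} ${Interface} is ${Status} at ${Time} (response ${ResponseTime} ms)"
)


def render_alert(values: dict) -> str:
    # safe_substitute leaves unknown macros in place instead of raising KeyError
    return ALERT_TEXT.safe_substitute(values)


print(render_alert({"NodeName": "core-rtr1", "Interface": "Serial0/0",
                    "Status": "down", "Time": "14:02", "ResponseTime": 37}))
# core-rtr1 Serial0/0 is down at 14:02 (response 37 ms)
```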

Take note that the paint is still wet on this utility. There's no Layer 2 discovery, no discovery or monitoring of trunked links, and no port scans of TCP/UDP services. But what's there works.

Discovery uses ICMP and SNMP and is scoped by subnet or seed router. We like to have a "don't discover" (or "not") list, so discovery can be limited to specific interfaces without looking at an entire ARP cache on a gateway router, but Orion does not offer this option. When you hit a heavily populated cache, all the devices get checked. If you're anything like us, you'll want to limit what you manage. In that case, Orion's discovery will be too much, and you'll just add the devices manually. At least we felt that was easier than deleting devices one at a time.
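The include-and-exclude scoping we wanted is easy to express. This sketch uses Python's `ipaddress` module to filter candidate addresses, such as entries pulled from a router's ARP cache. It checks them against hypothetical include subnets and a "don't discover" list.

```python
import ipaddress


def discovery_targets(candidates, include, exclude):
    """Keep addresses inside an included subnet and outside every excluded range.
    A bare address in `exclude` is treated as a single-host (/32) network."""
    include_nets = [ipaddress.ip_network(n) for n in include]
    exclude_nets = [ipaddress.ip_network(n) for n in exclude]
    targets = []
    for text in candidates:
        addr = ipaddress.ip_address(text)
        if any(addr in net for net in include_nets) and not any(
            addr in net for net in exclude_nets
        ):
            targets.append(text)
    return targets


arp_cache = ["10.1.1.1", "10.1.1.20", "10.1.2.7", "192.168.5.9"]
print(discovery_targets(arp_cache, include=["10.1.0.0/16"], exclude=["10.1.2.0/24"]))
# ['10.1.1.1', '10.1.1.20']
```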

During our testing SolarWinds sent us its new mapping tool. It was buggy initially but eventually provided basic bitmaps with links to underlying status.

The canned reports offer the works, including top-10 lists, tabular reports and graphs, all nicely linked to underlying data. Every "report resource" in the out-of-the-box reports is up for grabs, making it easy to combine gathered statistics into whatever look works for you.

The 18 canned reports run the gamut from response time, utilization, peak loads and CPU utilization to down interfaces and top disk space utilization. SolarWinds said it will release an "ad hoc report writer" in "early 2003."

We could modify reports by, for example, replacing charts with much sexier gauges and inserting stats like server CPU load, memory utilization and hard-disk usage. In addition, every item displayed within the Web GUI could be moved to any other Web display simply by copying the source. This meant if we had a performance gauge for an important router that we wanted to give to a network operator, it was a simple cut-and-paste.

Orion Network Performance Monitor 6.0, SolarWinds.Net, (918) 307-8100. www.solarwinds.net

Ipswitch WhatsUp Gold 7.0

WhatsUp Gold from Ipswitch is just what the doctor ordered for the budget-minded network administrator. It's simple enough that you can get it up and running within a day, and at the bargain price of $795, WhatsUp includes a Web server and Web interface for basic network management.

The device discovery wizard walks you through each step in creating a network map, and the autodiscovery process was the easiest to set up among all products tested. WhatsUp Gold can discover a network through SNMP, ICMP or devices in Network Neighborhood, and devices can also be imported from the registry or a host file.

After initial discovery, WhatsUp lists the devices found and prompts you to choose which items to include in your map. During our test, WhatsUp found most of our interfaces in less than a minute from an imported host file, but it failed to create connectivity relationships to other devices in the topology map. Of the other products we tested, all except Orion were able to perform Layer 3 mapping correctly.

Connectivity relationships were corrected easily by manually connecting the devices. We attempted autodiscovery through ICMP, SNMP and host file scans and got the same results each time. WhatsUp failed to discover a few servers in each scan, but they were easily added in the Map Edit menu.

WhatsUp comes ready to monitor 15 predefined services, such as HTTP and FTP, and the service list is fully customizable. However, it didn't consistently discover all running services, requiring a rediscovery of individual services on some devices.
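Service monitoring of this sort usually boils down to checking whether a TCP port accepts connections. The sketch below assumes a plain connect test; WhatsUp's actual checks may also inspect the service's response. The port numbers are the standard well-known ports.

```python
import socket

# Well-known TCP ports for a few commonly monitored services
SERVICES = {"FTP": 21, "SMTP": 25, "HTTP": 80}


def service_up(host: str, port: int, timeout: float = 2.0) -> bool:
    """True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # connection refused, host unreachable or timed out
        return False


def check_services(host: str) -> dict:
    """Map each service name to whether its port is accepting connections."""
    return {name: service_up(host, port) for name, port in SERVICES.items()}
```

A successful connect shows only that something is listening. A fuller check would send a request and inspect the reply, such as an HTTP status line or an SMTP greeting.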

WhatsUp's rediscovery and polling options are quick and easy to change. Device status of mapped interfaces is displayed through a combination of colors and shapes, all of which are configurable. Polling frequencies and thresholds are easily set with a few mouse clicks.

WhatsUp's many event alert mechanisms are easy to set up and assign, and alerts can be sent via console, e-mail, beeper, pager, Windows pop-up, executable or text-to-speech. However, the event log is very basic compared with those of rivals, and the information display is limited. Events are more difficult to read than they are in Castle Rock's SNMPc and HP's OpenView NNM, which display events by color and category. Searching WhatsUp's event log is the same as searching for text in Notepad, and the only filtering option is by time.

WhatsUp generates performance-, event- or statistics-based reports that are easy to read and create.

The canned performance reports can be generated for hourly, weekly or monthly spans. Information reported includes availability and response times for devices, interface outages and downtime. Reports include detailed daily performance statistics on each device mapped within WhatsUp and are very thorough, citing stats most commonly used by systems administrators, including device utilization and response times. Administrators can also generate reports from the event log through the command line. Customized reports are available using a Crystal Reports plug-in.
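Availability and downtime figures like these can be derived from poll results. The sketch below assumes each poll is recorded simply as up or down. It also assumes a failed poll counts as one full polling interval of downtime.

```python
def availability_pct(polls):
    """Percent of polls the device answered; polls is a list of booleans (True = up)."""
    if not polls:
        raise ValueError("no poll results")
    return 100 * sum(polls) / len(polls)


def downtime_minutes(polls, interval_minutes):
    """Approximate downtime: each failed poll counts as one full polling interval."""
    return polls.count(False) * interval_minutes


# One day of 5-minute polls (288 in all), three of which went unanswered
day = [True] * 285 + [False] * 3
print(f"{availability_pct(day):.2f}% available, about {downtime_minutes(day, 5)} minutes down")
# 98.96% available, about 15 minutes down
```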

WhatsUp Gold ships with a proprietary Web server for deploying network maps so network administrators can monitor their systems remotely. We set up the Web server and assigned users in a matter of minutes. For additional security, you can set the IP security function to grant access only to those in a certain subnet.

From inside the Web display administrators can add and remove hosts, change settings, and view event and statistical reports. Selecting the details of a device gives you extensive information about individual interfaces, such as response time, poll statistics and downtime. The Web interface's functionality is on par with SNMPc's Remote Console and OpenView's Launcher application but is not as flexible or easy to customize as Orion's.

Also included are basic networking tools, such as trace route, ping, finger, lookup, scan IP and throughput. The SNMP viewer, SNMP graph and MIB browser are nicely laid out and easy to work with.

WhatsUp Gold is a decent network administration tool for the small or midsize business looking for basic monitoring capabilities. The Web server and Web interface are the crown jewels of this product and complement its simplicity, ease of use and value, but the flawed discovery process and mapping capabilities kept it from running away with the review.

WhatsUp Gold 7.0, Ipswitch, (800) 793-4825, (781) 676-5700. www.ipswitch.com

Castle Rock Computing SNMPc 5.1.6c

Castle Rock's SNMPc Enterprise with Remote Access Extension is a middle-of-the-road network-management tool, offering good administration and acceptable functionality for a good price, but it was not a standout. Still, while discovery can quickly spiral out of control, the product's remote console and wealth of MIBs are pluses.

By default, network discovery is launched the first time SNMPc is loaded. After we provided a seed router, the discovery agent used SNMP (or ICMP) to retrieve device and interface information. Although SNMPc does not have an import-host-file feature, you can easily mimic one by adding individual hosts in the discovery agent window.

In our tests, SNMPc was unable to resolve DNS names on our Cisco routers and switches, leaving the mapped devices with names such as "Router9." SNMPc did, however, correctly poll and identify HTTP, FTP, SMTP and telnet ports. SNMPc also lets you configure and manage four additional user-definable ports.

SNMPc's autodiscovery is an untamed beast--if you're not careful it will find any device connected to the Internet by scanning through ARP tables! Even with IP filters set up in the discovery agent options, we still discovered subnets outside of our intended test bed. In our initial discovery, SNMPc even generated a ton of traffic over our firewall, which caught Network Computing senior technology editor and security guru Mike Fratto by surprise.

SNMPc populates the topology map while conducting the discovery. It maps devices graphically, similar to how Unicenter and OpenView do it; device status is represented with various colors and through the current event and history windows. There are eight custom tabs on the bottom of the screen to filter status information of individual devices.

SNMPc includes more than 340 canned MIBs from Cabletron, Cisco, Nortel and 3Com, and the product allows third-party MIBs to be compiled.

Setting thresholds and manual alarms is a very straightforward process, as with WhatsUp Gold. Use any integer SNMP variable, apply some Boolean logic, and you're in business. As with CA's product, the Boolean logic is your logic. Unlike CA, however, Castle Rock does not offer professional services, so you'll have to bring in a third party if you need help. Configured alerts can be generated by pager/SMS, e-mail, alert console, WAV sound, trap forwarding, executable, ODBC export or API links to other applications, such as Touchpaper Helpdesk.

SNMPc contains canned reports of MIB table information that can be compiled hourly, daily, weekly, monthly or over user-configurable periods. The quality and detail of the reported information are good. All reports can be exported and scheduled to a text file, HTML, or ODBC database (SQL Server, MSSQL or Access). SNMPc also comes with Air Messenger Pro 3.8.1 alphanumeric paging software.
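SNMPc's threshold alarms combine integer SNMP variables with Boolean logic, and they can be pictured as composable predicates. In this sketch, `ifInErrors` and `ifOperStatus` are standard MIB-II objects; an `ifOperStatus` of 1 means up. The `cpuLoad` variable and the rule itself are hypothetical. SNMPc's own rule syntax is not shown here.

```python
import operator

OPS = {">": operator.gt, ">=": operator.ge, "<": operator.lt,
       "<=": operator.le, "==": operator.eq}


def cond(var, op, value):
    """A single comparison against one integer SNMP variable."""
    return lambda mib: OPS[op](mib[var], value)


def all_of(*conds):
    """Boolean AND of several conditions."""
    return lambda mib: all(c(mib) for c in conds)


def any_of(*conds):
    """Boolean OR of several conditions."""
    return lambda mib: any(c(mib) for c in conds)


# Hypothetical alarm: more than 100 input errors on an interface that is up,
# OR a (hypothetical) CPU load variable above 90
alarm = any_of(
    all_of(cond("ifInErrors", ">", 100), cond("ifOperStatus", "==", 1)),
    cond("cpuLoad", ">", 90),
)

print(alarm({"ifInErrors": 150, "ifOperStatus": 1, "cpuLoad": 20}))  # True
```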

Included tools were a Win32 remote console, a Java console and a remote polling agent. The remote console is a thin client that connects terminal-service style to the SNMPc server. The remote console is a mirror of the host program, so any changes made to either the remote console or the host program are reflected in the other in real time.

The Java console is a sufficient but read-only look into the network map, reports and MIB information. Remote polling lets you use a discovery agent on another machine to reduce workload on the original server or search for devices over a slow connection.

SNMPc is a comprehensive Layer 3 monitoring tool for IP and IPX networks. Its autodiscovery is a beast, but when tamed it can be on par with Unicenter's and OpenView's discovery processes. SNMPc's reporting options and remote console are acceptable.

SNMPc Enterprise 5.1.6c, Castle Rock Computing, (408) 366-6540. www.castlerock.com

Computer Associates Unicenter Performance Management and Unicenter Advanced Network Operations

A face-lift has improved this veteran network-management framework's usability and ease of administration. It is, however, still very complex when compared with Orion, WhatsUp Gold and SNMPc.

We got a bundle of products and a day's worth of professional services for under our $10,000 limit. CA's Performance Management and Advanced Network Operations come with and leverage the Unicenter management framework, and a new management portal is included, translating into a lot of functionality. In addition to discovery, event management and performance management, this package will give you a common object-oriented repository.

Even though all this functionality was quite impressive, using it is more oppressive than it needs to be, even with the new Java and Web glue. If you have an existing CA relationship or in-house know-how, overcoming this complexity might not take long.

Unicenter certainly has solid fundamental event and performance-management capabilities. The event console is designed for an operator monitoring the network. It supports scrolling events, operator messages and complete configuration of severity using colors, ticker tapes, blinking and sound. Nothing is missing in terms of basic fault notification and monitoring.

Event parsing and editing to create more understandable alerts and to act on alerts is one of Unicenter's strengths, but this requires you to understand what events mean and know how to parse them. Strong Boolean scripting logic makes it possible to suppress and deduplicate events, but unlike that of HP's product, it's strictly roll-your-own--there's no out-of-the-box functionality. CA's recommendation is to hire professional services to jump-start this for you.
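Event parsing of this kind turns raw event text into fields that can drive a clearer alert. The Python sketch below parses a Cisco-style link up/down message. The rule and the rewritten message format are hypothetical, and neither CA's nor HP's rule language is reproduced here.

```python
import re

# Cisco-style syslog text, e.g. "%LINK-3-UPDOWN: Interface Serial0/0, changed state to down"
LINK_UPDOWN = re.compile(
    r"%(?P<facility>[A-Z0-9_]+)-(?P<severity>[0-7])-(?P<mnemonic>[A-Z0-9_]+): "
    r"Interface (?P<iface>[^,\s]+), changed state to (?P<state>\w+)"
)


def rewrite_event(node: str, raw: str):
    """Return a clearer alert string, or None if this rule doesn't match."""
    m = LINK_UPDOWN.search(raw)
    if m is None:
        return None  # leave events this rule doesn't understand untouched
    return (f"{node}: interface {m.group('iface')} is {m.group('state').upper()} "
            f"(severity {m.group('severity')})")


print(rewrite_event("edge-rtr2", "%LINK-3-UPDOWN: Interface Serial0/0, changed state to down"))
# edge-rtr2: interface Serial0/0 is DOWN (severity 3)
```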

CA's infamous 3D interface, which displays the network and devices with flying-through-space navigation, is still part of the product, but the myriad Win32 consoles--which in the past equaled a confusing number of navigation entry points--have thankfully been supplanted by a new Java "Explorer." The Explorer interface launches almost all of the Win32 and 3D interface functions.

Billboards, one Unicenter Explorer function, offer a new navigational shortcut that displays a synopsis of a device's or subnet's status. The Billboards lived within various important segments, subnets and routers.

Another interesting Unicenter Explorer utility is the "Historian" with its "Controller" view. The Controller is a slick navigational toolbar that stores bookmarks and provides access to CA's dynamic grouping of important network devices, aka BPVs (Business Process Views). The tool's main claim to fame is that it enables you to move back and forth through event histories. We selected devices, dialed in a date and pressed "play," watching the Explorer pane for events. It allowed us to look at the BPV devices one at a time and replay through previous days looking for errors. While we could see some value in looking for time between events on a device, it would have been more helpful to have an entire BPV selected, and have those events played back. We could not display the Controller when accessing the console through Windows Terminal Services.

CA's performance monitoring is very complete but also very complex--we spent way more time than we should have setting it up. CA says the next release will address our concerns by making administration of performance collections easier.

CA provides a single proprietary performance agent, which, among many other system collection possibilities, can proxy SNMP collections. Because in our test we were gathering Host MIB and MIB2 data from servers, routers and switches--and we couldn't afford the proprietary agents for system collections--we installed only the single CA proprietary agent.

Unicenter's Performance Management is really a number of applications. The first, Response Manager, gathers SNMP, RMON, RMON2 and CA Response Management Probe data plus Cisco CPU and buffer utilization statistics. We were able to collect, baseline and threshold such data as utilization, errors and response time. The second app was a performance trending application, run in Excel and aptly named Performance Trend.

The Performance Management application provided the most granular control and flexibility for gathering and reporting on performance data among the products tested. But it was also the biggest pain to set up.

We created separate collection groups, which housed network devices and servers and against which we created different collection definitions. These definitions were more like policies, and they organized the data collection, making it easy to audit what performance data was being collected from where.

However, we found it all too easy to make mistakes. For instance, each data-archive parameter had to be separately defined as a distinct collection. So, daily, weekly, monthly and yearly collections each had separate definitions. This level of control allows for the roll-up of any business cycle, but it's important to understand that these are not different data-collection frequencies, just different data-storage parameters. All performance products do this, and the better products clearly expose what's going on, even offering knobs to configure changes when a data collection is started. Be prepared to spend a significant amount of time making this work.

Control over network discovery is good, but not perfect. Configuration is possible via wizard, GUI edit or what came to be our favorite: the command line. As with the other products, we had a hard time using the GUI controls to limit our discovery to just the devices and interfaces we wanted to manage. Because there was no facility to add a seed- or host-device file, tech support suggested we create a batch file by running the discovery command once on each device. This worked well, and because the product has a nice scheduler application, we could automatically kick it off for rediscovery.
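The archive definitions described above differ only in how stored samples are rolled up, not in how often data is collected. The sketch below illustrates that distinction, assuming simple averaging; real products may also keep minimums, maximums or percentiles. A single 15-minute collection feeds both an hourly and a daily roll-up.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean


def roll_up(samples, bucket_of):
    """Average (timestamp, value) samples into storage buckets chosen by bucket_of."""
    buckets = defaultdict(list)
    for ts, value in samples:
        buckets[bucket_of(ts)].append(value)
    return {key: mean(values) for key, values in sorted(buckets.items())}


def hour_bucket(ts):
    return ts.replace(minute=0, second=0, microsecond=0)


def day_bucket(ts):
    return ts.date()


# One collection: a utilization sample every 15 minutes for two hours
start = datetime(2024, 1, 8, 9, 0)
samples = [(start + timedelta(minutes=15 * i), v)
           for i, v in enumerate([10, 20, 30, 40, 50, 60, 70, 80])]

hourly = roll_up(samples, hour_bucket)  # two buckets: averages 25 and 65
daily = roll_up(samples, day_bucket)    # one bucket: average 45
```

Changing a storage definition changes only the bucketing function. The underlying collection frequency stays the same.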

The Switch Manager, CA's current Layer 2 display, shows the intersection of VLAN interfaces and modules/ports. For servers that we discovered, the module/port would show the attached devices. Nice! Not all our switches--not even all the Cisco switches--were mapped, but any time a network-management application can accurately show attached neighbors, it's a plus.

Another interesting little utility is the "Path Dr.," which uses SNMP to determine routes between devices at Layer 3 only, creating a graphical map with guesses at segment speeds. It was slower than using trace route and incorrectly returned the speed of the fastest segment traversed as the speed for the entire route.
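The Path Dr. error is easy to see with a little arithmetic. A path can carry no more than its slowest segment, so end-to-end speed is the minimum of the segment speeds, not the maximum. The segment speeds below are illustrative.

```python
def path_speed(segment_speeds_bps):
    """End-to-end capacity of a routed path: the speed of its slowest segment."""
    if not segment_speeds_bps:
        raise ValueError("empty path")
    return min(segment_speeds_bps)


# Illustrative path: Fast Ethernet -> T1 -> Ethernet
path = [100_000_000, 1_544_000, 10_000_000]
print(path_speed(path))  # 1544000 (the T1), whereas max() would report 100000000
```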

The Java interface was also slower than most. Occasional internal Jasmine errors appeared when we retrieved log files and added devices to the database after discovery. We were able to continue, however, without any apparent ill effect.

Unicenter Performance Management 3.0, Unicenter Advanced Network Operations 3.0, Computer Associates International, (800) 225-5224, (631) 342-6000. www.ca.com

HP OpenView Network Node Manager

HP had the best discovery of all the products we tested. That said, you've heard the saying, "Close only counts in horseshoes and hand grenades"? Well, add network management to that list. HP would likely have tied for the top spot in this review had its event correlations worked during our testing. Too bad, because otherwise, even though HP charged the full $10,000, Network Node Manager kicks ass!

NNM promised to have the best fault management in this test, but we ran out of time before we could get it to deliver. At this writing, and despite much back and forth with technical support, some of the event correlation included with NNM was not working.

In addition to a filtered set of event consoles that sort events into categories, such as error, threshold, status and application, NNM provides extensive event parsing and editing, and correlation of event streams. NNM's event parsing is very similar to CA's, offering variable replacement, event text editing for clarity, and extensive Boolean logic to filter, delete and correlate. Also similar to CA, this feature takes time--and often money, in the form of professional services and training--to get up and running.

NNM has addressed this by offering canned correlation services, aptly dubbed Event Correlation Services, that promise to filter, interrelate and suppress unwanted and duplicate alarms. The configuration is accessible through a Web-based ECS viewer.

The results of the correlation were less than overwhelming, however. "Repeated Event" correlation suppresses duplicate events, and adjusting the correlation time window improves how they're consolidated. But downstream event suppression didn't work. This feature uses topology relationships to suppress events for devices that cannot be reached because an intervening router has failed.
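Repeated-event suppression with an adjustable correlation window can be sketched as follows. This is a generic illustration, not HP's Event Correlation Services logic, and the event names are just examples.

```python
def dedupe(events, window_s):
    """Collapse repeats of the same (node, event) arriving within window_s seconds
    of the first occurrence. events: (time_s, node, event) tuples.
    Returns (first_time_s, node, event, count) tuples in time order."""
    result = []
    latest = {}  # (node, event) -> index of its most recent group in result
    for t, node, event in sorted(events):
        key = (node, event)
        idx = latest.get(key)
        if idx is not None and t - result[idx][0] <= window_s:
            first, n, e, count = result[idx]
            result[idx] = (first, n, e, count + 1)  # fold the repeat into the group
        else:
            latest[key] = len(result)               # start a new group
            result.append((t, node, event, 1))
    return result


events = [(0, "sw1", "linkDown"), (10, "sw1", "linkDown"), (20, "sw1", "linkDown"),
          (5, "rtr1", "coldStart"), (200, "sw1", "linkDown")]
for row in dedupe(events, window_s=60):
    print(row)
```

Widening the window from 60 to 300 seconds folds the late linkDown at 200 seconds into the first group as well.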

We yanked a cable, and sat back and waited. And waited. And waited. Talked with tech support. Relearned our network, even started reading the manual. But we could not get the downstream suppression to correlate. The best guess as of this writing is that the glitch was due to missing route information in the database. This information is created during the discovery of devices and is not a separate manual configuration step.
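Downstream suppression depends on topology, which is why missing route information would break it. The sketch below is generic. It assumes an undirected link list and a set of nodes that have stopped answering polls. Any down node that still has a reachable neighbor is treated as a root cause, and every down node behind it is suppressed. The node names are hypothetical.

```python
from collections import defaultdict, deque


def build_topology(links):
    """Undirected adjacency sets from (a, b) link pairs."""
    adj = defaultdict(set)
    for a, b in links:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def correlate(adj, poller, down):
    """Return (root_causes, suppressed) for the set of nodes failing polls."""
    # Breadth-first search from the management station through responding nodes
    reachable = {poller}
    queue = deque([poller])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in reachable and nbr not in down:
                reachable.add(nbr)
                queue.append(nbr)
    # A down node next to something reachable is where the failure starts
    root = {n for n in down if adj[n] & reachable}
    return root, set(down) - root


links = [("poller", "core"), ("core", "rtrA"), ("core", "rtrB"),
         ("rtrA", "sw1"), ("rtrA", "sw2")]
root, suppressed = correlate(build_topology(links), "poller", {"rtrA", "sw1", "sw2"})
print(root, suppressed)
```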

NNM enables real-time and historical data collections, and a MIB expression tool lets you calculate performance stats, such as utilization, from the collected data.
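A MIB expression for interface utilization typically divides the change in an octet counter by elapsed time and interface speed. The sketch below uses the standard MIB-II objects `ifInOctets`/`ifOutOctets` (32-bit counters) and `ifSpeed` (bits per second). It assumes at most one counter wrap between polls. On full-duplex links each direction is computed separately, and fast interfaces generally call for the 64-bit `ifHCInOctets`/`ifHCOutOctets` counters.

```python
COUNTER32_MODULUS = 2 ** 32  # ifInOctets/ifOutOctets are 32-bit Counter32 values


def utilization_pct(octets_before, octets_after, interval_s, if_speed_bps):
    """Percent utilization in one direction between two octet-counter polls,
    tolerating at most one counter wrap in between."""
    delta_octets = (octets_after - octets_before) % COUNTER32_MODULUS
    return delta_octets * 8 * 100 / (interval_s * if_speed_bps)


# 37.5 million octets in 300 seconds on a 10-Mbit/s interface
print(utilization_pct(0, 37_500_000, 300, 10_000_000))  # 10.0
```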

NNM has very nice controls for setting and maintaining thresholds. Canned threshold templates include utilization, availability and error. The thresholds support flexible targeting of devices by type, by capability and through regular-expression queries. NNM supports static thresholds, such as 0.02 percent error packets, as well as thresholds based on statistical standard deviations. The latter help catch interfaces that are having major shifts in usage but not experiencing sustained server problems.
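A standard-deviation threshold flags values far from an interface's normal behavior, even when they never cross a fixed line. The minimal sketch below assumes a population standard deviation over a window of past samples. The multiplier `k` is chosen for illustration, and NNM's own statistics may differ.

```python
from statistics import mean, pstdev


def deviation_alarm(history, current, k=2.0):
    """True if current lies more than k standard deviations from the history's mean."""
    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        return current != mu  # flat history: any change is a deviation
    return abs(current - mu) > k * sigma


history = [20, 22, 18, 21, 19, 20]   # percent utilization, mean 20
print(deviation_alarm(history, 35))  # True: a big jump a 50 percent static threshold would miss
print(deviation_alarm(history, 22))  # False: normal variation
```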

Rearming thresholds--the point at which the system resets and is ready to issue another threshold violation--can be set as static values, percentages, or threshold and/or statistical deviations. This fine-tuning helps dampen the reissue of threshold violations when, for example, interface utilization is hovering just above and below the threshold value.

NNM dotted its I's and crossed its T's with the mapping and autodiscovery process. It wasn't perfect (none are), but the inclusion of seed files kept it under control. And because NNM didn't try to discover every ARP entry it met, it was better than most. An additional feature that turns off autoplacement of newly discovered devices also helped us keep control over what we were managing.

The discovery process is short and painless, mapping all devices in a nice, organized structure. The rediscovery process is excellent as well; the frequency of polling for new nodes decreases as fewer devices are discovered in each polling cycle. Also, when we turned off the autolayout function, NNM placed newly found objects in a holding area. That way we could control the placement and management of newly discovered devices.
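One way to get the rediscovery behavior described above is a simple back-off on the polling interval. The policy below halves the interval when a cycle finds new nodes and doubles it when a cycle finds none, within fixed bounds. It is a hypothetical illustration, not NNM's documented algorithm.

```python
def next_interval(current_s, new_nodes, min_s=300, max_s=86_400):
    """Hypothetical rediscovery back-off: halve the interval when a cycle finds
    new nodes, double it when a cycle finds none, staying within [min_s, max_s]."""
    if new_nodes > 0:
        return max(min_s, current_s // 2)
    return min(max_s, current_s * 2)


interval = 900
for found in [12, 3, 0, 0, 0]:
    interval = next_interval(interval, found)
    print(f"found {found:2d} new nodes -> next poll in {interval} s")
```

As the network stops yielding new devices, the interval stretches from 300 seconds toward the one-day cap. Discovery traffic tapers off without being switched off entirely.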

The map hierarchy is represented in five levels--root, Internet, network, segment and node--but map navigation could have been better designed. Often NNM zeroed in on a particular section of the network or a blank portion of the map by default, so we became good friends with the pan and zoom options.

NNM shows its maturity in the number of devices and vendors it can discover and identify. It supports DMI, IP and IPX, and hundreds of enterprise MIBs, and it provides the ability to add more.

In OvLauncher, NNM's Web interface, we were like bulls in a china shop; we couldn't decide what to break next. Canned NNM reports include availability, exception, inventory and performance. We ran them all and then modified them to create new reports. Availability reports showed daily, month-to-date and general availability. Inventory reports broke down devices by type; for example, routers, switches, workstations, printers, segments and networks. Canned performance reports included Cisco Router TopTalkers and Top SNMP Interface Utilization, which are reported daily and month-to-date.

The utilization reports didn't display the 95th percentile but did show standard deviations from the interface's utilization average. This indicates how much the traffic varied in creating that average and therefore how consistent the traffic is.

Although it was snappy when running, the OvLauncher interface hung too often. Blowing off OvLauncher and starting over was the workaround. Another small annoyance is that OvLauncher spawns new windows for many of its operations, so the screen can get cluttered pretty quickly.

Partitioning the network required that we create subnet maps that held our movie application device. This cut-and-paste action created disconnected subnets and devices. Compared with CA's BPV, this functionality really needs work.

HP's documentation is very good, going beyond basic product usage to include computer-based operator training.

OpenView Network Node Manager, Hewlett-Packard Co. www.hp.com

Bruce Boardman is executive editor of Network Computing, testing and writing about network management and systems. He has 12 years' IT experience managing network and distributed computing for a financial services provider. Write to him at [email protected].


