

Monitoring the Vital Signs of Your Network
Mysterious, however, were charts and tables that combined actual statistical variables, such as bandwidth, with what Concord refers to as its Health Index. The subjective term "health" is objectified through the setting of two threshold values: Health Index and Trend Threshold. The Health Index has four categories--poor, fair, good and excellent--each a user-defined percentage of a measured network traffic variable. The variables measured by the Health Index are, in the case of Ethernet, utilization, collisions, errors and broadcast. The Trend Threshold sets the uppermost limit for each variable. Network Health assigns values to these based on the actual network performance and network trends. The sum of the values represents the Health Index. The higher the value, the poorer the health.
Once we managed to gulp this down, the summary report of the network's Hourly Health Index, showing the aggregate health indexes as a stacked bar chart, almost made sense. Our initial black-and-white printed version of the report displayed each variable in shades of gray, black and patterned lines. We published to the Web and took a look at the HTML version of the report, which became clear with color.
The tabular reports entitled "Situations to Watch" and "Health Index Change Leaders" did not need color for clarity. Situations to Watch listed the top 10 nodes in terms of actual and predicted daily usage averages for all monitored variables, highlighting the difference between the estimated and actual numbers. The Health Index Change Leaders also listed a top 10, but ranked by the amount of positive or negative change between each node's current and previous health indexes. These reported swings in each node's daily performance were informative, but we wanted to know how the network was doing in general. To this end, the Average Health Index--a stacked bar chart indicating the average health of the nodes being monitored--helped us track the general daily status of each.
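To make the Change Leaders idea concrete, here is a minimal Python sketch of how such a ranking could be built from two sets of health indexes. It is our own illustration, not Network Health's code: the function name `change_leaders`, the node names and the index values are all invented.

```python
def change_leaders(previous, current, top_n=10):
    """Rank nodes by the size of their health-index change, largest first.

    `previous` and `current` map node names to health-index values.
    Nodes missing from either period are skipped. (Hypothetical helper.)
    """
    changes = [
        (node, current[node] - previous[node])
        for node in current
        if node in previous
    ]
    # Sort on the magnitude of the change so that large improvements and
    # large deteriorations both rise to the top of the list.
    changes.sort(key=lambda item: abs(item[1]), reverse=True)
    return changes[:top_n]


# Invented sample data: a higher index means poorer health.
yesterday = {"router-a": 5, "switch-b": 10, "hub-c": 3}
today = {"router-a": 6, "switch-b": 2, "hub-c": 3, "hub-d": 9}

leaders = change_leaders(yesterday, today)
```

Here `switch-b` leads the list with a change of -8, an improvement, since a lower index means better health. `hub-d` is left out because it has no previous value to compare against.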
To ascertain the normalcy of a node's behavior without auditing previous reports, we turned to the Exceptions Detail report. This report summarized a Health Index value and a Trend value for every exception variable, which for Ethernet were utilization, discards, errors and nonunicasts.
How these values are assigned is key to the Exception report's ability to intelligently flag nodes at risk of causing network problems. Health Index points are awarded on a logarithmic scale that assigns more points when errors or volume begin to rise, signaling problems early. Trend points are assigned in two ways, either by proximity to a set threshold or by predicted amount of time until a variable meets a threshold. In both instances, points are assigned on a steep curve as the volume or time threshold approaches.
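The shape of these curves is easier to see in code. The Python sketch below is only our approximation of the behavior described above. Concord's actual formulas are not given here, so the function names, the 10-point maximum, the 30-day horizon and the steepness exponent are all assumptions.

```python
import math


def health_points(value, threshold, max_points=10.0):
    """Logarithmic scoring: points climb fastest while the value is still low."""
    if value <= 0:
        return 0.0
    ratio = min(value / threshold, 1.0)
    # log1p(9 * ratio) / ln(10) maps 0..1 onto 0..1 with a steep early rise.
    return max_points * math.log1p(9.0 * ratio) / math.log(10.0)


def trend_points_by_proximity(value, threshold, max_points=10.0, steepness=4.0):
    """Trend scoring by closeness to a threshold: few points until nearly there."""
    ratio = min(max(value / threshold, 0.0), 1.0)
    return max_points * ratio ** steepness


def trend_points_by_time(days_to_threshold, horizon_days=30.0,
                         max_points=10.0, steepness=4.0):
    """Trend scoring by predicted days until the threshold is reached."""
    if days_to_threshold >= horizon_days:
        return 0.0
    closeness = 1.0 - max(days_to_threshold, 0.0) / horizon_days
    return max_points * closeness ** steepness
```

With these assumed curves, a variable at one-tenth of its threshold already earns more than a quarter of the available Health Index points, while a variable halfway to its Trend Threshold earns well under a tenth of the available Trend points. That contrast between early warning and a steep final approach is the behavior the report relies on.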
Network Health calculates exceptions from predefined, customizable thresholds. We were able to change the number of nodes, the number of exceptions and the values for the Health Index and Threshold. Our local Ethernet backbone--which typically runs without any problems at or above 40 percent utilization--didn't show up on the exception report as it would have if we'd left the default 35 percent threshold. Set thresholds for exceptions that guarantee that those reported are worthy of attention; otherwise, you run the risk of missing actual errors in the report because they are masked by a large volume of false errors.
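The effect of tuning a threshold can be shown with a small Python sketch. This is our own illustration rather than anything exported from Network Health: the `exceptions` function, the node names, the utilization figures and the raised 50 percent threshold are hypothetical. Only the 35 percent default comes from the product.

```python
def exceptions(utilization_by_node, threshold_pct):
    """Return the nodes whose utilization exceeds the threshold, busiest first."""
    flagged = {
        node: pct
        for node, pct in utilization_by_node.items()
        if pct > threshold_pct
    }
    return sorted(flagged, key=flagged.get, reverse=True)


# Invented daily utilization averages, in percent.
samples = {"backbone": 42.0, "lab-segment": 12.0, "wan-link": 61.0}

with_default = exceptions(samples, 35.0)  # default threshold flags the backbone
with_tuned = exceptions(samples, 50.0)    # raised threshold (our assumption)
```

With the default threshold, the healthy backbone appears alongside the overloaded WAN link. With the raised threshold, only the WAN link remains, which is the kind of trimmed-down report that keeps real problems from being buried.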
There are two kinds of Exception reports--Summary and Detail. The Summary lists the top nodes and their violations, while the Detail reports the specifics of each node's violation, including a time-of-day range and a daily thumbnail bandwidth utilization chart. This pairing of tabular and graphical elements helped us make quick judgments about the exceptions reported. For example, at one point our entire frame relay access circuit was down, as well as all of its permanent virtual circuits (PVCs). This was a true violation that we could easily detect on the graphs, which showed a period of zero utilization that coincided with the outage across all PVCs.
Report Options
Network Health has frozen the format of its canned reports but does allow alterations to the calculations, as well as report scheduling and output. We scheduled reports for specific days of the week and specific times of the day; when completed, these automatically were output to hard copy, file or the Web. The output options--text, CSV, PostScript, encapsulated PostScript and HTML--were the most complete of all the products we tested. Network Health does not permit ad hoc reporting, but given the strength and flexibility of the canned reports, we didn't miss it. Understanding and customizing the canned reports required a clear understanding of the Health Index, which we got from the carefully written manuals.