Most network-management providers won't offer aggressive service or huge cost savings to small and midsize businesses. But we pushed and prodded, and got PerformanceIT to give our fictional food chain

July 22, 2003

Costing It Out

The annual cost of nine networking and six helpdesk staffers--$1.08 million including benefits--could pay for outsourcing of network and systems management (NSM) functions and still leave some money to be shifted to development efforts. In addition, TacDoh's budget was clogged with the usual high-cost management tools, including Hewlett-Packard's HP OpenView, Concord Communications' eHealth, and CiscoWorks and N-Form for the Adtran frame relay access devices. We earmarked MRCs (monthly recurring charges) for this network-management hardware and software as funds that could be repurposed. We stated no requirements regarding removing any or all existing personnel and management systems, but we made clear this was a cost-offset attempt (for more on how we figured TacDoh's current costs, see "Cost Comparison: Chewing the Fat").

We found that our fried-pie-in-the-sky economizing calculations were naive. Some vendors were more aggressive than others, but our savings were less than we'd hoped, to say the least (see the cost comparison chart on page 58).

The Network

The TacDoh empire makes and sells deep-fried snack foods through 300 retail outlets in the Northeast, Southeast and Midwest that are connected via frame relay to three warehouse distribution centers. Corporate TacDoh is located at the Chicago warehouse site. The warehouses are meshed over the Internet using Route Science 3100 Path Control devices (for the particulars of TacDoh's situation, see "Scenario" on page 56).

The network infrastructure to be managed contains a mix of device types and would require each vendor to provide fault, performance and configuration services. We weren't sure what sort of response we'd get regarding the configuration, but we were pleasantly surprised: Sometimes it was limited, or tiered in terms of what was available, but basic configuration management was ubiquitous.

TacDoh's mix of communications gear may not be ideal, but that's a common scenario at many companies, and most are stuck with their mismatched equipment for some time. Certainly, TacDoh will not replace hardware while considering outsourcing network management.

We broke costs out into monthly recurring charges and one-time groupings. There were differences in what providers consider standard offerings, but we tried to normalize the menu, using specifics stipulated in the RFI as our baseline. So, for example, performance baselines and capacity planning were included by iNOC and PerformanceIT, but quoted as separate MRCs by HCL, while NetProactive required a separate bid to provide these services. Because TacDoh's in-house networking department provides, and our RFI asked for, capacity planning and historical baselines, we included the separate cost in HCL's total MRC (see the full RFI and responses).

We also cared about guarantee of services. Every branch was to have 99.9 percent uptime; every warehouse, 99.99 percent uptime. Transactions from the branches were to be completed within 5 seconds, with no more than 2 seconds using network time. We knew that the care and feeding of the PoS (point of sale) systems, back-end application and database servers might be beyond the scope of the MSPs, but in the real world the network group is the first line of defense--and the arbitrators between IT factions--so we had to ask. We were surprised again: All but NetProactive stepped up and took on the SLAs (service-level agreements) without hesitation.
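To put the two uptime targets in perspective, the percentages can be converted into a downtime budget. The sketch below assumes availability is measured around the clock over a 365-day year; the article does not say which measurement window the RFI used, so the period is a parameter, and the function name is purely illustrative.

```python
# Sketch: convert an availability target into an allowed-downtime budget.
# Assumption (not from the RFI): availability is measured 24/7 over a
# 365-day year. Pass a different period_hours for another window.

HOURS_PER_YEAR = 365 * 24  # 8,760 hours; leap years ignored


def allowed_downtime_hours(availability_pct: float,
                           period_hours: float = HOURS_PER_YEAR) -> float:
    """Return the hours of outage permitted in the period."""
    return period_hours * (1 - availability_pct / 100)


if __name__ == "__main__":
    for site, target in [("branch", 99.9), ("warehouse", 99.99)]:
        hours = allowed_downtime_hours(target)
        print(f"{site}: {target}% uptime allows {hours:.2f} hours "
              f"({hours * 60:.0f} minutes) of downtime per year")
```

Under those assumptions a branch could be down for about 8.76 hours a year, while a warehouse gets a little under 53 minutes, which is why the warehouse target is so much harder to guarantee.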

The RFI also spelled out some unusual problems--NTF (No Trouble Found), act of God and erroneous WAN provider disconnect--to get a sense of how each vendor handled situations where the root cause was outside the official responsibility of the network-management group but nonetheless a common and time-consuming chore. During the process of checking out a case of WAN trouble, service providers frequently clear the problem, thus masking, not resolving, the issue. It may appear to the service provider as though nothing was wrong even though the problem is likely to recur, requiring tenacious interincident tracking. Acts of God--say, power outages due to lightning striking power transformers--can cause dispatches, which, while technically not the fault of the network-management group, are clearly its mess to clean up.

Finally, WAN circuits are occasionally disconnected by mistake. It's up to the NSM group to jump up and down to get the service restored. We considered it vital to pin down exactly how the MSPs would react in these cases--especially considering that TacDoh might not have any network-savvy people left on staff once the outsourcing went into effect.

Our final two evaluation categories, operations and reporting, are much more important in real life than in the land of TacDoh. Operations is our catch-all term for a wide range of service differentiators, from strict change control to the cafeteria and recreation available to the NOC staff. The common denominator: Acing these criteria indicates stable, reliable and mature operations, lending credence to the idea of trusting them with the deep-fry kingdom.

Reporting, of course, is the conduit by which network and application health and activity are communicated. Each MSP cited complete and comparable reporting engines, with what appeared to be some differences. We say appeared to be because we didn't test them. Nor did we visit the network operations center, taste the lunchroom special or play any late-night ping-pong. So while we recognize the importance of these categories, we minimized their overall weight.

We broke reporting scores into two basic areas: The breadth and depth of the reporting, including how flexible it was, and report publication. As mentioned above, all the vendors promised link, availability and even application statistics on a store-by-store basis. (Read more on how we graded.)

Who Won and Why

Our Editor's Choice is PerformanceIT, which provided excellent cost, service management and reporting. Its RFI response addressed all of TacDoh's needs. A close second, iNOC addressed TacDoh's requirements, but was a bit less aggressive in terms of cost savings and didn't give quite the same level of service assurance as PerformanceIT. HCL did well overall, providing granular detail about its operations and reporting, but it didn't compete as aggressively on the cost side as did iNOC and PerformanceIT. Also, HCL ignored TacDoh's special requests (see "Specific Requirements"). NetProactive Services offered what appeared to be above-average reporting and good service-level and operation assurances, but at a price that didn't compete.

The pricing we received was based on TacDoh's specific requirements, and none of the vendors was aware of rivals' responses. In real life we could have pitted them against one another and perhaps gotten them to shave off a few bucks.

Atlanta-based PerformanceIT won us over with its no-nonsense RFI response, which addressed our requests, showed due diligence and offered the most aggressive cost offset, with a reduction of $163,000 per month from IT personnel budget. Initially, PerformanceIT's response didn't offer any reduction, and we had to prod the company to make some recommendations. It was clearly a touchy subject because the company wished to avoid the negative side of outsourcing--layoffs. But in the end, the MSP not only offered a near-term reduction of at least 20 FTEs (full-time employees), it also projected that an additional 20 workers could be cut within the first year. For our comparison we took them up only on the initial 20, wanting to remain conservative.

PerformanceIT also led the field in the one-time cost category by offering predictable and comparable costs and a no-fault contract termination: With 30 days' notice, and without any penalty, TacDoh could dump PerformanceIT. The company stated that it felt so sure TacDoh would be happy with its services that no penalties, buyouts or commitments were necessary, and it stated that this is standard for all customers.

PerformanceIT's service-level management stance was equally aggressive and unique. From the get-go, it made this simple statement: If the service fails, TacDoh gets a refund. The provider then went on to explain how each service would be monitored, and what constituted the service's success and failure. Provisos were reasonable, boiling down to factors beyond PerformanceIT's control, such as earthquakes. However, even in such a case, the RFI stated that PerformanceIT would respond. Like most of the vendors, it offered a simple weighted formula that became more severe as the outage lengthened. The SLA met the response times requested and offered warning and critical fault levels.

PerformanceIT was the only vendor to respond to all of our special incident situations, such as NTF. In each case, it took ownership, vowing to resolve the problem first and figure out the cause later. The recurring motif was that PerformanceIT would stay engaged until TacDoh was satisfied--no ifs, ands or buts.

PerformanceIT offers 24/7/365 monitoring via its SOC (Support Operations Center) team for responding to problems and performing proactive maintenance. But before we engaged its services we'd want to see an outline with a level of detail similar to that provided by HCL.

Although PerformanceIT didn't provide a targeted cost for WAN provisioning and audit, indicating that the service was outside its normal scope, it did provide the reports that would let TacDoh audit WAN usage. The rate of $150 per hour and $1,000 per day, on site, is its general quote for special work.

Operationally, full SNMP monitoring, including RMON where available, and performance and fault management were part of the service. The metrics collected include MIB II link utilization and error buckets. TCP and UDP port monitoring, along with syslogs and occasional data capture, were included.

PerformanceIT initially did not support TacDoh's Adtran N-Form management application, and we were impressed that the MSP contacted Adtran to determine the effort required. It found that the Adtran application doesn't support standard SNMP, but said that Adtran plans to add this capability by year's end, at which time PerformanceIT promised to support that application at no additional cost to TacDoh.

PerformanceIT was the only vendor to place appliances at each warehouse location. Each appliance polled the network and mirrored our database, offering redundancy and data aggregation while maintaining the distributed computing model.

When it comes to reporting, PerformanceIT hit all the right notes, offering the right reports and the right publication model. For example, top users, errors and links by store, region and just about any other sort were available. Like the other vendors, PerformanceIT offered Web browser access with the capability to define as many as six roles. This would let us provide store-level access, though we would have liked to delegate authority at least to the regions; delegation would involve creating a regional partition, within which that region's administrator could create store roles, or even further partitions.

PerformanceIT Network Management Service, PerformanceIT, (888) 242-9365, (678) 323-1300. www.performanceit.com

The best MRC rates, low one-time costs and very good service-level management helped iNOC finish a very close second. It was hard to find fault with the Northbrook, Ill., MSP's proposal, with the exception of having to sign a three-year contract. This contract length may not sound excessive, but the other vendors offered one-year or month-to-month arrangements.

The iNOC response was short and to the point, yet quite detailed. The company didn't waste our time with fluff. We quickly saw that pricing was significantly better than that of the other participants--iNOC didn't mince words about the number of FTEs it would offset, offering its own cost analysis. We used the number of FTEs iNOC suggested, which was less aggressive than PerformanceIT's, but we applied our own salary and benefit numbers to maintain our comparison across vendors.

iNOC supports the desired availability and transaction times in the stores with a couple of reasonable caveats: First, service for WAN circuit failures is dependent on the availability of the WAN provider. Second, the throughput goal of less than 2 seconds network time needs to be baselined prior to iNOC accepting responsibility. For this baseline, a one-time fee of $12,000 is charged. The one-time service initiation fee to convert to iNOC is $45,800, plus $12,000, for a total initiation cost of $57,800.

TacDoh's PoS transactions were specified as a supported part of iNOC's coverage. In addition, it specified that it will manage the Ethernet switches, wireless access points and WAN CPE (customer premises equipment) and circuits in each of the stores. We expected this coverage from everyone, but NetProactive said it wouldn't offer this service, and HCL didn't address this in its response.

For our special gray-area trouble situations, iNOC impressed us with no on-site charges for NTF. Its reasoning, which we liked, was that if it diagnoses problems correctly these types of ghost resolutions will be kept to a minimum. If an on-site error turns out to be caused by a WAN service provider, iNOC reduces its normal $150-per-hour fee to $125.

iNOC recommended installing terminal servers at each store to provide analog modem access and out-of-band management. It was the only service provider making this good suggestion and estimated the cost at between $8,000 and $15,000 per warehouse, for a total cost between $24,000 and $45,000 to purchase the equipment. In addition, analog lines would run between $17 and $23 per site per month.
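The iNOC figures quoted above lend themselves to a quick arithmetic check. The sketch below uses only dollar amounts from the text; the number of sites that would need an analog line is not stated, so it is left as a parameter, and the constant and function names are illustrative.

```python
# Sketch: tally iNOC's quoted one-time and out-of-band costs.
# Dollar figures come from the article; the site count for analog
# lines is an assumption the caller must supply.

WAREHOUSES = 3
INITIATION_FEE = 45_800                           # one-time conversion fee
BASELINE_FEE = 12_000                             # one-time throughput baseline
TERMINAL_SERVER_PER_WAREHOUSE = (8_000, 15_000)   # low and high estimates
ANALOG_LINE_PER_SITE_MONTHLY = (17, 23)           # low and high estimates


def scale(cost_range, count):
    """Multiply a (low, high) cost range by a unit count."""
    low, high = cost_range
    return (low * count, high * count)


one_time_fee = INITIATION_FEE + BASELINE_FEE
equipment_range = scale(TERMINAL_SERVER_PER_WAREHOUSE, WAREHOUSES)


def analog_line_monthly(site_count):
    """Monthly analog line cost range for a given number of sites."""
    return scale(ANALOG_LINE_PER_SITE_MONTHLY, site_count)


if __name__ == "__main__":
    print("Initiation total:", one_time_fee)
    print("Terminal server equipment:", equipment_range)
    print("Analog lines, 300 sites:", analog_line_monthly(300))
```

If every one of the 300 stores needed its own line, the analog circuits alone would add between $5,100 and $6,900 per month, so the site count is worth pinning down before comparing MRCs.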

Metrics monitored were very complete, including utilization and errors on links, and CPU on servers and routers. Five-minute samples rolled up weekly, monthly and quarterly would be available as part of iNOC's default offering, providing good performance reporting; PerformanceIT required that TacDoh have a database for the trending that iNOC provides. Trouble ticketing was, like the other vendors', online and real-time, with monthly summary reports.

IMonitor, iNOC, (877) 510-4662, (847) 714-9909, (608) 663-4555.

HCL's response, at 100-plus pages, laid out the strongest operational and reporting response of all the vendors, but pricing was higher than iNOC's and PerformanceIT's, and HCL didn't specifically address some of TacDoh's service-level management concerns.

HCL, which has 25 offices in 14 countries and is based in India, was the only vendor that planned to continue to use the management applications TacDoh's in-house staff supports. This raised the overall TacDoh IT budget by missing a reduction the rest of the vendors took. It did, however, make for redundant and distributed data collection as HCL uses OpenView in its NOC, letting it leverage the existing hardware and software architecture.

The RFI response contained a requirements analysis and solution overview, but we never got the feeling that HCL was particularly focused on TacDoh's needs. The requirements were nothing more than a reorganization of the RFI we submitted, and the solutions suggested missed the mark. The responses were very detailed, explaining what tools were to be used, the roles and behaviors governing the relationship and what metrics were to be gathered, but HCL didn't directly answer some requirements.

For example, we specified that branches were to experience 99.9 percent uptime and no more than 2 seconds in network latency. But nothing in the 100-page response spoke directly to these thresholds. There was detailed SLA governance documented, with ongoing feedback built in to guide and align service delivery and expectations; all good, but not on point. Further, we would have expected some pushback, like we got from NetProactive, about being held responsible for the WAN circuit providers, but HCL didn't address this.

This isn't to say that HCL appeared lax or unable to perform; we just didn't see our specific concerns addressed and didn't get that warm and fuzzy feeling that TacDoh would receive VIP treatment.

HCL did break its regular service into very organized responsibility areas. The engineering services, which include capacity planning, add an additional $12,000 per month to the bill. It's possible that if a network engineer is left on the TacDoh IT staff this charge could be partially mitigated, but not totally eliminated because other engineering services likely will be required.

In general, HCL's SLA management was organized, but not very detailed. It appears from the RFI response that HCL has procedures in place to track the SLAs that TacDoh cares about, but the answer is short on specifics. We asked how the charges and service levels would deal with acts of God, for instance, or the dreaded NTF, but there was no response.

We got lots of talk about how well-organized HCL's NOC was, less about the service it was to provide. For example, the NOC supports two Internet connections, with the second having a direct satellite uplink to Thailand ISP ThaiCom, to avoid any catastrophes.

Intelligent Network Operations Services, HCL Technologies America, (908) 822-9036.

Simply put, NetProactive's response was the most expensive by a long margin at a little over $200,000 monthly (compared with the next highest, PerformanceIT's $128,000) and with a much lower personnel offset cost of only $47,000. Additionally, NetProactive didn't address as many of TacDoh's needs as the other MSPs. In its favor, reporting looked to be above average, providing good access and a decent selection of performance metrics.

NetProactive's response was brief and to the point. Unfortunately, this brevity was not limited to the RFI: The stated services provided were also brief. For example, NetProactive was unwilling to stand in for TacDoh with other service providers, putting TacDoh right back in the business of negotiating and managing WAN services.

NetProactive, which is based in Bangalore, India, will audit and configure as needed QoS on the network infrastructure to improve the chances that transactions will be serviced in a timely fashion. Alerts will also be set to allow for the notification of any network transit time that exceeds 2 seconds.

No monitoring of the PoS system was included in the price. No billing credit, just reporting on total device availability. Intervening with network devices and the availability of the store network systems are supported, but NetProactive voiced concern that since the back-end database systems were out of its control, it couldn't guarantee PoS availability and throughput.

Although NetProactive will provide reports to help manage our WAN providers, we were disappointed when it specifically indicated that TacDoh is on its own if WAN problems arise.

This company was short on SLA management details, like HCL was, but it did say it would have people on site at each warehouse ready to be dispatched to the stores, for an hourly fee. This helps explain the higher cost. Dispatch of these on-site personnel is immediate during business hours but next day during off hours. This does not compare well with the other outsourcers, which provide 2- and 4-hour response times. However, the price, $70 per hour, is less than everyone else's.

NetProactive recommended a takeover timeline of 33 days of work over about eight weeks. This includes a 30-day pilot and a week of tuning and improvement prior to going live. All in all, the timing and effort seemed reasonable, and the $13,200 price was one of the least expensive.

NetProactive said it would support the Adtran N-Form software, creating reports published on the portal. There is an additional charge of $1,100 per month to gather and report on this data. This cost is included in the MRCs listed in the pricing chart. The lack of SNMP support was not mentioned by NetProactive, leaving us to wonder how it plans to create the reports.

Remote Infrastructure Management, NetProactive Services, (877) GOIMARC, (949) 623-8312.

Bruce Boardman is executive editor of Network Computing, testing and writing about network management and systems. He has 12 years' IT experience managing networks and distributed computing for a financial service provider. Write to him at [email protected].

TacDoh Looks To Slim Down Its NSM

TacDoh Corp. is a 5-year-old food-services company that provides deep-fried snacks 'round the clock via an expanding network of 300 retail walk-in and drive-up outlets supported by three warehouse distribution centers in Atlanta, Chicago and Newark, N.J. TacDoh's business model places a premium on returning customers. To this end, the company strives to provide convenient access to a consistent product via initiatives such as:

• Internet ordering: An e-commerce site provides personalized "MyTacDoh" capabilities. Registered customers can define "The Usual" with a single mouse click; the product is ready at a specified time and billed directly to the user's credit card.

• SitAnywhereTacDoh: To attract professional customers, TacDoh offers registered users 802.11 wireless hotspot access at each outlet. Customers can surf the Web or download e-mail, with convenient direct billing available at all outlets. This has had a positive bottom-line impact: Half of each location's registered customers use the wireless facilities daily and spend, on average, 15 percent more per transaction, boosting TacDoh's annual revenue to $3.5 million and adding nearly $4 million to its gross sales last year.

• Share the Fat: Believing happy employees make for happy customers, TacDoh shares profits with its associates through this unique program. STF recognizes outstanding performance and tracks bonus pay based on sales and service achievements. Considered an industry differentiator, this in-house-developed application is as important to the company's ongoing success as its point-of-sale, inventory tracking and general accounting programs.

However, TacDoh's IT group has hard choices to make: The STF application development effort requires increased programming work, but the IT budget is frozen at last year's levels. Even though the infrastructure is of prime importance in supporting the company's distributed retail chain, customer outreach program and employee incentives, network management is considered an expense. TacDoh is exploring ways to manage this cost without interrupting store operations or customer-base enlargement.

Each of TacDoh's 300 retail outlets is linked to one of the company's regional distribution centers, which act as warehouse and communications hubs. The Chicago center is the original location, housing the corporate offices and data center. Chicago supports 150 retail stores, while Newark and Atlanta have 100 and 50 locations, respectively (see network diagram, page 55).

In our RFI we defined the following goals and asked vendors to indicate how these goals will be supported, along with violation thresholds and audit and reimbursement procedures (the RFI and complete vendor responses are available here).

• Reduce and stabilize network management costs by outsourcing configuration, monitoring, reporting, planning and maintenance of all network infrastructure at each warehouse, store and the corporate offices.

• Keep existing helpdesk and operations support, but at reduced levels to support PoS, accounting and STF applications. Staff reductions based on removal of network and branch systems are planned.

• Maintain 99.9 percent availability service levels at the stores; availability at regional warehouse facilities must be 99.99 percent.

• Keep all transactions to less than 5 seconds, with network transit time under 2 seconds.

• Maintain 24/7 network availability to support online transactions and customers' wireless transactions to the regional warehouses.

• Accomplish autodial backup within 5 minutes of dedicated circuit outage. Dial backup also must be dropped within 5 minutes of the dedicated circuit restore.

• Provide reports by store, warehouse and corporate user.

• Provide sales-tracking integration. Response times are fed to an in-house application, which tracks PoS activity and "SureWeKnowU" access. This is currently an XML data feed with server CPU and memory usage, and network latency and utilization.

• Report SLA threshold violations and near violations. Ideally, this will provide for a configurable list, management of which can be delegated and distributed.

• Allow for circuit billing. WAN payments will be done by TacDoh corporate accounting, but provisioning and WAN billing audits are to be outsourced. Network CIR (committed information rate), burst and utilization must be correlated to billing reports from WAN providers. These reports should be audited for accuracy and flagged where billing reimbursement is required. (New stores come online regularly, usually with three months' notice.)

• Provide on-site support. Network staffers now take general responsibility for on-site network troubleshooting and repair. TacDoh wants responses to indicate the cost of such visits if they were outsourced.
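The transaction-time and threshold-reporting goals in the list above can be sketched as a simple classifier. The 5-second and 2-second limits come from the RFI goals; the definition of a "near violation" (here, anything at or above 80 percent of a limit) is an assumption for illustration only, as are the `Transaction` record and function names.

```python
# Sketch: flag SLA violations and near violations for store transactions.
# Limits are from the RFI goals; NEAR_FRACTION is a hypothetical setting,
# since the article does not define what counts as a near violation.
from dataclasses import dataclass

TOTAL_LIMIT_S = 5.0     # whole transaction must finish in under 5 seconds
NETWORK_LIMIT_S = 2.0   # network transit must stay under 2 seconds
NEAR_FRACTION = 0.8     # hypothetical: warn at 80 percent of a limit


@dataclass
class Transaction:
    store: str
    total_s: float
    network_s: float


def classify(value: float, limit: float,
             near_fraction: float = NEAR_FRACTION) -> str:
    """Return 'violation', 'near' or 'ok' for one measurement."""
    if value >= limit:
        return "violation"
    if value >= limit * near_fraction:
        return "near"
    return "ok"


def check(tx: Transaction) -> dict:
    """Classify both the end-to-end and network portions of a transaction."""
    return {
        "total": classify(tx.total_s, TOTAL_LIMIT_S),
        "network": classify(tx.network_s, NETWORK_LIMIT_S),
    }


if __name__ == "__main__":
    sample = Transaction(store="store-001", total_s=3.0, network_s=1.7)
    print(sample.store, check(sample))
```

Keeping the two measurements separate matters because the network group can be within its 2-second budget while a slow back-end server still pushes the whole transaction past 5 seconds.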

Currently, network management is supported through Adtran N-Form, CiscoWorks 2000, HP OpenView Network Node Manager and Concord Communications' eHealth. Primarily tools of the networking department, these applications are also used by helpdesk, operations and systems personnel. Savings realized by decommissioning these apps will help offset the outsourcing cost, but retraining must be considered. Vendors should indicate which network-management tools will be provided and whether retraining will be required.

Our Original RFI

PerformanceIT


HCL Technologies -- RFI Unavailable

NetProactive Services

The prime motivator for this RFI was to save IT dollars for reallocation into the application-development portion of TacDoh's IT department. To this end, we attempted to take into account all the personnel, hardware and software costs needed to make TacDoh's IT engine run. We used salary data based on BusinessWeek's Salary Wizard and considered network engineers, network operators and related management salaries as prime targets. We also considered other IT positions, such as desktop support and helpdesk staffers, when a vendor indicated it would manage the PoS equipment in each branch. Besides salaries, the model included a 20 percent bump for benefits.

The recurring license cost of TacDoh's network-management software was added at an annual rate of 18 percent, which, though not absolute, is standard. Purchase prices were based on recent Network Computing reviews. Hardware replacement and repair was added in at $65,000 annually. No depreciation costs were calculated. It was up to MSPs to determine how many and which jobs got replaced and repurposed. Although HCL was the only vendor to rely on TacDoh's HP OpenView installation, all the vendors incorporated some of the existing hardware and software.

Other costs crept into the evaluation and had to be compared. All the providers shared a conversion or setup fee, for example, and these varied greatly--from $13,000 to $90,000. Other costs included integration of the back-end sales tracking application Share the Fat, and the cost of sending someone on site to a store. But the major factor was the contract length and early termination cost. These varied wildly, from nothing to more than $600,000.
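The in-house cost assumptions described in this sidebar can be sketched as a small model. The 20 percent benefits bump, the 18 percent annual license rate and the $65,000 hardware figure come from the text; actual salaries and software purchase prices are not given, so they are inputs here, and the function names are illustrative. The last lines back out the average base salary implied by the article's $1.08 million figure for 15 staffers, assuming that figure uses the same 20 percent bump.

```python
# Sketch: annual in-house network-management cost under the article's
# stated assumptions. Salaries and purchase prices are caller-supplied
# because the article does not list them.

BENEFITS_RATE = 0.20      # 20 percent bump for benefits
LICENSE_RATE = 0.18       # annual recurring license cost as share of price
HARDWARE_ANNUAL = 65_000  # hardware replacement and repair per year


def loaded_staff_cost(base_salaries):
    """Total salary cost including the benefits bump."""
    return sum(base_salaries) * (1 + BENEFITS_RATE)


def annual_license_cost(purchase_prices):
    """Recurring license cost for the management software."""
    return sum(purchase_prices) * LICENSE_RATE


def annual_in_house_cost(base_salaries, purchase_prices):
    """Staff, licenses and hardware combined; no depreciation."""
    return (loaded_staff_cost(base_salaries)
            + annual_license_cost(purchase_prices)
            + HARDWARE_ANNUAL)


# Back out the average base salary implied by the article's staff figure,
# assuming $1.08 million already includes the 20 percent benefits bump.
STAFF_COUNT = 9 + 6       # nine networking plus six helpdesk staffers
LOADED_STAFF_TOTAL = 1_080_000
implied_average_base = LOADED_STAFF_TOTAL / (1 + BENEFITS_RATE) / STAFF_COUNT

if __name__ == "__main__":
    print(f"Implied average base salary: ${implied_average_base:,.0f}")
```

Under that assumption the staff figure works out to an average base salary of about $60,000, which is only an average; the article's model used different salaries for engineers, operators and managers.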

We were disappointed that the WAN circuit audits were an additional cost. On the one hand, it's understandable--this is additional work and usually falls to someone in finance or a collaboration between a network manager and finance. But it is critical, and at the nexus of WAN-management costs. Seems like an important and measurable ROI point for the outsourcers to make.

The complete RFI and vendor responses

More on how we devised TacDoh's costs

Detailed information on how we graded

Network & systems management white papers & research reports

Network & systems management books


IT Outsourcing Toolkit

"World View: Sourcing IT Globally"

"Federal Outsourcing Battle Heats Up"

"Outsourcing IT: Is There a Downside?"

"IT budget reduction" garnered a 30 percent weighting in our report card because the prime motivator for the RFI was to save money. Period.

Savings without service is worthless, so our secondary goal, weighted at 30 percent, was service-level management. Billing credits, SLA structure, transition management, and outage management and response played the biggest roles. We broke the service category into two parts: What was going to get monitored and what happened when a service didn't meet the agreed-upon level. For the former, we asked for specific availability percentages of 99.9 for stores and 99.99 for warehouses.

Although these service-level management activities are readily understood, the service-management best-practices category is more of a catch-all bucket meant to get at the kinds of project management and preparedness each vendor implicitly or explicitly put forth in its RFI responses and subsequent answers to our many questions.

The low weight of 20 percent may seem to indicate a lower importance, but it actually reflects the fact that we could judge only what the vendors claimed. Because we didn't test or engage the providers, we couldn't check out their personnel practices, data centers, trouble ticketing, response times and training, for instance; that would require due diligence.

Finally, we gave reporting only a 10 percent weight, for a couple of reasons, neither of which is that reporting doesn't matter. After all, reports are the window into the network's application and business delivery success or failure. But again, we didn't use these products, and we found that from the demonstrations and descriptions provided, the differences were not huge. Some appeared to be better than others, but it seemed that, overall, report distribution and metrics monitored were similar across all products.

More important, reporting is a Catch-22 for outsourcers. Although the reporting did offer very granular network-performance metrics, no one left on TacDoh's staff had the network chops to understand the meaning. This is somewhat of an exaggeration--it doesn't require network expertise to understand all the reports--but relating errors and CPU utilization on a router interface to slow response time isn't a simple correlation. Although the reporting is nice, TacDoh, and any company that goes this route, will need to rely on the outsourcer to make these kinds of correlations and suggest fixes.
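The weighting scheme described in this sidebar can be sketched as a weighted average. The four weights are the ones quoted in the text; they total 90 percent as written, so the sketch divides by the sum of the weights rather than assuming they add up to 100. The category keys, the 0-to-5 scale and the sample scores are hypothetical and do not represent any vendor's actual grades.

```python
# Sketch: combine category scores into one report-card grade.
# Weights are as quoted in the article; the category keys and the
# scoring scale are illustrative assumptions.

WEIGHTS = {
    "budget_reduction": 0.30,
    "service_level_management": 0.30,
    "best_practices": 0.20,
    "reporting": 0.10,
}


def weighted_score(scores: dict, weights: dict = WEIGHTS) -> float:
    """Weighted average of category scores, normalized by total weight."""
    missing = set(weights) - set(scores)
    if missing:
        raise ValueError(f"missing category scores: {sorted(missing)}")
    total_weight = sum(weights.values())
    return sum(scores[name] * w for name, w in weights.items()) / total_weight


if __name__ == "__main__":
    # Hypothetical scores on a 0-to-5 scale, for illustration only.
    example = {
        "budget_reduction": 5.0,
        "service_level_management": 4.0,
        "best_practices": 3.0,
        "reporting": 4.0,
    }
    print(f"Overall: {weighted_score(example):.2f}")
```

Normalizing by the total weight keeps the overall grade on the same scale as the category scores even if the weights are later adjusted.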


Network & Systems Management Services

