What Lies Beneath
Column
A Spare Is Just a Spare

  May 8, 2003
  By Peter Morrissey



A friend called me recently to commiserate about a network outage at his organization. A backbone switch had crashed, and the spare switch displayed a low-level system prompt usually seen only when no software is installed on the switch--apparently the system software had been corrupted. Before my buddy and his co-workers could get the network back up, they had to spend valuable time reloading and reconfiguring the software.

Although corporate management had done the right thing by approving the $20,000 investment in a spare switch, the IT team hadn't done its part to ensure the switch would be ready for duty should the need arise.


Yes, it takes time and money to prep redundant components, but that's the price you must pay if you're serious about minimizing downtime.

Time To Do Our Part

Network equipment vendors have answered the call to minimize downtime by building redundancy into their gear and making upgrades easier. Many also support redundant links to other redundant components through standards such as the IETF's VRRP (Virtual Router Redundancy Protocol, RFC 2338) and the IEEE's 802.3ad (link aggregation) and 802.1w (Rapid Spanning Tree). We found this to be the case when we evaluated backbone proposals from Alcatel, Enterasys, Extreme and Foundry for our March 21, 2003, cover story. All those vendors support redundant CPUs and power supplies inside their devices, and standards-based redundant links between their devices.

As impressed as we were with those vendors' products, though, we know no vendor can guarantee 100 percent uptime. Complex gear is tough to maintain and troubleshoot, so before you introduce a redundancy scheme, you must understand exactly how the product works.

But How?

First, you break it. Stage a mock outage in your lab before deployment (or, if that's not possible, on your production network during off hours) and observe how your backup equipment responds. Note how long the network stays down and which parts of the network are affected. One way to measure this is to ping continuously from one side of the network to the other and note when replies stop and resume. Also, unplug cards and power supplies to see whether your network management software receives the SNMP trap messages. If you have full spare chassis, you can monitor them as well.
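The continuous-ping measurement can be scripted so the outage window is recorded rather than eyeballed. What follows is a minimal sketch, not a finished tool. The function names are our own, and `ping_once` assumes a Linux-style `ping` that accepts `-c` (count) and `-W` (timeout in seconds); other platforms use different flags.

```python
import subprocess
import time
from typing import Iterable, List, Tuple

Sample = Tuple[float, bool]  # (timestamp in seconds, did the ping succeed?)


def ping_once(host: str, timeout_s: int = 1) -> bool:
    """Send one ICMP echo request. Assumes Linux-style ping flags."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def collect_samples(host: str, duration_s: float,
                    interval_s: float = 1.0) -> List[Sample]:
    """Ping the host repeatedly for duration_s seconds and record each result."""
    samples: List[Sample] = []
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        samples.append((time.monotonic(), ping_once(host)))
        time.sleep(interval_s)
    return samples


def find_outages(samples: Iterable[Sample]) -> List[Tuple[float, float]]:
    """Return one (first_failure, first_recovery) pair per outage."""
    outages: List[Tuple[float, float]] = []
    down_since = None
    last_ts = None
    for ts, ok in samples:
        if not ok and down_since is None:
            down_since = ts                    # outage begins
        elif ok and down_since is not None:
            outages.append((down_since, ts))   # outage ends
            down_since = None
        last_ts = ts
    if down_since is not None:
        # Still down when sampling stopped; close the window at the last sample.
        outages.append((down_since, last_ts))
    return outages
```

Start `collect_samples` against a host on the far side of the network, pull the card or power supply, and feed the result to `find_outages`. The measured downtime is only as precise as the ping interval, so shorten `interval_s` if your failover is supposed to be fast.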

Document your findings, reviewing step by step what worked and what didn't, and develop approaches to improve the process. Then, if time allows, run through the drill again. Once you're satisfied, put the product on your live network and file the configuration information somewhere safe and accessible. (The outage at my friend's organization would have lasted much longer had there been no backup of the original software configuration stored on the network--the switch's complex array of VLANs would have been torture to reconfigure from scratch.)
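A stored configuration helps only if the copy is still good when you reach for it. One way to check, sketched below, is to record a checksum when you file the backup and verify it during each drill. The function names and the `.sha256` sidecar-file convention are our own assumptions, not a feature of any switch vendor's software.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_checksum(path):
    """Write '<file>.sha256' next to the backup and return the digest."""
    digest = sha256_of(path)
    Path(str(path) + ".sha256").write_text(digest + "\n")
    return digest


def backup_is_intact(path):
    """Compare the file's current digest with the one recorded earlier."""
    recorded = Path(str(path) + ".sha256").read_text().strip()
    return sha256_of(path) == recorded
```

Call `record_checksum` when the configuration or software image is filed, and `backup_is_intact` before you rely on it. The same check applies to a software image you intend to load onto a spare.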

Stage yet another planned outage during off hours shortly after the equipment is up and running on your live network. This will give you the opportunity to test your restore procedures in a real-world setting. It'll also give you one last chance to be sure all the spare gear you need is readily available.

This is also a good time to verify that your security controls won't block a recovery. If you want to be able to load software or configurations quickly from a TFTP server, for instance, be sure you have access across firewalls and router ACLs (access-control lists). Also be sure there aren't any filters on the servers that may stand in your way.
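TFTP access can be tested ahead of time from the network segment where the spare will sit. The sketch below sends a TFTP read request (RFC 1350) over UDP port 69 and treats any DATA or ERROR reply as proof that the path is open. The function names are ours, and the host and file name you pass in are whatever applies at your site.

```python
import socket
import struct

TFTP_PORT = 69  # well-known TFTP port (UDP)


def build_rrq(filename, mode="octet"):
    """Build a TFTP read request: opcode 1, filename, NUL, mode, NUL."""
    return (struct.pack("!H", 1)
            + filename.encode("ascii") + b"\x00"
            + mode.encode("ascii") + b"\x00")


def classify_reply(packet):
    """Map the reply's opcode to 'data', 'error' or 'unknown'."""
    if len(packet) < 2:
        return "unknown"
    opcode = struct.unpack("!H", packet[:2])[0]
    return {3: "data", 5: "error"}.get(opcode, "unknown")


def tftp_reachable(host, filename, timeout_s=3.0):
    """Return True if the server answers the read request at all."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout_s)
        try:
            sock.sendto(build_rrq(filename), (host, TFTP_PORT))
            # The server replies from a new port, so accept any source.
            packet, _addr = sock.recvfrom(1024)
        except OSError:
            # Timeout or network error: treat the server as unreachable.
            return False
    # Even an ERROR reply (such as file not found) shows the path is open.
    return classify_reply(packet) in ("data", "error")
```

This probe does not complete the transfer, so the server will retransmit briefly and give up. A `False` result points to a firewall, ACL or server filter in the path, or to a server that is down; it does not tell you which.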

But the fun's not over yet. All too often, network problems stem from configuration changes made on the fly. Once you take the product live, you must monitor the network closely and chart every change you make, so you can retrace your steps if necessary. You don't want to be testing your memory when time is of the essence.
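Charting every change is easier if you save the configuration before and after each edit and let a script record the difference. Here is a minimal sketch using Python's standard `difflib` module; the function names and the record layout are our own invention.

```python
import difflib


def config_diff(before, after, label="switch"):
    """Return unified-diff lines between two configuration texts."""
    return list(difflib.unified_diff(
        before.splitlines(),
        after.splitlines(),
        fromfile=label + " (before)",
        tofile=label + " (after)",
        lineterm="",
    ))


def change_record(when, who, reason, before, after):
    """Build one change-log entry: a header line followed by the diff."""
    lines = ["{} {}: {}".format(when, who, reason)]
    lines.extend(config_diff(before, after))
    return "\n".join(lines)
```

Append each record to a log kept off the device. When something breaks, the most recent entries show exactly which lines changed and why.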


--Peter Morrissey, pmorrissey@nwc.com
