A Spare Is Just a Spare

Before you introduce a redundancy scheme, you must understand exactly how the product works.

May 8, 2003

Network Computing

Yes, it takes time and money to prep redundant components, but that's the price you must pay if you're serious about minimizing downtime.

Time To Do Our Part

Network equipment vendors have answered the call to minimize downtime by building redundancy into their gear and making upgrades easier. Many also make redundant links to other redundant components possible using standards such as the IETF's VRRP (Virtual Router Redundancy Protocol, RFC 2338) and the IEEE's 802.3ad (link aggregation) and 802.1w (Rapid Spanning Tree). We found this to be the case when we evaluated backbone proposals from Alcatel, Enterasys, Extreme and Foundry for our March 21, 2003, cover story. All those vendors support redundant CPUs and power supplies inside their devices, and standards-based redundant links between their devices.

As impressed as we were with those vendors' products, though, we know no vendor can guarantee 100 percent uptime. Complex gear is tough to maintain and troubleshoot, so before you introduce a redundancy scheme, you must understand exactly how the product works.

But How?

First, you break it. Stage a mock outage in your lab before deployment (or, if that's not possible, on your production network during off hours) and observe how your backup equipment responds. Note how long the network stays down, and which parts of the network are affected. You can accomplish this by pinging continuously from one side of the network to the other. Also, unplug cards and power supplies to see if your network management software gets the SNMP trap messages. If you have full spare chassis, you can monitor them as well.
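As a rough illustration of turning a continuous ping into a downtime figure, here is a minimal Python sketch. It assumes you have already recorded one (timestamp, answered) sample per ping during the drill; the function name `outage_windows` and the sample data are made up for this example and don't come from any vendor tool.

```python
from typing import Iterable, List, Tuple

# One sample per ping: (timestamp in seconds, True if a reply came back).
Sample = Tuple[float, bool]


def outage_windows(samples: Iterable[Sample]) -> List[Tuple[float, float]]:
    """Return (first_lost, first_restored) timestamp pairs for each run of lost pings.

    If the capture ends while pings are still failing, the window is closed
    at the last sample, so that figure is a lower bound on the real outage.
    """
    windows = []
    down_since = None  # timestamp of the first lost ping in the current run
    last_ts = None
    for ts, answered in samples:
        last_ts = ts
        if not answered and down_since is None:
            down_since = ts                      # outage begins
        elif answered and down_since is not None:
            windows.append((down_since, ts))     # outage ends
            down_since = None
    if down_since is not None:
        windows.append((down_since, last_ts))    # still down at end of capture
    return windows


# Hypothetical drill: a card is pulled at about t=2 and the backup takes over by t=5.
drill = [(0, True), (1, True), (2, False), (3, False), (4, False), (5, True), (6, True)]
for start, end in outage_windows(drill):
    print(f"lost connectivity from t={start}s to t={end}s ({end - start}s)")
```

The resolution is only as fine as the ping interval, so a one-second ping can't distinguish a 200-millisecond failover from an 800-millisecond one. Running the same capture from several vantage points shows which parts of the network were affected.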

Document your findings, reviewing step by step what worked and what didn't, and develop approaches to improve the process. Then, if time allows, run through the drill again, and be sure you're satisfied before putting the product on your live network and filing the configuration information somewhere safe and accessible. (The outage at my friend's organization would have lasted much longer had there been no backup of the original software configuration stored on the network--the switch's complex array of VLANs would have been torture to reconfigure from scratch.)

Stage yet another planned outage during off hours shortly after the equipment is up and running on your live network. This will give you the opportunity to test your restore procedures in a real-world setting. It'll also give you one last chance to be sure all the spare gear you need is readily available.

This is also a good time to verify that your security controls won't block recovery. If you want to be able to load software or configurations quickly from a TFTP server, for instance, be sure you have access across firewalls and router ACLs (access-control lists). Also be sure there aren't any filters on the servers that may stand in your way.
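One way to check the TFTP path ahead of time is to send a read request and see whether anything comes back. The sketch below is a minimal probe under stated assumptions: it builds a standard TFTP read request (opcode 1, per RFC 1350), sends it to UDP port 69 and reports the kind of reply. The function names, the host and the file name are hypothetical, and the probe never acknowledges or stores the data, so it tests reachability only.

```python
import socket
import struct

TFTP_PORT = 69
OP_RRQ, OP_DATA, OP_ERROR = 1, 3, 5  # opcodes defined in RFC 1350


def build_rrq(filename: str, mode: str = "octet") -> bytes:
    """Build a TFTP read request: opcode, filename, NUL, mode, NUL."""
    return (struct.pack("!H", OP_RRQ)
            + filename.encode("ascii") + b"\x00"
            + mode.encode("ascii") + b"\x00")


def describe_reply(packet: bytes) -> str:
    """Classify a reply packet by its two-byte opcode."""
    if len(packet) < 4:
        return "malformed"
    (opcode,) = struct.unpack("!H", packet[:2])
    if opcode == OP_DATA:
        return "data"    # server is reachable and the file is readable
    if opcode == OP_ERROR:
        return "error"   # server is reachable but refused (e.g. file not found)
    return "unexpected"


def tftp_probe(host: str, filename: str, timeout: float = 3.0) -> str:
    """Send one read request and report 'data', 'error', 'timeout', etc."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_rrq(filename), (host, TFTP_PORT))
        try:
            # The reply comes from a new server-chosen port, not port 69,
            # so the socket is deliberately left unconnected.
            packet, _addr = sock.recvfrom(1024)
        except socket.timeout:
            return "timeout"  # blocked, filtered, or no server listening
    return describe_reply(packet)
```

A call such as `tftp_probe("tftp.example.net", "switch.cfg")`, run from the same subnet as the device you'd be restoring, returns "timeout" when a firewall or ACL is in the way. Because the server answers from a different UDP port than the one the request was sent to, a rule that permits only port 69 can pass the request and still drop the reply.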

But the fun's not over yet. All too often, network problems stem from configuration changes made on the fly. Once you take the product live, you must monitor the network closely and chart every change you make, so you can retrace your steps if necessary. You don't want to be testing your memory when time is of the essence.
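One lightweight way to chart changes is to diff each new configuration snapshot against the last saved one. The following sketch uses Python's standard `difflib` module; the function name `config_changes` and the VLAN lines are invented for illustration and don't reflect any particular vendor's configuration syntax.

```python
import difflib
from typing import List


def config_changes(before: str, after: str) -> List[str]:
    """Return a unified diff of two configuration snapshots, one line per entry.

    An empty list means the snapshots are identical.
    """
    return list(difflib.unified_diff(
        before.splitlines(),
        after.splitlines(),
        fromfile="saved-config",
        tofile="running-config",
        lineterm="",
    ))


# Hypothetical snapshots taken before and after an on-the-fly change.
saved = "vlan 10 name users\nvlan 20 name voice\n"
running = "vlan 10 name users\nvlan 30 name guests\n"

for line in config_changes(saved, running):
    print(line)
```

Storing each diff with a date and the name of whoever made the change gives you a trail to follow backward when something breaks.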
