Alarm bells sound in earnest when a company's WAN shuts down. Whether that WAN connection provides a lifeline to corporate databases, carries internal phone calls or transmits corporate e-mail, it is just as vital to maintain data flow as it is to make sure employees can get into and out of your building. If your company generates sales over the WAN connection by an external sales force or through direct purchase over the Internet, every minute of downtime can equal thousands of dollars in lost revenue.
To avoid a problem, you first have to become aware of its potential existence, which is usually the hardest step. Although not all problems can be predicted, you can usually cover the most obvious. Next, you have to prioritize just how important the problems are and stipulate the maximum length of time a problem can persist before a solution needs to be put in place. This doesn't keep you from fixing problems, but it creates an ordered to-do list detailing what should be taken care of first and just how many "fire alarms" should be issued to get the system back in working order.
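The ordered to-do list described above can be sketched in a few lines of Python. This is only an illustration: the `Risk` class, its field names and the sample entries are hypothetical, and your own list will reflect your own priorities.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    name: str
    priority: int            # 1 = most critical (hypothetical scale)
    max_outage_minutes: int  # longest the problem may persist before a fix is required


def build_todo_list(risks):
    """Order risks by priority, breaking ties with the shortest tolerable outage."""
    return sorted(risks, key=lambda r: (r.priority, r.max_outage_minutes))


# Hypothetical entries, for illustration only.
risks = [
    Risk("Interface card failure", priority=2, max_outage_minutes=240),
    Risk("Loss of utility power", priority=1, max_outage_minutes=0),
    Risk("Router power supply failure", priority=1, max_outage_minutes=30),
]

for risk in build_todo_list(risks):
    print(f"P{risk.priority}  {risk.max_outage_minutes:>4} min  {risk.name}")
```

Sorting on both fields puts the most important problems first and, among equals, the ones with the least slack.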
Keep the Juice Flowing
The first point of failure is the electricity that powers all your communications equipment. Maintaining power to the equipment is top priority. The first piece of equipment that needs to maintain power is your router or switch. If your WAN connection is the backbone that keeps offices running or shipments leaving the dock and it loses electricity, it doesn't matter what provisions have been made elsewhere.
If your equipment is powered by 12-volt DC, you can get by with just a rectifier to change the incoming AC to DC. This doesn't provide any insurance against power failure, however; batteries are needed to provide an uninterrupted source. Providing large-scale DC power backup is usually handled by bringing the outside current in through the rectifier and into a bank of batteries, and then into your equipment room. One drawback to large 12-volt DC backups, though, is that they demand respect. The acid level needs to be topped off from time to time, and accidentally dropping something metallic across the contacts can be a very enlightening experience. Devices such as pacemakers can also be affected just by being in a room with the batteries.
A range of UPSes is also available for AC-powered devices, from small rack-mount devices to power a single router, to large floor-mount units to power the entire equipment room. Liebert Corp. is probably the most common vendor for large-scale systems, but American Power Conversion Corp., MGE UPS Systems and Tripp Lite also provide high-capacity backup systems. If battery-based backups don't interest you, Active Power makes a UPS that uses flywheels to generate electricity.
Backup power can be provided to your equipment individually by having a smaller UPS power one or two devices. While this helps in situations where floor space is scarce, it is more difficult to monitor and maintain tens of smaller units than just one large UPS.
A UPS can keep your router gear powered only for so long. Eventually the batteries drain, and you're left in the dark with no way to access your WAN. If providing power to your equipment for more than 30 minutes or so at a time is imperative, then a gas- or diesel-powered generator should be in your inventory. Honda Motor Co. and Kubota Tractor Corp. both make portable and permanent generators for just about any application; keep them fueled, and they'll keep running.
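To get a feel for whether a UPS alone will cover the outage you are planning for, a rough estimate helps. The sketch below divides usable battery energy by the load; the function names, the 90 percent inverter efficiency and the 80 percent usable-capacity figure are assumptions chosen for illustration, not vendor data. Real batteries deliver less energy at higher loads, so treat the result as a first pass and check the vendor's runtime charts.

```python
def estimate_runtime_minutes(battery_wh, load_watts,
                             efficiency=0.9, usable_fraction=0.8):
    """Rough UPS runtime in minutes: usable stored energy divided by load.

    efficiency and usable_fraction are illustrative assumptions.
    """
    if load_watts <= 0:
        raise ValueError("load_watts must be positive")
    usable_wh = battery_wh * usable_fraction * efficiency
    return usable_wh / load_watts * 60


def needs_generator(battery_wh, load_watts, required_minutes):
    """True if the estimated runtime falls short of the required runtime."""
    return estimate_runtime_minutes(battery_wh, load_watts) < required_minutes


# Hypothetical example: a 1,000 Wh battery feeding a 500 W load.
print(f"{estimate_runtime_minutes(1000, 500):.1f} minutes")
print("Generator needed for a 2-hour outage:", needs_generator(1000, 500, 120))
```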
Lines of Distribution
Just making sure you have a constant supply of electricity is not enough. Your next concern is to distribute that power to your equipment. All enterprise customers should consider placing PDUs (power distribution units) in their equipment rooms to segregate individual equipment or racks from each other. A PDU distributes AC or DC power to individual areas within an equipment room, much as a main circuit-breaker box distributes power to individual offices or floors within a building. By further separating power within the room, power to some devices or sections of a room can be cut off without affecting other sections or devices. Not only does this allow portions of the room to be powered down for repairs or other electrical work, but more important, it can keep one rogue device that might accidentally explode and trip a breaker from affecting everything else in the equipment room.
The second point of failure is your router or switch. The most commonly flawed elements in these products are the power supply and the interface card. If your WAN connection is mission-critical and downtime has to be kept to a minimum, the best solution is to keep spares around.
Most routers and switches will accommodate multiple power supplies in the chassis. While both supplies are operational, the load is often shared between them, which reduces the stress on each one. If one supply were to fail, the router or switch would pull all the power it needs from the remaining supply.
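Redundant supplies only help if the survivor can carry the whole chassis by itself. A minimal check, assuming you know each supply's rated output and the chassis load in watts (the function name and the sample figures are hypothetical):

```python
def survives_single_failure(supply_ratings_watts, load_watts):
    """Return True if the load can still be carried after the largest supply fails."""
    if len(supply_ratings_watts) < 2:
        return False  # a single supply offers no redundancy
    remaining = sum(supply_ratings_watts) - max(supply_ratings_watts)
    return remaining >= load_watts


# Hypothetical chassis with two 400 W supplies.
print(survives_single_failure([400, 400], 350))
print(survives_single_failure([400, 400], 500))
```

Removing the largest supply is the worst case, so a chassis that passes this check also survives the loss of any smaller one.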
Besides the power supply, interface modules are the next possible failure point within a router. Swapping out an interface card takes only a few seconds; it pays to keep a spare around as a precaution. Keeping extra cards around for every interface can be an expensive proposition, so it may be easier just to keep spares for the most important interfaces. If you use the same type of interface in offices around the country, one option is to have a spare that can be put on a plane at a moment's notice. This keeps your static inventory at a lower level and offers a fast replacement.
In addition to interface cards, a full spare chassis is not a bad idea either--if your bottom line can handle it. If your budget is not bottomless, however, there are alternatives. No customer should be an island, and your router vendor or distributor can be a good source of quick replacements. When purchasing equipment, you should be able to negotiate replacement availability. If your business is important enough, replacement equipment will be ready if you ever need it.
If your router vendor doesn't like to keep stock on hand for emergencies, you still have some options. One of these is to call upon a service like ReadyRouter, which maintains spare routers and can deliver fully configured replacements to your doorstep the next morning. By keeping a copy of your configurations, ReadyRouter can provide you with equipment that is plug and play, eliminating time-consuming configuration chores. ReadyRouter works as a partner to router vendors and resellers, so your vendor doesn't need to keep stock on hand for replacements.
This saves time and puts your network back online even more quickly. Regardless of how you plan to replace failed equipment, always keep backup copies of your configurations. It's easy to forget to back up a quick change, and restoring a router from an outdated backup could have disastrous results. Make a regular habit of backing up after every change, no matter how minor.
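One way to catch a forgotten backup is to compare the configuration currently on the router with the last copy you saved. The sketch below assumes you have already retrieved both as text by whatever means your equipment supports; the function names and the sample configuration lines are hypothetical.

```python
import hashlib


def config_digest(config_text):
    """Hash a configuration after normalizing line endings and trailing spaces."""
    normalized = "\n".join(line.rstrip() for line in config_text.splitlines())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def backup_is_current(running_config, backup_config):
    """True if the saved backup matches the configuration now in use."""
    return config_digest(running_config) == config_digest(backup_config)


# Hypothetical configurations, for illustration only.
backup = "hostname edge-router-1\ninterface serial0\n description WAN link\n"
running = "hostname edge-router-1\r\ninterface serial0\r\n description WAN link to HQ\r\n"

if not backup_is_current(running, backup):
    print("Backup is out of date: save a fresh copy of the configuration.")
```

Normalizing line endings first keeps a harmless difference in file format from being reported as a configuration change.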
Preparing for Natural Disasters
These tasks should be everyday practices in enterprise networks. But what do you do about the things that are out of the ordinary, those occurrences that most people would classify as outright catastrophes? If your business depends upon centralized access to data over your WAN, then being prepared for a WAN outage is essential. While most disasters of this type fall into the "natural" category, they can also be brought on by human error or neglected, overworked equipment.
Some disasters can be recovered from or even avoided more easily than others. Equipment rooms are frequently located on the ground floor, though this may not always be the best place for them. Flash floods or even a burst water main will prove very quickly that ground zero is the wrong place, and even raised floors may not be enough to protect your network. While most telcos like to place the demarcation point on the ground floor or even in the basement, a few doughnuts placed in the elevator might entice the installer to move everything upstairs. If your installer is on a diet, the connections can be placed in a fire- and waterproof cabinet.
If you're not convinced to avoid the ground floor, consider human error and what damage could be done by an errant vehicle running into your building. While equipment can usually be dried out and resuscitated after being under water, equipment left under the wheels of a tractor-trailer may not be so resilient.
Fires and earthquakes can also cause considerable damage to your network and your enterprise's income if your WAN is not protected against them. Either disaster can keep your network offline for several days, if not a lot longer. At these times, just having spare routers or being ready at a moment's notice may not be enough. Some steps taken to avoid this damage, however, also help in other situations. Unless you live in an earthquake-prone area of the country, you probably haven't even considered strengthening your racks against tremors. Bracing your racks to the ceiling and to each other not only keeps Mother Nature from knocking things over but also prevents accidental topplings caused by careless employees or large objects run amok.
Just bracing your racks may not be enough to guard against all equipment-room mishaps, though. Because customers rarely see equipment rooms, these rooms are seldom designed with space in mind. But don't skimp on creating a roomy, safe environment. Providing adequate space between the rows of racks lets you maneuver equipment safely without the risk of knocking into other racks. Even a tangled clump of cables can lead to serious downtime; the time needed to label cables and maintain a well-ordered cabling scheme is minimal compared with the cost of that downtime.
When bracing and organized cable runs alone can't keep your network operating, it may be time to consider a duplicate data center located at a second site to take over in case of catastrophe. If you already have multiple locations on your WAN, a duplicate data center at one of them is probably the easiest to set up. Your duplicate center may not be able to handle everything, so that priority list is important in these situations. To keep the duplicate's data from getting out of date, back it up from the primary site during the evening off-hours or whenever WAN traffic is diminished.
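Before counting on an overnight backup to the duplicate site, it is worth checking that the data will actually fit through the WAN link in the time available. The sketch below is back-of-the-envelope arithmetic; the function names, the 70 percent usable-bandwidth figure and the sample sizes are assumptions for illustration only.

```python
def transfer_hours(data_gigabytes, link_mbps, utilization=0.7):
    """Hours needed to move the data, assuming only part of the link is usable."""
    if link_mbps <= 0 or utilization <= 0:
        raise ValueError("link_mbps and utilization must be positive")
    bits = data_gigabytes * 8 * 1000**3          # decimal gigabytes to bits
    usable_bps = link_mbps * 1_000_000 * utilization
    return bits / usable_bps / 3600


def fits_window(data_gigabytes, link_mbps, window_hours, utilization=0.7):
    """True if the transfer completes within the off-hours window."""
    return transfer_hours(data_gigabytes, link_mbps, utilization) <= window_hours


# Hypothetical example: 10 GB of changed data and an 8-hour overnight window.
for mbps in (1.544, 45):
    hours = transfer_hours(10, mbps)
    print(f"{mbps} Mbps: {hours:.1f} hours, fits window: {fits_window(10, mbps, 8)}")
```

If the transfer doesn't fit, the priority list tells you which data has to go across first.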
If a duplicate center is beyond your budgetary means, consider having a site that could be turned into a backup data center if need be. Create a list of procedures to follow in case of emergencies. This list should stipulate what services would need to be used at the backup site and how to get backup data from the original site to the duplicate. As with router config backups, anytime the availability of necessary equipment changes, be sure to update your plans.
Send your comments on this article to Darrin Woods at dwoods@nwc.com.