Start in the Middle
Adding redundancy is the most common way to increase your uptime, and the best approach is to start in the middle and work your way toward the edge. First, make sure there's redundancy within your core router--redundant CPU cards, power supplies and fans can usually be added to chassis-based routers and switches, and some router and switch vendors offer equipment with dual backplanes. Each vendor handles failover differently. In some cases, an outage occurs when the backup card takes over, but usually only new routes are affected while the backup card comes up. With redundant CPU cards, you can also force a failover to one card while you upgrade the other, instead of having to bring the whole router down for the upgrade.
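To put a rough number on what a redundant component buys you, here is a minimal sketch of the standard parallel-availability formula. It assumes the redundant copies fail independently and that one surviving copy is enough to keep the router running. The 99 percent figure and the function names are illustrative assumptions, not vendor data.

```python
def parallel_availability(component_availability: float, copies: int) -> float:
    """Availability of `copies` redundant components when any one of them suffices.

    Assumes failures are independent, which real hardware only approximates.
    """
    if not 0.0 <= component_availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    if copies < 1:
        raise ValueError("need at least one component")
    # The group is down only when every copy is down at the same time.
    return 1.0 - (1.0 - component_availability) ** copies


def downtime_hours_per_year(availability: float) -> float:
    """Expected hours of downtime in a 365-day year."""
    return (1.0 - availability) * 365 * 24


if __name__ == "__main__":
    single_card = 0.99  # hypothetical availability of one CPU card
    for copies in (1, 2):
        a = parallel_availability(single_card, copies)
        print(f"{copies} card(s): availability {a:.4%}, "
              f"about {downtime_hours_per_year(a):.1f} hours down per year")
```

Under these assumptions a second card moves the component from roughly 88 hours of expected downtime a year to under one hour, which is why the core is the first place to spend on redundancy.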
The core or backbone of a network usually handles the most traffic, so if it goes down, it will likely affect the most users. If your redundant core router or switch equipment is connected and ready to kick in automatically when a problem occurs, you can reduce an outage from hours of manual labor to an automated process that takes just a few seconds. This is called High Availability: identical core routers stand ready to take over should the primary fail (see "Three Tiers for HA," above right). It means that the next layer out, the aggregator switches, must have a connection to each core router. Those dual connections provide some redundancy for the links themselves, and they also let you put the core routers in different geographic locations.
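The requirement that every aggregator switch connects to each core router is easy to audit from an inventory of uplinks. The sketch below assumes you can export that inventory as a mapping from switch name to the set of core routers it reaches. The device names and the `missing_core_uplinks` helper are hypothetical.

```python
from typing import Dict, List, Set


def missing_core_uplinks(
    uplinks: Dict[str, Set[str]], core_routers: Set[str]
) -> Dict[str, List[str]]:
    """Map each aggregator switch to the core routers it is NOT connected to.

    Switches that reach every core router are left out of the result.
    """
    missing: Dict[str, List[str]] = {}
    for switch, connected in uplinks.items():
        absent = core_routers - connected
        if absent:
            missing[switch] = sorted(absent)
    return missing


if __name__ == "__main__":
    core = {"core-a", "core-b"}  # hypothetical redundant core pair
    inventory = {
        "agg-1": {"core-a", "core-b"},  # dual-homed, survives either core failing
        "agg-2": {"core-a"},            # single-homed, isolated if core-a fails
    }
    for switch, absent in missing_core_uplinks(inventory, core).items():
        print(f"{switch} has no uplink to: {', '.join(absent)}")
```

Any switch the check reports still depends on a single core router, so the redundant core does nothing for the users behind it.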
Now that your network has redundant links, you must decide how packets on the network will select their paths and avoid loops. This isn't a new problem--redundant paths have been addressed by protocols like STP (Spanning Tree Protocol) at Layer 2 and routing protocols like the IETF's OSPF at Layer 3. But these protocols are slow to reconverge after a failure: OSPF can take up to 30 seconds, and STP 40 seconds or more. This is unacceptable for critical networks, especially those with real-time applications like VoIP and video.
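Delays of this size come largely from protocol timers. The sketch below is a back-of-the-envelope calculation, not a simulation. It assumes the commonly documented defaults for classic 802.1D STP (max age 20 seconds, forward delay 15 seconds) and for OSPF on broadcast networks (hello every 10 seconds, dead interval of four hellos). Those values are assumptions here, not figures from this article, and tuned networks will differ.

```python
from typing import Tuple

# Commonly documented default timers, in seconds (assumptions, not measurements).
STP_MAX_AGE = 20
STP_FORWARD_DELAY = 15
OSPF_HELLO_INTERVAL = 10
OSPF_DEAD_MULTIPLIER = 4


def stp_reconvergence_seconds(max_age: int, forward_delay: int,
                              direct_failure: bool) -> int:
    """Rough time for classic STP to bring a blocked port to forwarding.

    The port spends forward_delay in listening and again in learning.
    For an indirect failure, stored BPDU information must first age out.
    """
    aging = 0 if direct_failure else max_age
    return aging + 2 * forward_delay


def ospf_detection_window(hello: int, dead_multiplier: int) -> Tuple[int, int]:
    """Window (min, max) in which a silent neighbor is declared dead.

    The dead timer restarts at each hello, so the wait after a failure
    depends on how recently the last hello arrived. SPF recalculation
    adds further time on top of detection.
    """
    dead_interval = hello * dead_multiplier
    return dead_interval - hello, dead_interval


if __name__ == "__main__":
    direct = stp_reconvergence_seconds(STP_MAX_AGE, STP_FORWARD_DELAY, True)
    indirect = stp_reconvergence_seconds(STP_MAX_AGE, STP_FORWARD_DELAY, False)
    low, high = ospf_detection_window(OSPF_HELLO_INTERVAL, OSPF_DEAD_MULTIPLIER)
    print(f"STP: {direct} to {indirect} seconds")
    print(f"OSPF neighbor detection: {low} to {high} seconds")
```

The arithmetic makes the problem plain: with default timers, most of the outage is spent waiting for timers to expire rather than computing new paths.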