Verifying Network Resilience

If you don't test, you don't really know if your infrastructure is resilient to failures.

Terry Slattery

December 22, 2016

2 Min Read
Network Computing logo

There have been several excellent posts recently on No Jitter about network resilience and network testing. Gary Audin describes "How to Approach Resilience Planning," Darc Rasmussen talks about using testing to "Make This a Happy Holiday Season," and Mike Burke tells us "How Not to Repeat History of Failed Testing."

"But it can't happen to us!" you say.

Really? It happened to Macy's, and over Black Friday, too. As Fortune senior writer Phil Wahba wrote, Macy's website went down on the second biggest shopping day of the year due to overflow shopping traffic.

Each of the above mentioned articles describes a slightly different perspective on resilience and testing. Underlying the different stories is a common theme: Good planning needs good testing in order to validate the implementation and the assumptions that went into the design and configuration.

That brings me to the question: Do you conduct failure testing and analysis of your network and UC infrastructure? Or is your organization afraid of touching the network for fear that it will break? Organizations that don't do regular testing are working from a position of hope, as in, "We hope that nothing breaks because it might not fail over to our backup systems." That's a precarious position to be in.

Many organizations already have redundant infrastructure -- dual WAN carriers, redundant core routers and switches, uninterruptible power supplies, backup data paths, and redundant IT services systems. However, I keep encountering organizations that have never run a planned test of their redundant infrastructure. Why wait for an emergency to learn that something doesn't work? It is much better to use planned downtime in which you can perform controlled tests.

It is a good idea to evaluate the failover process. Does the failover work the way you think it should? Is it fast enough for the applications? Does it self-heal when the failed device comes back online?

Disaster recovery may force a backup site to become the primary site for an extended period of time. Will the infrastructure and staff be able to handle the movement of the IT services that would be forced by a disaster at the former primary site? Think about all the companies that were affected by Hurricane Sandy, flooding in the Midwest, fires in the South, or earthquakes in the West. Many inadequately prepared companies simply cease to exist when their IT operations can't quickly return to functional health.

Read the rest of this article on No Jitter.

About the Author(s)

Terry Slattery

Principal Architect, NetCraftsmenTerry Slattery is a principal architect at NetCraftsmen, an advanced network consulting firm that specializes in high-profile and challenging network consulting jobs. Terry is currently working on network management, SDN, business strategy consulting, and interesting legal cases. He is the founder of Netcordia, inventor of NetMRI, has been a successful technology innovator in networking during the past 20 years, and is co-inventor on two patents. He has a long history of network consulting and design work, including some of the first Cisco consulting and training. As a consultant to Cisco, he led the development of the current Cisco IOS command line interface. Prior to Netcordia, Terry founded Chesapeake Computer Consultants, which became a Cisco premier training and consulting partner. At Chesapeake, he co-invented and patented the v-LAB system to provide hands-on access to real hardware for the hands-on component of internetwork training classes.Terry co-authored the successful McGraw-Hill text "Advanced IP Routing in Cisco Networks," is the second CCIE (1026) awarded, and is a regular speaker at Enterprise Connect and Interop.

Stay informed! Sign up to get expert advice and insight delivered direct to your inbox

You May Also Like

More Insights