  • 12/22/2016
    12:28 PM

Verifying Network Resilience

If you don't test, you don't really know if your infrastructure is resilient to failures.

There have been several excellent posts recently on No Jitter about network resilience and network testing. Gary Audin describes "How to Approach Resilience Planning," Darc Rasmussen talks about using testing to "Make This a Happy Holiday Season," and Mike Burke tells us "How Not to Repeat History of Failed Testing."

"But it can't happen to us!" you say.

Really? It happened to Macy's, and on Black Friday, too. As Fortune senior writer Phil Wahba wrote, Macy's website went down on the second-biggest shopping day of the year because of excess shopping traffic.

Each of the above-mentioned articles offers a slightly different perspective on resilience and testing. Underlying the different stories is a common theme: Good planning needs good testing to validate both the implementation and the assumptions that went into the design and configuration.

That brings me to the question: Do you conduct failure testing and analysis of your network and UC infrastructure? Or is your organization afraid of touching the network for fear that it will break? Organizations that don't do regular testing are working from a position of hope, as in, "We hope that nothing breaks because it might not fail over to our backup systems." That's a precarious position to be in.

Many organizations already have redundant infrastructure -- dual WAN carriers, redundant core routers and switches, uninterruptible power supplies, backup data paths, and redundant systems for IT services. However, I keep encountering organizations that have never run a planned test of that redundant infrastructure. Why wait for an emergency to learn that something doesn't work? It is much better to schedule planned downtime and perform controlled tests during it.

A controlled test should evaluate the failover process itself. Does the failover work the way you think it should? Is it fast enough for the applications? Does it self-heal when the failed device comes back online?
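One way to put numbers on the "is it fast enough" question is to run a steady reachability probe against a service while you deliberately fail a device, then measure the gap in the results. The following is a minimal sketch, not a definitive tool. It assumes you have already collected probe results as (timestamp, reachable) pairs from whatever probe you use; the names `analyze_probes`, `FailoverReport`, and `budget_seconds` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class FailoverReport:
    # Seconds from first failed probe to first later success;
    # 0.0 if no probe failed, None if service never came back.
    outage_seconds: Optional[float]
    recovered: bool       # True if service was reachable again (or never failed)
    within_budget: bool   # True if the outage fit the application's tolerance


def analyze_probes(probes: List[Tuple[float, bool]],
                   budget_seconds: float) -> FailoverReport:
    """Summarize the first outage in a time-sorted list of probe results.

    probes: (timestamp_in_seconds, reachable) pairs, oldest first.
    budget_seconds: assumed longest outage the application can tolerate.
    """
    first_fail = None
    first_ok_after = None
    for timestamp, reachable in probes:
        if not reachable and first_fail is None:
            first_fail = timestamp          # outage begins
        elif reachable and first_fail is not None:
            first_ok_after = timestamp      # failover completed
            break

    if first_fail is None:
        return FailoverReport(0.0, True, True)
    if first_ok_after is None:
        return FailoverReport(None, False, False)

    outage = first_ok_after - first_fail
    return FailoverReport(outage, True, outage <= budget_seconds)


if __name__ == "__main__":
    # Hypothetical one-second probes taken while a core router is powered off.
    sample = [(0.0, True), (1.0, True), (2.0, False), (3.0, False),
              (4.0, False), (5.0, True), (6.0, True)]
    print(analyze_probes(sample, budget_seconds=5.0))
```

The measurement is only as precise as the probe interval, and the sketch looks only at the first outage in the data. Running the same analysis a second time, while the failed device is brought back online, gives a rough answer to the self-healing question as well.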

Disaster recovery may force a backup site to become the primary site for an extended period of time. Will the infrastructure and staff be able to handle the movement of the IT services that would be forced by a disaster at the former primary site? Think about all the companies that were affected by Hurricane Sandy, flooding in the Midwest, fires in the South, or earthquakes in the West. Many inadequately prepared companies simply cease to exist when their IT operations can't quickly return to functional health.

Read the rest of this article on No Jitter.
