Superstorm Sandy Lessons: 100% Uptime Isn't Always Worth It
October 31, 2012
As a lifelong New Yorker who left the center of the known universe to move to Santa Fe, N.M., just a month ago, and the author of more than a few IT disaster recovery plans, I've obsessed over the news of the damage caused by superstorm Sandy up and down the East Coast. While much of what I've read has been well reported, I'm annoyed by some of what's come across my desk.
My biggest pet peeve is the flood of emails from PR agents representing backup software vendors, consultants, data center operators and website monitoring services, all looking to get a little publicity by offering up their CEOs for interviews. Millions of people are suffering, whether from power outages or the loss of homes or lives. It's crass to try to use that for commercial (or political) advantage.
I've also been annoyed at the general attitude amongst the technorati that every organization should have a bulletproof DR plan that keeps all their IT services up and running in any emergency. I've seen tweets from folks I respect that say Sandy proves VMware's 75-mile limit for WAN vMotion is inadequate for disaster recovery. The truth is, while three or four data centers in downtown New York City went offline, all the colocation or DR facilities I'd ever used in Jersey were still up. Others tweeted that they were surprised New York startups had their servers in a closet and were down.
Let me argue the opposite point. While disaster recovery planning is important, the best DR plans don't provide 100% uptime in every emergency. If you're not a huge Web presence like Amazon, or a bank or a multinational like Exxon-Mobil, it may actually be a better business decision to take some downtime when a 100-year event comes straight at you.
Scale matters. For some businesses, staying up during a massive storm such as Sandy simply isn't worth what it would cost. It would be like Exxon-Mobil building DR facilities to keep IT services running after a huge asteroid strikes the earth. At some point, DR spending is just wasted.
Superstorm Sandy is the very definition of a 100-year event: The NY subway is 108 years old, and Sandy is only the second time the subway has been closed for weather. When the 100-year event strikes, it doesn't just take out your data center. It also affects your customers and employees. If your customers and employees are what the folks on NPR's Marketplace called powerless nomads, wandering the streets looking for a place to charge their iPhones or figuring out how they're going to pump out their flooded homes, they're not going to be using the IT applications you worked so hard to keep running. If you, like most small and midsize businesses, do most of your business locally, you may be better off shutting down for a few days, or running very lean, so your employees can worry about their personal disasters.
Let's look at one of those NYC startups with servers in the closet. Let's say they have a million dollars a year or less of total revenue, and they spent $50,000 or so on the servers in that closet and the software they run. They probably spend another $500 a month on Internet bandwidth. To have a bulletproof DR solution they'd need to spend another $30,000 on duplicate hardware and software that would support replication, plus $2,000 to $5,000 a month for bandwidth and colocation space to house those servers. I argue that a week's downtime for that startup would cost it less than the $100,000 or so that avoiding the downtime would cost in the first year, especially since the loss of goodwill from a week's outage would be limited if the outage is caused by an event that dominates the TV news for a week or more.
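For readers who want to run the numbers themselves, here is a rough back-of-the-envelope sketch in Python. The hardware, monthly and revenue figures are the ones quoted above. The downtime model is my assumption, not a measured figure: it treats a week offline as a flat loss of seven days of revenue, with nothing recovered later and no goodwill cost. The function names are made up for illustration.

```python
# Figures quoted in the column; the downtime-cost model is an assumption.
ANNUAL_REVENUE = 1_000_000          # "a million dollars a year or less"
DR_HARDWARE = 30_000                # duplicate hardware and software
DR_MONTHLY_RANGE = (2_000, 5_000)   # bandwidth and colocation, per month


def first_year_dr_cost(hardware, monthly):
    """One-time hardware spend plus twelve months of recurring fees."""
    return hardware + 12 * monthly


def downtime_cost(annual_revenue, days_down):
    """Assumption: revenue is lost evenly per day and never recovered."""
    return annual_revenue * days_down / 365


low = first_year_dr_cost(DR_HARDWARE, DR_MONTHLY_RANGE[0])
high = first_year_dr_cost(DR_HARDWARE, DR_MONTHLY_RANGE[1])
week_lost = downtime_cost(ANNUAL_REVENUE, 7)

print(f"First-year DR cost: ${low:,} to ${high:,}")
print(f"One week of lost revenue: ${week_lost:,.0f}")
print(f"DR costs more than the outage: {low > week_lost}")
```

Under these assumptions the itemized DR spend comes to $54,000 to $90,000 in the first year, a bit under my rounded $100,000, against roughly $19,000 of lost revenue for the week. Plug in your own revenue and outage length. The answer flips quickly for a business that earns far more per day or that can't win its customers back afterward.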
Another lesson from Sandy is that backup generators just aren't as reliable as we'd like to believe. Several hospitals in New York and New Jersey had multiple generator failures. Some colocation centers in lower Manhattan, including the one hosting the Huffington Post's servers, discovered that while the generators were safe from the rising waters, the fuel tanks in the basement flooded, shutting down the generators.
Yes, create a DR plan, and make sure your data is offsite so your whole business doesn't end when the data center is wiped out. But also recognize that from a dollars-and-sense point of view, maintaining 100% uptime through every emergency just may not be worth it.