Superstorm Sandy Lessons: 100% Uptime Isn't Always Worth It

Sandy brought out the worst in some tech pundits who were ready to pounce on companies that went offline in the face of a 100-year catastrophe. But in cases like Sandy, continuous uptime is the wrong goal.

Howard Marks

October 31, 2012


As a lifelong New Yorker who left the center of the known universe to move to Santa Fe, N.M., just a month ago, and the author of more than a few IT disaster recovery plans, I've obsessed over the news of the damage caused by superstorm Sandy up and down the East Coast. While much of what I've read has been well reported, I'm annoyed by some of what's come across my desk.

My biggest pet peeve is the flood of emails from PR agents representing backup software vendors, consultants, data center operators and website monitoring services, all looking to get a little publicity by offering up their CEOs for interviews. Millions of people are suffering, whether from power outages or the loss of homes or lives. It's crass to try to use that for commercial (or political) advantage.

I've also been annoyed at the general attitude amongst the technorati that every organization should have a bulletproof DR plan that keeps all their IT services up and running in any emergency. I've seen tweets from folks I respect that say Sandy proves VMware's 75-mile limit for WAN vMotion is inadequate for disaster recovery. The truth is, while three or four data centers in downtown New York City went offline, all the colocation or DR facilities I'd ever used in Jersey were still up. Others tweeted that they were surprised New York startups had their servers in a closet and were down.

Let me argue the opposite point. While disaster recovery planning is important, the best DR plans don't provide 100% uptime in every emergency. If you're not a huge Web presence like Amazon, or a bank or a multinational like Exxon-Mobil, it may actually be a better business decision to take some downtime when a 100-year event comes straight at you.

Scale matters. For some businesses, the money that would have to be spent to stay up through a massive storm such as Sandy simply isn't worth it. It would be like Exxon-Mobil building DR facilities to keep IT services running after a huge asteroid strikes the earth--at some point, DR spending is just wasted.

Superstorm Sandy is the very definition of a 100-year event: the New York subway is 108 years old, and Sandy marks only the second time it has been closed for weather. When the 100-year event strikes, it doesn't just take out your data center. It also affects your customers and employees. If your customers and employees are what the folks on NPR's Marketplace called powerless nomads, wandering the streets looking for a place to charge their iPhones or figuring out how they're going to pump out their flooded homes, they're not going to be using the IT applications you worked so hard to keep running. If you, like most small and midsize businesses, do most of your business locally, you may be better off shutting down for a few days, or running very lean, so your employees can deal with their personal disasters.

Let's look at one of those NYC startups with servers in the closet. Let's say they have a million dollars a year or less in total revenue, and they spent $50,000 or so on the servers in that closet and the software they run. They probably spend another $500 a month on Internet bandwidth. To have a bulletproof DR solution, they'd need to spend another $30,000 on duplicate hardware and software that would support replication, plus $2,000 to $5,000 a month for bandwidth and colocation space to house those servers. I argue that a week's downtime would cost that startup less than the $100,000 or so they'd spend in the first year to avoid it, especially since the loss of goodwill from a week's outage is limited when the outage is caused by an event that dominates the TV news for a week or more.
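To make that back-of-the-envelope math concrete, here's a minimal sketch in Python. It uses the round numbers above; the midpoint monthly fee and the assumption that a full week of revenue is lost outright are mine, for illustration only.

# Back-of-envelope comparison: first-year cost of a replicated DR setup
# versus the cost of simply eating a week of downtime. All figures are
# the rough, hypothetical numbers from the paragraph above, not quotes.

ANNUAL_REVENUE = 1_000_000        # startup's total yearly revenue ($)
DR_HARDWARE_SOFTWARE = 30_000     # duplicate servers plus replication software ($)
DR_MONTHLY_FEES = 3_500           # midpoint of $2,000-$5,000/month colo and bandwidth ($)

dr_first_year_cost = DR_HARDWARE_SOFTWARE + 12 * DR_MONTHLY_FEES

# Worst case: a full week offline and every dollar of that week's
# revenue lost (in practice some of it is merely deferred).
downtime_cost = ANNUAL_REVENUE / 52

print(f"DR, first year:       ${dr_first_year_cost:,.0f}")    # roughly $72,000
print(f"One week of downtime: ${downtime_cost:,.0f}")         # roughly $19,000

Even at the high end of those monthly fees the comparison comes out the same way: for a business this size, a week of downtime during a 100-year event is the cheaper option.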

Another lesson from Sandy is that backup generators just aren't as reliable as we'd like to believe. Several hospitals in New York and New Jersey had multiple generator failures. Some colocation centers in lower Manhattan, including the one hosting the Huffington Post's servers, discovered that while the generators were safe from the rising waters, the fuel tanks in the basement flooded, shutting down the generators.

Yes, create a DR plan, and make sure your data is offsite so your whole business doesn't end when the data center is wiped out. But also recognize that from a dollars-and-sense point of view, maintaining 100% uptime through every emergency just may not be worth it.

About the Author(s)

Howard Marks

Network Computing Blogger

Howard Marks is founder and chief scientist at DeepStorage LLC, a storage consultancy and independent test lab based in Santa Fe, N.M., concentrating on storage and data center networking. In more than 25 years of consulting, Marks has designed and implemented storage systems, networks, management systems and Internet strategies at organizations including American Express, J.P. Morgan, Borden Foods, U.S. Tobacco, BBDO Worldwide, Foxwoods Resort Casino and the State University of New York at Purchase. The testing at DeepStorage Labs is informed by that real-world experience.

He has been a frequent contributor to Network Computing and InformationWeek since 1999 and a speaker at industry conferences including Comnet, PC Expo, Interop and Microsoft's TechEd since 1990. He is the author of Networking Windows and co-author of Windows NT Unleashed (Sams).

He is co-host, with Ray Lucchesi, of the monthly Greybeards on Storage podcast, where the voices of experience discuss the latest issues in the storage world with industry leaders. You can find the podcast at http://www.deepstorage.net/NEW/GBoS
