An "unplanned data center outage" is a polite way to say that a data center failed. Whether the root cause is a hardware failure, software bug, or human error, most failures can -- and should -- be prevented. With the high level of redundancy built into today's data center architectures, prevention is very much possible.
The interesting thing is that data center failures still happen all the time. Considering the enormous per-minute cost of a full outage, you'd think they would be far rarer. If data center managers simply focused on fixing the most common causes of failure, they would significantly reduce the risk of a catastrophic outage.
The problem is that many data center operators are focused on growth rather than on the care and feeding of what's already in place. If you watch administrators in many public and private data centers these days, you'll find that they are focused largely on increasing capacity, boosting server density, and retrofitting aging server rooms into more modern facilities with more efficient cooling systems. While all this is fantastic and shows the incredible growth in the data center industry, it also helps explain why outages remain so common: attention spent on expansion is attention not spent on maintaining what already exists.
On the following pages, we're going to get back to data center basics. We'll present 10 common reasons why data centers fail. Click through and think about how these common failures might one day surface in your data center. Not every failure scenario will apply to your data center architecture, but we're confident that at least a few of the topics we mention will hit home and make you think about what you can do to shore up your facility.
And if you have any additional thoughts, tips, or stories that may help your fellow administrators avoid an outage, please share them in the comments below.
(Image: 123Net / Wikimedia Commons)