"If you call something by a single name, it is a single point of failure," George Reese, CTO of enStratus, has said. I think of that statement as a mnemonic for how IT should think about redundancy. No matter how well you architect for redundancy and availability, there will certainly be single points of failure (SPOFs) that you can't account for.
The SPOF may be some feature that is overlooked, a system outside of your control or a more complex issue, like the one Amazon found when its Elastic Block Store service melted down, causing a number of service outages. The SPOF, in all of its forms, can make application mobility via virtual desktop infrastructure (VDI), cloud services and netbooks such as Google's Chromebook less attractive computing options when compared with fat clients, fat servers and boring but reliable storage. While services can fail, let's not forget that the most frequent SPOF we deal with is the access network at the Internet edge.
I am often shocked when a SPOF rears its ugly head. When I started writing this blog, my Verizon FiOS was down because of a fiber cut affecting Syracuse and Binghamton. [[Correction: Verizon told me on June 3rd that the fiber was part of a ring that stretched from New York City through Broome County. The other side of the ring was taken out the day before by a manhole fire and was under repair. No Verizon services were affected. Once the other side was cut on Thursday by stormy weather, Verizon and First Energy, from which it leased the fiber, had to wait for the all clear from the utility and emergency services to repair the break.]] If the outage had affected just Internet access, that wouldn't have been horrible, because I still have fat clients running on fat computers. But phone service was affected, too. Ironically, I couldn't call Verizon tech support on my home phone because the service was down, nor could I call 911. Granted, in the eight years I have had Verizon services and the six years with FiOS, we have experienced only one other service outage, so uptime has been good. However, a SPOF that takes out all services for a couple of hundred thousand customers [[Correction: Verizon told me about 24,000 customers were affected]], including emergency services, should never happen.
The FiOS outage had a number of other ill effects. I couldn't synchronize my critical files with cloud services like Dropbox or Microsoft's SkyDrive. I wasn't able to access Google Apps or Microsoft's Office Live. I couldn't connect to any wiki that my company uses internally or externally. I was pretty much reduced to doing tasks using whatever tools and information I had at hand.
I am lucky that I have had only two outages in eight years. At least one co-worker a month announces that he or she has lost Internet access and therefore will be offline for an extended period. Fortunately, I have an Android phone, so I can get email and limp along with web access when needed. I couldn't tether my laptop to my phone, though, since I don't pay for tethering and I am unwilling to violate Verizon's terms of service.