The other day I started my morning by trying to catch up on email, but I couldn't connect to my company's Exchange server. I tried OWA but couldn't reach the web server either. So I fired up Traceroute and saw that it was timing out. I called my support people to see if it was just me or everyone, but they had no trouble tickets. I called Verizon tech support, and they were no help. A few hours later things got back to normal, but in the meantime, as a teleworker, I was cut off.
The connectivity problem was intermittent over a few hours. I could reach some web sites, like www.yahoo.com, but not others, like mail.google.com (Gmail). On this occasion, the Internet wasn't self-healing. Since it had been 10 or more years since I'd faced a connectivity issue of any kind caused by Internet problems, I had no idea what the paths normally looked like. But I did know the following:
- Network paths (the sequence of router hops) do change, but in my experience they are fairly stable over long periods of time.
- Traceroute, while useful for mapping network paths, is not authoritative. Some routers don't send the ICMP messages Traceroute relies on, or rate-limit them, so they show up as time-outs. A time-out could mean the packets are being lost, or simply that the router doesn't respond.
- I can't solve this on my own.
My call to Verizon FiOS tech support was an exercise in patience. The support person wanted to walk me through her checklist, starting with "Is your network plugged in?" I ended that call politely, but quickly. I knew my network and my network connection were fine. The problem was out there, and I wasn't going to make headway through Verizon's support channels. Luckily, my company has two VPN gateways on different coasts, so I could get to Exchange through the other one. A few hours later, Internet connectivity was restored and all was well.
However, I lost a bit of faith in the Internet's ability to deliver robust connectivity. Where was the much-talked-about routing around problems that everyone is so quick to point out? I was cut off for several hours, with no explanation and no idea when service would be restored.
For the record, I recognize that my lost faith is unreasonable. The network of networks works, works well, and is much more reliable today than it was 7 years ago. But if my company didn't have an alternative VPN gateway for me to use, or if that network path were degraded or down, I wouldn't have been able to connect at all. Dead in the water. Sure, I could and did get some work done in the meantime, but I was incommunicado. The failure is just a reminder that failures do happen and a backup plan is necessary.
The impact on a remote office is much worse because more people are affected, and the longer connectivity is lost, the worse the impact becomes. There are two things you can do if you rely on the Internet to interconnect your offices, particularly if the offices use different service providers:
- Set up a way to test whether your remote sites are up. Even a simple ping that alerts someone on failure might be good enough. There are plenty of open source packages like Nagios, or free tools like Spiceworks, that will let you do that.
- Invest in out-of-band connectivity like ISDN, or even a pair of 56 Kbps modems bonded together. After you stop laughing, consider what traffic is absolutely critical to remote offices and determine whether you can limp along on a 112 to 128 Kbps connection (two bonded 56K modems top out at 112 Kbps; a bonded ISDN BRI gives 128 Kbps). You will be surprised what you can fit into that limited bandwidth if you are choosy about what you send over the backup connection.
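A reachability check for remote sites doesn't need to be elaborate. Here is a minimal sketch in Python, assuming each remote site exposes some TCP service you can reach (for example, a VPN gateway's HTTPS port). It attempts a TCP connection instead of sending an ICMP ping, because sending raw ICMP from a script usually requires administrator privileges. The function names `is_reachable` and `check_sites`, the ports, and the `alert` callback are hypothetical placeholders, not part of any monitoring product.

```python
from __future__ import annotations

import socket
from typing import Callable, Iterable


def is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # The connection is opened and immediately closed; we only care that it succeeded.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused connections, timeouts (socket.timeout subclasses OSError) and
        # DNS lookup failures (socket.gaierror subclasses OSError) all count as "down".
        return False


def check_sites(
    sites: Iterable[tuple[str, int]],
    alert: Callable[[str, int], None],
    timeout: float = 3.0,
) -> list[tuple[str, int]]:
    """Check each (host, port) pair, call alert() for each unreachable one,
    and return the list of sites that failed the check."""
    down: list[tuple[str, int]] = []
    for host, port in sites:
        if not is_reachable(host, port, timeout):
            # Hypothetical hook: wire this to email, SMS, or a pager in practice.
            alert(host, port)
            down.append((host, port))
    return down
```

You might run something like `check_sites([("vpn-east.example.com", 443)], alert=lambda host, port: print(f"{host}:{port} is down"))` from cron every few minutes (the host name is a placeholder). A dedicated tool such as Nagios adds what this sketch lacks: retries before alerting, escalation, and a history of outages.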
Consider having an alternate VPN site on a different network, in a different geographic location, so that if a temporary failure in one part of the Internet cuts an office off, its users can connect to the alternate location. These are pretty low-cost ideas compared to the cost of downtime.