IT administrators are creatures of habit, which is why they prefer their networks to be as predictable as possible. But plenty of unexpected things can happen: applications slowing to a crawl, spikes in bandwidth use, declining VoIP quality, customer complaints about slow websites, or unexplained database failures. These sorts of network anomalies can hurt productivity or indicate a dangerous security breach.
The good news is that organizations do not need to stand idly by and wait for the unthinkable to happen. IT leaders are increasingly empowering their teams to be proactive about network management and take steps to head off potential problems.
This approach allows IT to combat the issues typically lurking in the shadows and deal with them on their own terms, rather than in a worst-case scenario. Being reactive is not enough; heading off issues such as filled-up disks, slow backups, and inadequate central processing unit (CPU) horsepower or RAM is a matter of planning and maintenance.
Develop a tool set
As part of this proactive approach, IT professionals should assemble a favorite tool set to address the potential problems that most concern them. Is VoIP quality the worry? Or is runaway bandwidth usage by rogue users the bigger issue? For these sorts of problems, a variety of tools exists, some of them free: port scanners, IP address trackers, syslog servers, visual traceroute tools, virtual machine monitors, network content analyzers, and Active Directory administration tools, to name a few.
Make sure you understand subtle differences between tools from different vendors. For example, one bandwidth monitor may be better at handling real-time data, while another does a better job with historical trending and reporting. Above all, get familiar with the tools before a problem happens; a crisis is not the time to open the wrapper.
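To see how little machinery some of these tools require, here is a minimal sketch of a TCP port scanner in Python. The host address (a reserved documentation IP) and port range are placeholders, and a production scanner would add concurrency, rate limiting, and authorization checks before touching a live network.

```python
# Minimal TCP port scanner sketch -- host and port range are illustrative.
import socket

def scan_ports(host: str, ports: range, timeout: float = 0.5) -> list[int]:
    """Return the subset of `ports` that accept a TCP connection on `host`."""
    open_ports = []
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.settimeout(timeout)
            # connect_ex returns 0 on success instead of raising an exception
            if sock.connect_ex((host, port)) == 0:
                open_ports.append(port)
    return open_ports

if __name__ == "__main__":
    # Scan the well-known service ports on a lab host.
    print(scan_ports("192.0.2.10", range(20, 1025)))
```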
Combat the unseen with the seen
The best way to deal with a network infrastructure anomaly is to recognize it as such and quantify it. IT professionals must establish baselines of what is normal for their business environment. For example, it is helpful to know that a server's CPU spikes every day at 1:00 a.m. because the backup kicks in; the event is harmless, short-lived, and recurring.
When IT teams are familiar with their baseline data and understand what to expect within their organization, they can react more quickly to unusual events and disruptions.
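The core of baselining can be expressed in a few lines. The sketch below flags CPU readings that stray more than three standard deviations from a baseline mean; the sample data and the 3-sigma threshold are illustrative assumptions, and in practice the baseline history would come from a monitoring system, with known recurring events (such as the 1:00 a.m. backup spike) excluded by time window.

```python
# Sketch: flag CPU readings that stray from an established baseline.
from statistics import mean, stdev

# Hypothetical readings (% CPU) from a "normal" week.
baseline_samples = [12.0, 14.5, 11.8, 13.2, 12.9, 15.1, 12.4]
mu, sigma = mean(baseline_samples), stdev(baseline_samples)

def is_anomalous(reading: float, threshold: float = 3.0) -> bool:
    """True when a reading is more than `threshold` std devs from the baseline mean."""
    return abs(reading - mu) > threshold * sigma

for reading in [13.0, 14.2, 87.6]:
    status = "ANOMALY" if is_anomalous(reading) else "normal"
    print(f"CPU {reading:5.1f}% -> {status}")
```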
Make checklists and test
IT professionals often make changes on autopilot: having done something many times before, they simply expect the standard results to follow. However, IT teams work in an environment where much is out of their control, and they should always anticipate the unexpected. IT pros can benefit from making checklists to stay on top of priorities and keep systems running smoothly.
Recovery from a power failure, for example, requires a particular sequence of events to restore full functionality quickly. A checklist would spell out the order in which to power servers and network devices back up, the processes that need to be restarted manually, and perhaps the staging tables for reports that need to be re-initiated.
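Such a checklist can even be encoded as a script, so the sequence is enforced rather than remembered. This sketch brings services back in a fixed order and waits for each to answer before moving on; the hostnames, ports, and ordering are hypothetical examples, not a prescription for any particular environment.

```python
# Sketch of a power-up checklist as code: verify each dependency is
# reachable before proceeding to the next tier.
import socket
import time

POWER_UP_SEQUENCE = [
    ("core-switch.example.net", 22),   # network first
    ("san-array.example.net", 22),     # storage before anything that mounts it
    ("db-server.example.net", 5432),   # database before the apps that use it
    ("app-server.example.net", 443),   # application tier last
]

def wait_until_up(host: str, port: int, retries: int = 30, delay: float = 10.0) -> bool:
    """Poll until a TCP connect succeeds, or give up after `retries` attempts."""
    for _ in range(retries):
        try:
            with socket.create_connection((host, port), timeout=2.0):
                return True
        except OSError:
            time.sleep(delay)
    return False

for host, port in POWER_UP_SEQUENCE:
    print(f"waiting for {host}:{port} ...")
    if not wait_until_up(host, port):
        raise SystemExit(f"{host} never came up; stop and investigate before continuing")
```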
After years of experience, veteran IT pros may think they do not need to follow the same steps, but they are the most likely to let something slip if they aren't monitoring changes and adjusting for new technologies. Atul Gawande's The Checklist Manifesto is a great and sobering read for IT teams on "how to get things right."
IT should back up before making changes, where possible, and test changes made to the system regardless of how tedious that is. It's better to run a test and identify issues early than to have them blow up later.
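The backup-before-change habit is cheap to automate. Here is a minimal sketch that snapshots a configuration file to a timestamped copy before it is edited, so the change can be rolled back; the file path is a hypothetical example.

```python
# Sketch: snapshot a config file before editing it, preserving a rollback copy.
import shutil
from datetime import datetime
from pathlib import Path

def backup_then_edit(config: Path) -> Path:
    """Copy `config` to a timestamped .bak file and return the backup path."""
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    backup = config.with_name(f"{config.name}.{stamp}.bak")
    shutil.copy2(config, backup)  # copy2 preserves metadata (mtime, permissions)
    return backup

backup = backup_then_edit(Path("/etc/haproxy/haproxy.cfg"))
print(f"safe to edit; rollback copy at {backup}")
```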
When IT professionals follow these steps to take control of their environments and head off problems before they surface, they will feel more empowered and confident in their capabilities. IT leaders can encourage this by trusting employees with more complex tasks, spending money on the right tools, and providing leeway to make mistakes along the way.