Defining Rules
Fortunately, some aspects of workflow aren't quite as interpersonal as team building or navigating political waters. There are definable and quantifiable causes of change: configuration changes, automated resource allocation (such as database growth) and equipment failure. Of those, only the last is usually obvious; you must actively keep tabs on the first two. A game plan for tracking resource allocation is beyond our scope here, but much network downtime can be avoided by keeping configuration changes under control.
Without good change management, your department will be in trouble politically. Changes will likely need to be reversed on a regular basis, and your organization will suffer downtime and unreliable operations. Change management may sound esoteric, but it is simply the art of managing change rather than letting it manage you. Or, as the change-management saying has it, "To fail to plan is to plan to fail."
The good news is that change management is easy to grasp. It consists of several phases, starting with planning and continuing through implementation:
>> Goal articulation
>> Risk-benefit analysis
>> Lab testing or proof of concept
>> Change scheduling and notification
>> Incremental rollout
>> Rollback plan
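The phases above can be tracked as a simple checklist. The Python sketch below is a hypothetical illustration, not a prescribed tool: the `ChangeRequest` class and its methods are invented names. It also makes one editorial assumption: although the rollback plan appears last in the list, the sketch requires it, along with every other phase, to be complete before the incremental rollout begins, because a rollback plan helps only if it exists before the change goes out.

```python
from dataclasses import dataclass, field

# The six phases listed above, in the order the article gives them.
PHASES = (
    "goal articulation",
    "risk-benefit analysis",
    "lab testing or proof of concept",
    "change scheduling and notification",
    "incremental rollout",
    "rollback plan",
)


@dataclass
class ChangeRequest:
    """Hypothetical record of which phases a proposed change has completed."""

    title: str
    completed: set = field(default_factory=set)

    def complete(self, phase: str) -> None:
        """Mark a phase as done; reject names that are not in PHASES."""
        if phase not in PHASES:
            raise ValueError(f"unknown phase: {phase!r}")
        self.completed.add(phase)

    def missing(self) -> list:
        """Return the phases not yet completed, in the listed order."""
        return [p for p in PHASES if p not in self.completed]

    def ready_for_rollout(self) -> bool:
        """Assumption: every other phase, including the rollback plan,
        must be finished before the incremental rollout starts."""
        return all(p in self.completed for p in PHASES if p != "incremental rollout")


cr = ChangeRequest("Upgrade Ethernet switch firmware")
for phase in PHASES[:4]:
    cr.complete(phase)
print(cr.ready_for_rollout())  # False: no rollback plan yet
cr.complete("rollback plan")
print(cr.ready_for_rollout())  # True
print(cr.missing())            # ['incremental rollout']
```

A work-tracking system could enforce the same gate by refusing to schedule a rollout while any planning phase is still open.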
Goal articulation can be as simple as a work order issued from the helpdesk. (If you don't have some sort of work-tracking system, or you aren't happy with yours, see "Helpdesk Salvation".) When your IT department acts proactively, as a business partner, work orders will likely be generated both internally and externally. If you're interested in guidelines for handling larger projects, see the "School of Project-Management Wizardry" and our Interactive Buyer's Guide on project-management software. Whether the project is large or small, make sure the business stakeholders sign off on the articulated benefit as well as the perceived risks.
One of the most important concepts in change management is borrowed from the surgical world: the risk-benefit ratio. In surgery, the expected gains to the patient must be weighed against the risks before an operation is planned. In IT, benefits are sometimes stated simply as "better, faster, more," which is a sure sign that the business case hasn't been made adequately.
A better description always includes a tangible business benefit, such as "inventory spoilage will decrease 20 percent to 30 percent, which over the course of six months will save X dollars." In short, don't change anything without first articulating the business benefit or goal.
Risk is hard to quantify if you don't categorize it. Here are some risk factors to consider (see "Calculating Risk," above):
>> Scope: How far-reaching is the change to the system? The broader the scope, the riskier the project. Consider the scope from an organizational standpoint as well as a technological one.
>> Distribution: Is this a centralized change (involving one device, for example), or is it a change that gets replicated to many devices? A change to a centralized system is easier to roll back, but it is riskier than a change to a decentralized system: if one component of a decentralized system fails, the other parts still function.
>> Maturity: How widely used and inspected is the proposed system? A system becomes more reliable as it is repeatedly studied, criticized and improved. (The cryptography world, where no cipher is trusted until it has been pounded on for several years, provides a good example.) Obscure systems tend to harbor more undiscovered flaws and thus are riskier to use; widely used, well-inspected systems are less risky.
>> Reversibility: How difficult is it to undo the change? Some system upgrades are easy to undo; these are the least risky. Others are a one-way trip; clearly, these are the riskiest.
>> Interactivity/Dependency: How much does the system interact with other network components? Upgrading a word processor would score very low here and thus carry low risk. Upgrading a Microsoft Windows 2000 domain controller or the firmware for an Ethernet switch would score higher and thus carry higher risk. Good documentation of system dependencies is essential.
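One way to make these factors comparable across proposed changes is to rate each one and total the ratings. The Python sketch below assumes a hypothetical 1-to-5 scale (1 = low risk, 5 = high risk) and an unweighted sum; it is an illustration only, not necessarily the method described in "Calculating Risk," and the ratings in the example are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RiskAssessment:
    """Hypothetical 1-5 rating per factor (1 = low risk, 5 = high risk)."""

    scope: int           # broader changes rate higher
    distribution: int    # centralized changes rate higher than decentralized ones
    maturity: int        # little-used, little-inspected systems rate higher
    reversibility: int   # one-way upgrades rate highest
    interactivity: int   # more dependencies on other components rate higher

    def __post_init__(self) -> None:
        # Reject ratings outside the assumed 1-5 scale.
        for name, value in vars(self).items():
            if not 1 <= value <= 5:
                raise ValueError(f"{name} must be between 1 and 5, got {value}")

    def total(self) -> int:
        """Unweighted sum, from 5 (lowest risk) to 25 (highest risk)."""
        return sum(vars(self).values())


# Invented ratings, for illustration only.
word_processor = RiskAssessment(scope=1, distribution=2, maturity=1,
                                reversibility=1, interactivity=1)
domain_controller = RiskAssessment(scope=4, distribution=5, maturity=2,
                                   reversibility=4, interactivity=5)
print(word_processor.total())     # 6
print(domain_controller.total())  # 20
```

An unweighted sum treats every factor as equally important; an organization that considers one-way upgrades especially dangerous might weight reversibility more heavily.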
Risk factors can be assessed more accurately during lab testing. For example, one way to determine true reversibility is to enact the change in a lab environment: a small-scale reproduction of the production network that includes most aspects of the system as it exists before the change. Interactivity can be tested the same way. Labs are expensive to build and maintain, but that expense is small compared with the costs of downtime, of reversing an upgrade, and of having to retrench and reimplement a change. If the change involves new equipment, you would want to build and test it before deployment anyway.
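The lab reversibility check described above amounts to comparing a snapshot taken before the change with the state left after the change has been rolled back. The Python sketch below models a device configuration as a flat dictionary of hypothetical keys; against real equipment, you would compare exported configuration files instead. The function name `verify_reversible`, the change scripts and the example settings are all assumptions for illustration.

```python
import copy

_MISSING = object()  # distinguishes an absent key from a key whose value is None


def verify_reversible(lab_config: dict, apply_change, roll_back) -> list:
    """Apply a change to a copy of a lab configuration, roll it back, and
    return the keys that differ from the pre-change snapshot.
    An empty list means the change reversed cleanly."""
    before = copy.deepcopy(lab_config)
    working = copy.deepcopy(lab_config)
    apply_change(working)
    roll_back(working)
    keys = set(before) | set(working)
    return sorted(k for k in keys
                  if before.get(k, _MISSING) != working.get(k, _MISSING))


# Hypothetical switch settings and change scripts.
def upgrade(cfg: dict) -> None:
    cfg["firmware"] = "3.0"
    cfg["spanning_tree"] = "rapid"  # new setting introduced by the upgrade


def partial_downgrade(cfg: dict) -> None:
    cfg["firmware"] = "2.1"  # forgets to remove the new setting


def full_downgrade(cfg: dict) -> None:
    cfg["firmware"] = "2.1"
    cfg.pop("spanning_tree", None)


lab = {"firmware": "2.1", "vlan_10": "sales"}
print(verify_reversible(lab, upgrade, partial_downgrade))  # ['spanning_tree']
print(verify_reversible(lab, upgrade, full_downgrade))     # []
```

A snapshot comparison catches only differences in the state you capture; side effects on dependent systems elsewhere in the lab still have to be checked separately, which is where interactivity testing comes in.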