Network monitoring is not a part-time gig. Too often, however, we can only give it part-time attention. We pick a network management system (NMS), do a brief evaluation and get it into production as quickly as possible. Often, we’re driven by the IT crisis du jour and we need to put out a specific fire Right Now. In those cases, we may get from research to production in the same day. Heroic? Yes. Smart-as-hell? You bet. Creating yet another problem later on? Unfortunately, quite possible.
Cowboys and Decepticons
I’ve met a few admins (and a few more IT managers) who view DevOps as the Michael Bay approach to engineering: flashy tech, adventure, explosions, and the potential for network Armageddon. But the reality of mature DevOps, particularly when applied to network monitoring and management, is, in fact, quite the opposite.
DevOps is not cowboy engineering where everyone on the network team feels free to update core router configs whenever they see fit. DevOps is also not tinkering with executive-wing telepresence QoS maps during the business day. Instead, DevOps brings proven techniques to IT, including development, quality assurance and release management -- the same techniques that were used to create the systems we manage in the first place.
The Protestant Agilist
The trick to successful DevOps, especially when it comes to network and systems monitoring, is to be selective about which techniques you commit to employ when balancing dev, QA and ops. As admins, we have the operations component down cold. We not only can theoretically do it in our sleep, more than once we’ve looked at late-night ticket notes and realized we actually have. Development (programming, automation and deployment) we probably get, but version control, team collaboration and code stewardship aren’t particularly interesting. And formal QA? Yuck. That’s usually somebody else’s deal. If it passes the bench test, it’s perfect. Upload.
But think about how you use your NMS. When a new species or, hopefully, genus of technology arrives in your lab, how long does it take before you have the same deep charts, graphs and reports you have for a trusty CAT? Worse, how often do delays in monitoring configuration delay data center deployment of new gear? Unfortunately, it’s more often than we like to admit.
One approach is to adopt limited Agile methods and stick to them. First, of course, you’re going to need a dev instance, including an extra license for your NMS. They’re often available at no cost or hugely discounted for lab use. So, step one: Call your vendor and get one.
Second, you can adopt the central tenets of Agile without joining a cult. Consider the base principle of scrum that customers change their minds during development. As admins, we know that better than anyone; customers never seem to know what they want before, during or after the project. We don’t need a backlog confessional for not holding daily scrum meetings, but the idea of semi-regular sprints aligns well with the pace of many of our regular tasks. Better, sprints can be a great way to collaborate with customers about major projects and break large changes into manageable tasks that can be planned and managed along with our regular fire hose from the ticket queue.
How often would you experiment with system optimization if you weren’t worried about blowing up production? How could you transform your network if you could discover and implement more features of the systems you’re already paying for? What if you could provide regular reports to management demonstrating both regular progress on large projects and ongoing, safely executed production-time changes? How much more usefulness would you get if you could...play?
Devs at play
And that’s the point of DevOps: play. You’re creating a sandbox with repeatable processes coupled with rapid deployment and safety. You can refocus your efforts to quickly address challenges and at the same time channel your creative energy to do what engineers do best, which is solve problems. A DevOps approach to network and systems monitoring brings real flexibility to experiment and defines processes to safely iterate and improve even the most critical core monitoring systems. We may even become that most valued IT resource -- not a fixer, but a solver.