Service providers and enterprises have become comfortable using a wide range of Day 0 and Day 1 solutions for deploying, provisioning, and configuring networks, but they often ignore the much bigger and more costly challenge of ongoing network operations over the lifespan of the equipment, commonly referred to as "Day 2" operations. Today, Day 2 operations consist of engineer-led, cumbersome, and time-consuming manual processes fraught with inconsistency and inefficiency, which result in a demonstrably increased risk of outages and service degradations. Because Day 2 operation of the infrastructure never ends, improving operational efficiency means large savings in both hard and soft costs.
As networks have grown more complex with the addition of public clouds and software-defined technologies, successful Day 2 network management at the enterprise level requires a more strategic approach to operations, one that looks at the delivered results or "intents" of the network rather than merely the health of individual devices. In addition, while an operational team may view problem remediation as a constantly changing and bespoke set of tasks, with thousands of network issues reported monthly, the reality is that these service requests consist of a relatively small number of issue types that occur over and over again. This opens the door to operational efficiency, and the savings associated with it, by solving repetitive problems at scale. The ability to share the knowledge of skilled operators across IT operations teams leads to more scalable management and troubleshooting processes, which are critical to the organization's overall success.
Why is Ongoing Network Management Difficult?
In short, there are two reasons:
- The focus has been on managing device health instead of business outcomes.
- It has gone unrecognized that the thousands of reported network issues fall into a relatively small number of ‘similar’ issue types that recur again and again.
For decades, operations teams have focused on the device health of the network and made the misguided assumption that if the devices are healthy, then the network is working. This ignores each application's unique network requirements and prevents service delivery goals from being met. In most cases, problems are reported because of configuration conflicts rather than failed devices.
In addition, the average global enterprise has hundreds of core applications supported by thousands of multi-vendor network and cloud-connected devices and services, with thousands of configuration changes made across them annually. This generates thousands of trouble tickets per month, each still addressed individually with manual processes and too often dependent on a few skilled engineers who each solve problems in their own way. The widespread use of SD-WAN and public and private clouds has made these networks even more complicated and has stymied the traditional brute-force approach to infrastructure management.
Network Management is Decades out of Date
In an ironic twist, while the underlying network technology in use in service provider and enterprise infrastructures has advanced dramatically over the years, the ongoing management of these networks has not. Network management is still largely a manual, tactical process dependent on the individual operator's or engineer's personal knowledge and experience, using a mix of vendor-specific and homegrown tools, command-line interfaces, and one-off scripts. IT executives have traditionally addressed the challenge of scaling operations by throwing more operators and engineers at the problem. This has provided some relief historically, but with mounting global economic pressures and changing workforce demographics, it is no longer a valid strategy, and continued reliance on it wreaks havoc on the cost structure of delivering IT services, the foundation of every business.
The COVID-19 pandemic made experienced IT staff harder to find and more costly, and it also provided the opportunity to move more budget into network automation: a 2021 Enterprise Management Associates report found that 91% of IT organizations said business conditions during the pandemic led to a permanent increase in network automation investment. Day 2 network operations are focused on managing and optimizing networks continuously, and they can benefit greatly from automation and from scalable, repeatable processes that equip the NetOps team for day-to-day network management without a proportional increase in skilled staff.
A Strategic Approach to Day 2 Network Operations
Day 2 operations must become smarter and less labor-intensive. Rather than managing individual trouble tickets as if they were unique or focusing on the health of specific devices, strategic and highly effective Day 2 network management is centered on maintaining the overall business and application goals, or intents, of the network, since it is the application and business services that define actual success, not the health of devices. For example, applications like Voice-over-IP have specific needs of the network that must be met for calls to sound good. Modernizing Day 2 network operations is a fairly simple process once the desire to do so has been established. First, a foundational digital twin of the network can be built through auto-discovery and kept current with real-time traffic flow data. Second, the set of business applications can be identified, with the network requirements of each articulated and captured as a required intent. Automation can then be applied to this model to maintain the long list of business intents.
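To make the idea concrete, here is a minimal sketch in Python of what checking one intent against a network model might look like. Everything in it is an assumption made for illustration: the Link and Intent structures, the check_intent function, the site names, and the Voice-over-IP thresholds (150 ms of latency and 1% packet loss) are hypothetical and do not come from any particular product. A real digital twin would be populated by discovery and live telemetry rather than hand-written data.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """One discovered link in the (hypothetical) network model."""
    a: str
    b: str
    latency_ms: float
    loss_pct: float


@dataclass(frozen=True)
class Intent:
    """An application requirement expressed as limits along a path."""
    name: str
    path: tuple[str, ...]  # ordered hops the application traffic should take
    max_latency_ms: float
    max_loss_pct: float


def check_intent(intent: Intent, links: list[Link]) -> list[str]:
    """Return a list of violations; an empty list means the intent is met."""
    by_pair = {frozenset((link.a, link.b)): link for link in links}
    latency = 0.0
    delivered = 1.0  # fraction of packets surviving the path so far
    for a, b in zip(intent.path, intent.path[1:]):
        link = by_pair.get(frozenset((a, b)))
        if link is None:
            return [f"{intent.name}: no link between {a} and {b}"]
        latency += link.latency_ms
        delivered *= 1 - link.loss_pct / 100
    loss = (1 - delivered) * 100

    violations = []
    if latency > intent.max_latency_ms:
        violations.append(
            f"{intent.name}: latency {latency:.1f} ms exceeds "
            f"{intent.max_latency_ms:.1f} ms"
        )
    if loss > intent.max_loss_pct:
        violations.append(
            f"{intent.name}: loss {loss:.2f}% exceeds {intent.max_loss_pct:.2f}%"
        )
    return violations


# Illustrative data only: site names and thresholds are assumptions.
VOIP = Intent("voip-branch-to-dc", ("branch", "core", "dc"), 150.0, 1.0)
HEALTHY = [Link("branch", "core", 20.0, 0.1), Link("core", "dc", 30.0, 0.2)]
DEGRADED = [Link("branch", "core", 20.0, 0.1), Link("core", "dc", 140.0, 0.2)]
```

Calling check_intent(VOIP, HEALTHY) returns an empty list, while the same call against DEGRADED reports a latency violation even though every device on the path is still up. That is the distinction between monitoring device health and monitoring intent. An automation layer could run checks like this for every captured intent whenever the model changes.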
The result of this new approach to Day 2 network operations is greater operational efficiency, reduced costs and risk, reduced Mean Time to Recovery, fewer outages, less downtime, and a reduction in needed headcount as the organization grows and expands.
The old way of manual, brute-force network management simply can’t keep up with the pace of change in modern networks and the macro changes that have occurred. Now is the time to take a long, hard look at how your operational plan is being executed today and to start looking into how automation can be deployed within your operational best practices to reduce the drag. Organizations that take no action will face rising costs and higher service delivery risk.
Song Pang is the SVP of Engineering at NetBrain Technologies.