Most enterprises have already begun shifting their strategies to align with their current environment, which values data more than physical assets and sees all industries increasingly powered by and reliant on digital information technology and processes. The fact that businesses have only begun to adapt is testament to the fluid and evolving nature of digital transformation. Multiple technologies, processes, applications, systems, and protocols need to be adopted and updated on a regular basis in order for businesses to keep abreast of changes. This does, of course, result in significant disruption for all involved.
The pace of digital service development, fueled by increasing automation, is accelerating far beyond what IT operations teams have had to handle in the past. This acceleration creates more chaos in production environments. Furthermore, the shift in the DevOps paradigm from delivering “failsafe” applications to expecting a “safe-to-fail” production environment increases this chaos even further. Close collaboration and communication between the IT team members responsible for service development and delivery are therefore vital in controlling, or at least minimizing, the resulting chaos.
The DevOps chaos theory
In a business environment where the continuous delivery pipeline is spurred on by automation, and both the development velocity and enterprise scale are increasing, DevOps principles will prove more important than ever. It is useful to look at this at a more granular level, through the framework of what I call the DevOps Chaos Theory.
In my formula, the pace of innovation is measured as the Velocity (V): the number of new software releases deployed in a production environment in a defined time period. The Scale (S) factor is measured as the overall number of IT staff involved in service delivery and management in production environments, such as DevOps, SecOps, QA, system architects, DBAs, NetOps, and help desk. Interaction between these team members brings the potential for miscommunication, which will increase the overall chaos. The maximum number of pairwise interactions between these team members is S * (S - 1) / 2, which for high-scale organizations approaches S²/2.
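To make the interaction count concrete, here is a minimal Python sketch. The function names and the sample team sizes are illustrative assumptions, not part of the original formula; the sketch only compares the exact pair count with its large-scale approximation.

```python
def max_interactions(staff: int) -> int:
    """Exact number of distinct pairs among `staff` people: S * (S - 1) / 2."""
    if staff < 0:
        raise ValueError("staff must be non-negative")
    return staff * (staff - 1) // 2


def approx_interactions(staff: int) -> float:
    """Large-scale approximation of the pair count: S^2 / 2."""
    return staff ** 2 / 2


if __name__ == "__main__":
    # Hypothetical team sizes, chosen only to show how the two values converge.
    for s in (5, 50, 500):
        exact = max_interactions(s)
        approx = approx_interactions(s)
        print(f"S={s}: exact={exact}, approx={approx}, exact/approx={exact / approx:.3f}")
```

The ratio of the exact count to the approximation is 1 - 1/S, so the approximation is about 20% too high for a team of 5 but within 0.2% for a team of 500.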
Based on these considerations, a logical hypothesis would identify the system-level Chaos (C) in production environments as C = K * V * S². K is the normalization factor that may change based on the overall adoption of digital transformation in a specific industry and the effectiveness of collaboration and communication between the IT team members.
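The hypothesis can be explored with a short Python sketch. The function name, the default K = 1.0, and the sample figures are assumptions made purely for illustration; the article does not give a value for K, and the sketch treats any constant factor (such as the 1/2 in the interaction count) as folded into K.

```python
def chaos(velocity: float, staff: int, k: float = 1.0) -> float:
    """System-level chaos C = K * V * S^2.

    velocity: releases deployed to production per time period (V)
    staff:    IT staff involved in service delivery and management (S)
    k:        normalization factor (K); 1.0 is a placeholder, not a measured value
    """
    if velocity < 0 or staff < 0:
        raise ValueError("velocity and staff must be non-negative")
    return k * velocity * staff ** 2


if __name__ == "__main__":
    # Hypothetical baseline: 10 releases per period, 20 staff.
    baseline = chaos(velocity=10, staff=20)
    faster = chaos(velocity=20, staff=20)   # double the release velocity
    larger = chaos(velocity=10, staff=40)   # double the staff count
    print(f"baseline={baseline}, faster={faster}, larger={larger}")
    print(f"doubling V multiplies C by {faster / baseline}")
    print(f"doubling S multiplies C by {larger / baseline}")
```

Under this model, doubling the release velocity doubles the chaos, while doubling the number of staff quadruples it, which is why scale weighs more heavily than velocity in the formula.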
In such a disruptive environment, it's vital that different departments within a business work openly together. While I appreciate the important role automation tools play in continuous delivery, they cannot eliminate the bottleneck in the pipeline; rather, they shift it down the line into production. Therefore, according to the “safe-to-fail” paradigm, most chaos will likely manifest in production environments, so it will be crucial in the coming year and beyond that enterprises identify the level of constraint placed upon the IT operations team. This will help to determine what changes need to be made and what service performance management technology must be introduced to prevent operations from becoming a bottleneck in the continuous service delivery cycle inherent to digital transformation.
Effective management at a human level should form an important part of a company’s digital-transformation strategy, if chaos is to be mitigated and crisis averted. Operations and development teams need to collaborate to form and practice cross-company initiatives, in order to communicate and manage rapid systems changes.
An effective instrumentation and monitoring strategy is required to facilitate successful collaboration across IT teams. Since service delivery combines application and infrastructure into a single system, telemetry of this system's key performance indicators (KPIs) is critical. Monitoring system-level KPIs requires access to reliable data sources, such as network traffic. Effective instrumentation of these data sources will play a key role in proactively identifying the root cause of service issues and thus reining in chaos.
Getting these things right and planning for next year will help maintain business velocity throughout the digital transformation process.
Michael Segal is VP Strategy at NETSCOUT. His product management experience spans 10 years at Cisco Systems, where he managed all aspects of product line lifecycles for several successful product lines. Michael's technical areas of expertise include SaaS/cloud, virtualization, mobile IP, security, IP networking, Wi-Fi/wireless, VoIP, and remote access. Michael holds patents in areas of networking and wireless mobility.