Guarding users and organizations against productivity-draining conditions has always been a full-time job for ITOps. It’s just that in a digital workplace setting, the stakes are much higher.
Doing that job well requires both short-term (reactive) ability and longer-term (preventative and proactive) capability.
Establishing and maintaining a constant gaze on the digitally delivered estate is crucial to finding and fixing what might go wrong. In the short term, that means a break-fix response to an outage. In the longer term, it means continuous improvement or optimization work to address capacity bottlenecks that constrain users or otherwise degrade performance.
Users depend on code, clouds, and connectivity their employers don’t own. There is limited out-of-the-box visibility into how these are assembled, particularly any third-party dependencies or interdependencies with other services or microservices.
That end-to-end chain of technologies used to deliver a unified digital experience won't always play nicely together. In a world driven by hybrid or cloud-native deployments, where application components have become smaller, more distributed, and increasingly ephemeral, even a small change can have a large blast radius. Timely diagnosis is critical if user experience and productivity are to be maintained.
For that reason, a goal of many ITOps teams is to move towards more of an end-to-end model, where traffic is instrumented and teams are better placed to identify performance degradations no matter where they occur.
Whether teams are seeking to address immediate or longer-term digital experience issues, data, and the insight it contains, can be used to drive decision-making. But the data has to be “big” as well as “good.”
- Without good data, any attempts to produce granular assessments and actionable recommendations will fall short.
- It also takes an enormous amount of data to forecast the beginnings of a degradation or performance deterioration with a high degree of accuracy. Driving innovation likewise requires access to the right telemetry, and lots of it.
The volume of data exists, and has for some time, so that isn’t the problem. Instead, the current challenge is twofold: the data often isn’t as comprehensive as it needs to be, nor is it in a format where it can be easily collected and ingested into a single view that makes correlation simple and actionable.
But this is changing. Recent technology capability improvements, including increased support of the Cloud Native Computing Foundation (CNCF)-backed OpenTelemetry format, promise to reduce barriers by putting ITOps teams more in control of the digital experience they are assigned to provision, monitor, and uphold.
Bringing network visibility data into OpenTelemetry
OpenTelemetry is "a vendor-neutral open-source observability framework for instrumenting, generating, collecting, and exporting telemetry data such as traces, metrics, and logs" from cloud-native applications and infrastructure.
OpenTelemetry, to date, has been very application-centric. It assumed that the performance or experience bottleneck, and the greatest opportunities for improvement, lay within the application itself. That has been true for years, but the rise of modular and distributed applications changes the type and nature of things that can go wrong. Problems with external network paths or dependent services can also be common causes of a degraded user experience.
Network visibility data has long been a useful augmentation to create a more complete picture of user experience delivery. This augmentation is now possible in OpenTelemetry-instrumented environments. ITOps teams can export data northbound from the ThousandEyes platform in an OpenTelemetry format, making it possible to combine cloud and Internet intelligence with other OpenTelemetry-compliant datasets and analyze the data all in one place. This is a first for the OpenTelemetry ecosystem and an endorsement of the value that network data has for the digital experience today.
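To make the idea of analyzing network and application data "all in one place" more concrete, here is a minimal sketch in plain Python. It is not the ThousandEyes export or the OpenTelemetry SDK. It assumes only that both sources emit data points sharing the same basic shape (a name, a timestamp, a value, and key/value attributes). The `DataPoint` class, the `correlate` function, and the metric names `network.latency` and `app.request.duration` are hypothetical and chosen for illustration, as is the choice of `server.address` as the shared attribute.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class DataPoint:
    """Simplified, hypothetical stand-in for a metric data point."""
    name: str                                 # metric name (illustrative only)
    time_unix_s: int                          # observation time, in seconds
    value: float                              # measured value, in milliseconds here
    attributes: Tuple[Tuple[str, str], ...]   # key/value pairs describing the target

    def attr(self, key: str) -> str:
        # Return the attribute value, or an empty string if the key is absent.
        return dict(self.attributes).get(key, "")


def correlate(
    points: Iterable[DataPoint],
    join_key: str = "server.address",
    bucket_s: int = 60,
) -> Dict[Tuple[str, int], Dict[str, float]]:
    """Group points from any source by a shared attribute and time bucket.

    If two points with the same name land in the same bucket, the later
    one in the input overwrites the earlier one.
    """
    merged: Dict[Tuple[str, int], Dict[str, float]] = defaultdict(dict)
    for p in points:
        bucket_start = p.time_unix_s - (p.time_unix_s % bucket_s)
        merged[(p.attr(join_key), bucket_start)][p.name] = p.value
    return dict(merged)


# Hypothetical sample: two network measurements and one application measurement.
TARGET = (("server.address", "api.example.com"),)
SAMPLE = [
    DataPoint("network.latency", 120, 180.0, TARGET),
    DataPoint("app.request.duration", 150, 42.0, TARGET),
    DataPoint("network.latency", 200, 20.0, TARGET),
]

view = correlate(SAMPLE)
for (target, bucket_start), metrics in sorted(view.items()):
    print(target, bucket_start, metrics)
```

Because both kinds of data point share one shape, a single grouping function is enough to line them up by target and time window. With proprietary formats, each source would typically need its own connector before this step could happen.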
The intent is to enable ITOps teams to create end-to-end correlated insights across technical domains so that optimal digital experiences can continue to be maintained.
How OpenTelemetry simplifies ITOps
For ITOps teams, the data portability and standardization made possible by OpenTelemetry brings with it several benefits, perhaps most notably the promise to simplify their work.
ITOps teams are used to dealing with data in non-standardized, proprietary formats and across multiple systems, with bespoke connectors built between repositories to aid correlation activity.
Because each part of the end-to-end delivery chain is monitored differently, communication silos exist.
For example, it may be challenging for an application owner to relate what the network team is saying to telemetry collected in their own domain.
The effect of that may be felt particularly acutely in a war room scenario set up to deal with a major incident. In these situations, it’s important for cross-functional representatives to come together and share information. These processes are more effective when everyone speaks a common language.
In addition, with a standardized format, data can be exported into a familiar workflow and tool of choice. That should shorten the time-to-decision and time-to-action in an incident response scenario and the time-to-value in a continuous improvement or optimization scenario.
It should also allow ITOps teams to pinpoint which course of action will lead to the bigger improvement.
Mike Hicks is Principal Solutions Analyst at Cisco ThousandEyes.