5 Enterprise Risks that Can be Solved by a Service Mesh
Service meshes have emerged as a widely used component of the cloud-native stack because they add critical features around visibility, reliability, and security in a way that minimizes developer involvement.
August 5, 2021
The adoption of cloud technologies has inherently changed how applications are built. Organizations that have embraced this change have had to dispense with outdated developer methodologies and embrace new, cloud-native patterns like microservices, Kubernetes, and the use of a service mesh.
Service meshes, in particular, have emerged as a widely used component of the cloud-native stack because they add critical features around visibility, reliability, and security in a way that minimizes developer involvement. Unfortunately, service meshes get a bad rap for being overly complex and resource-intensive when, in fact, they don't have to be (more on that below).
Let’s look at five enterprise risks that can be solved by a service mesh.
1. Wasted developer time
The demands placed on modern software are extraordinary. We expect our software to be always available, to scale to arbitrary levels of traffic, and to be secure while running in an environment (the cloud) that isn't even under our control. Solving these challenges can place undue pressure on application developers—instead of building business logic, developers must waste time implementing critical platform features like retries, timeouts, TLS, request balancing, etc. These features are not just independent of business logic. They are incredibly difficult to get right in a large distributed system.
A service mesh frees your developers from these tasks. It delivers these critical platform features (like mutual TLS, latency-aware load balancing, retries, success rate instrumentation, transparent traffic shifting, and more) directly into the platform, independent of developers. This frees developers up to focus on their job—the business logic that drives the economic engine of the business.
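As an illustration of the kind of plumbing a mesh takes off developers' plates, here is a minimal Python sketch of per-request retries with a per-attempt timeout. The function name and parameters are illustrative, not any mesh's actual API; a real proxy applies this logic transparently at the network layer.

```python
import time

def call_with_retries(fn, max_attempts=3, timeout_s=1.0, backoff_s=0.1):
    """Sketch of the retry/timeout logic a mesh proxy applies on behalf of
    the application. `fn` is any zero-argument callable."""
    for attempt in range(1, max_attempts + 1):
        start = time.monotonic()
        try:
            result = fn()
            # Treat a slow response as a failure, like a per-attempt timeout.
            if time.monotonic() - start > timeout_s:
                raise TimeoutError("call exceeded per-attempt timeout")
            return result
        except Exception:
            if attempt == max_attempts:
                raise  # budget exhausted; surface the error to the caller
            time.sleep(backoff_s * attempt)  # back off between attempts
```

Implementing this once per service, in every language a team uses, is exactly the duplicated effort the mesh eliminates.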
2. Unsecured data
When organizations move application development to the cloud, they give up direct control over the baseline infrastructure: the hardware on which their application runs and the network over which their traffic transits. In exchange for the advantages of the cloud, they pay the price not just in dollars but in a loss of control. One consequence of this change is that the security boundary shifts entirely from physical data-center security to application security, including ensuring that all data is encrypted both at rest and in transit.
Here again, the service mesh plays a critical role. A service mesh can address encryption in transit with sophisticated techniques like mutual TLS, which ensures both confidentiality (encryption) and authenticity (identity validation) for both sides of the connection, for all traffic within an application. Some service meshes, like Linkerd, can do this in a totally transparent manner, without the application needing to change at all.
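What distinguishes mutual TLS from ordinary server-side TLS is that the server also demands and verifies a certificate from the client. A rough sketch of that server-side configuration, using Python's standard `ssl` module (in a mesh, the sidecar proxy does the equivalent of this for you; the function name and CA parameter are illustrative):

```python
import ssl

def make_mtls_server_context(ca_file=None):
    """Build a server-side TLS context that also authenticates the client
    (mutual TLS). A mesh proxy performs this step transparently, using
    certificates issued and rotated by the mesh's certificate authority."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    # Require the peer (the client) to present a certificate...
    ctx.verify_mode = ssl.CERT_REQUIRED
    # ...and validate it against the mesh's CA bundle (path is hypothetical).
    if ca_file is not None:
        ctx.load_verify_locations(cafile=ca_file)
    return ctx
```

The hard part in practice is not this configuration but issuing, distributing, and rotating the certificates themselves, which is precisely what the mesh automates.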
3. Lack of visibility
In the cloud-native world, an application might comprise hundreds of services, each with thousands of instances that are destroyed and recreated as machines come and go, new code is rolled out, services are scaled, and so on. Understanding the state of an application in this ever-changing environment — what's running successfully and what isn't — is harder than ever before.
In this complex environment, a service mesh provides a clear solution to visibility. By instrumenting all communication to and from instances and aggregating this telemetry at the level of services—the logical building blocks of the application—the service mesh provides unparalleled visibility into what's happening to the application in a way that developers and platform owners alike can easily understand.
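The aggregation step described above can be sketched in a few lines: roll raw per-request outcomes up into per-service success rates. This is a toy model of the idea, not any mesh's actual telemetry pipeline.

```python
from collections import defaultdict

def success_rates(requests):
    """Aggregate raw per-request outcomes into per-service success rates,
    the way a mesh rolls instance-level telemetry up to the service level.
    `requests` is an iterable of (service_name, succeeded) pairs."""
    totals = defaultdict(lambda: [0, 0])  # service -> [successes, total]
    for service, ok in requests:
        totals[service][1] += 1
        if ok:
            totals[service][0] += 1
    return {svc: ok / total for svc, (ok, total) in totals.items()}
```

Because the proxy sees every request, these numbers come for free and are consistent across services, regardless of the language each service is written in.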
4. Business disruption
Technologies like Kubernetes handle one part of the challenge of running reliable, scalable, and secure software, but they don't handle it all. In a distributed system, IT outages that start as partial failures in one area can quickly escalate into major operational disruptions that impact the customer experience. Kubernetes alone won't prevent that escalation.
A service mesh delivers a sophisticated set of distributed systems reliability features that can help prevent escalation in the first place, including request-level load balancing, timeouts, retries, rate limiting, circuit breaking, and traffic shifting. Some service meshes like Linkerd even provide powerful features like latency-based load balancing (using EWMA, or exponentially-weighted moving average) and retry budgets to automatically tamp down on partial failures before they escalate.
5. Being at a competitive disadvantage
Time spent on building core platform features such as mutual TLS, request load balancing, and so on is time that is not spent on building the critical business logic of your application. Time spent on identifying and resolving incidents in your cloud-native application is time not spent on building the differentiators that will make your business successful.
The successful leader knows where to spend time and energy and where to innovate. By providing critical features "out of the box," a service mesh frees the engineering team to focus its time, energy, and innovation on the fundamental application that powers the business, ensuring the business does not end up at a competitive disadvantage relative to its rivals.
Not all service meshes are created equal
While the benefits of a service mesh are clear and compelling, the space is notorious for its complexity. Some service mesh projects are hard to manage and require a team of service mesh experts to operate, defeating the entire purpose of adopting a service mesh!
In this morass of complexity, one project stands out: Linkerd, the CNCF service mesh. With its open governance; its laser focus on simplicity, performance, and ease of use; and its fast-growing community of passionate adopters, Linkerd is increasingly the tool of choice for service mesh adopters. Whether it's Entain, a leading global sports betting and gaming operator, which got Linkerd up and running in production in five to six hours; Elkjøp, the largest electronics retailer in the Nordics, which identified a problem that was jeopardizing the rollout of its new PoS system in Denmark before Black Friday; or GoSpotCheck and YouMail, who share how they moved from Istio to Linkerd—the future of Linkerd is, as they say, so bright you've got to wear shades.
Jason Morgan is Technical Evangelist for Linkerd at Buoyant.