The service mesh space is crowded with offerings each purporting to have a unique value proposition. Despite this, many mesh offerings have the exact same engine under the hood: the general-purpose Envoy proxy.
Of course, there are many reasons to build on top of Envoy. First and foremost is that writing a proxy is hard; just ask the folks at Lyft, NGINX, or Traefik. To write a modern proxy, you need to ensure that it is both highly performant and highly secure. A proxy sits in the critical path of every application in a service mesh. If it introduces new security vulnerabilities or significantly degrades performance, every one of those applications will feel the impact.
However, the danger with using a general-purpose proxy is that, to use your mesh well, your users will also need to learn how to use, configure, and troubleshoot your proxy. In short, they will need to become proxy experts as well as service mesh experts. While that cost may work for some organizations, many find that the additional feature set isn’t worth the tradeoff.
For example, browsing through Istio’s repo on GitHub you can see that, at the time of this writing, there are over 280 open issues referencing Envoy. This suggests that Envoy remains an active source of friction for Istio users.
A better choice: building a service-mesh-specific data plane
Imagine for a moment that you were freed of these constraints and could create the ideal service mesh proxy. What would that look like? For starters, it would need to be small, simple, fast, and secure. Let’s dig into this a bit.
Small – Size matters: Your proxy sits beside every single application in your mesh, where it intercepts and potentially transforms every call to and from your apps. The lighter it is, and the lower its performance and compute tax, the better off you are. Heavy-weight proxies need to provide heavy-weight benefits to offset the additional cost of running a proxy for every app. If we were building our new mesh today, we’d pick the smallest possible proxy.
Simple – KISS: Or, as my drill sergeant used to say, keep it simple, buddy. (Well, he didn't actually say "buddy".) Every feature that your proxy implements is going to be offset by security, performance, and size costs. When it comes right down to it, adopting an all-purpose proxy is great because it will likely do more than your control plane needs. Unfortunately, a feature implemented in the proxy that isn’t used by the control plane is wasted and, even worse, while it helps the mesh developer, it hurts their customers by exposing them to more vulnerabilities and making them deal with more operational complexity. A perfect mesh has a proxy that only implements the features it needs and nothing more.
Fast – High speed, low drag: Any latency added to your transactions by your proxy is latency added to your application. Now, to be clear, there are a lot of ways a service mesh can make your application faster, including by optimizing what endpoints it talks to and changing how inter-app traffic is handled, but the slower the proxy, the slower the mesh. The ideal proxy would be extremely fast. That means it would be written in a language that compiles to native code and that isn’t garbage collected (GC). Native code provides the execution speed, and, as useful as GC is, its collection pauses will periodically slow the proxy down.
Secure – First, do no harm: Anytime you add a new piece of software to your stack, you add a new avenue for vulnerabilities. A service mesh is a critical portion of your infrastructure, particularly if you rely on it to secure all inter-app communication. The proxies will have access to every piece of PII, PCI, PHI, and any other data your application processes. So, as we think about those super-fast native code languages, we need to consider the security impact they have. C and C++ are great for performance, but they are vulnerable to all sorts of memory management exploits. Whether we wrote our own proxy or adopted an existing one, we’d probably want it to be written in Rust. Rust gives you the speed of C and C++ with much stronger memory guarantees.
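To make the memory-safety point concrete, here is a minimal sketch of the kind of check Rust makes the default. It assumes a hypothetical wire format (one length byte followed by that many payload bytes) and a hypothetical function name, `read_frame`; neither comes from linkerd2-proxy or any real protocol. In C, trusting the declared length without checking it is a classic source of buffer over-reads. In safe Rust, slice access is bounds-checked, so the oversized case becomes an ordinary `None` rather than a read past the end of the buffer.

```rust
/// Reads one frame from a hypothetical wire format:
/// a single length byte followed by that many payload bytes.
/// Returns `None` if the buffer is empty or the declared length
/// is larger than the bytes actually available.
fn read_frame(buf: &[u8]) -> Option<&[u8]> {
    // Split off the length byte; an empty buffer yields `None`.
    let (&len, rest) = buf.split_first()?;
    // `get` is bounds-checked: a declared length that exceeds the
    // remaining bytes returns `None` instead of reading out of bounds.
    rest.get(..len as usize)
}
```

Calling `read_frame(&[3, b'a', b'b', b'c', b'd'])` yields the three payload bytes `abc`, while a frame that claims 200 bytes but carries only one is rejected with `None`.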
Where Does That Leave Us?
The perfect service mesh implementation wouldn’t use a general-purpose proxy. Instead, it would use a service-mesh-specific proxy: one that does no more than the mesh needs and that is written in a performant language with strong security guarantees, like Rust.
The Linkerd "micro-proxy"
The Linkerd project chose to do exactly that, in the form of the Linkerd "micro-proxy", designed to keep the data plane’s resource cost and vulnerability surface as small as possible. William Morgan, the author of the "meshifesto" referenced at the beginning, goes into depth about why Linkerd chose to write its own proxy. To keep complexity down, compute costs low, and create the fastest mesh on the market while minimizing the security impacts of the proxy, Linkerd wrote the linkerd2-proxy. This leaves Linkerd with the fastest, lightest-weight, and easiest-to-use service mesh available. Unlike many other mesh offerings, Linkerd only works on Kubernetes, but we, and our adopters, are happy with that trade-off.
We’ve covered a lot of ground in this article and its first part, from what a service mesh is to why its proxy matters. Hopefully, you have a sense of why service meshes are compelling, and what choices and tradeoffs service mesh implementations make. Hopefully, you've also seen that, while the choice of proxy is important in shaping the mesh itself, the proxy is just that: an implementation detail. The real value of a service mesh is in the capabilities, performance, and security outcomes it provides.
Jason Morgan is Technical Evangelist for Linkerd at Buoyant.
Read part one of this article: Service Mesh and Proxy in Cloud-Native Applications