Presentation: Lyft's Envoy: Embracing a Service Mesh
What You’ll Learn
- Hear from the creator of Envoy about the problems Lyft was facing that ultimately led to Envoy's creation.
- Understand how Envoy let Lyft focus on writing business-logic code rather than infrastructure code.
- Learn more about Envoy and why so many companies are making it part of their infrastructure when deploying microservices.
Abstract
Over the past several years, Lyft faced considerable operational difficulties with its initial microservice deployment, primarily rooted in networking and observability. In response, it migrated to a service mesh powered by Envoy (https://www.envoyproxy.io/), a high-performance distributed proxy that aims to make the network transparent to applications. Envoy's out-of-process architecture allows it to be used alongside any language or runtime.
At its core, Envoy is an L4 proxy with a pluggable filter chain model. It includes a full HTTP stack with a parallel pluggable L7 filter chain. This programming model allows Envoy to be used for a variety of different scenarios, including HTTP/2, gRPC, MongoDB, Redis, rate limiting, and more. Envoy provides advanced load balancing support, including eventually consistent service discovery, circuit breakers, retries, and zone-aware load balancing. Envoy also has best-in-class observability via statistics, logging, and distributed tracing.
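The layering described above — an L4 listener filter chain, with the HTTP connection manager hosting its own L7 filter chain in front of a cluster — can be sketched in a minimal static Envoy (v3 API) configuration. Listener, cluster, and endpoint names here are illustrative, not from the talk:

```yaml
static_resources:
  listeners:
  - name: ingress
    address:
      socket_address: { address: 0.0.0.0, port_value: 10000 }
    filter_chains:
    - filters:
      # L4 network filter: the HTTP connection manager provides the HTTP stack.
      - name: envoy.filters.network.http_connection_manager
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
          stat_prefix: ingress_http
          # Parallel L7 filter chain; the router is the terminal filter.
          http_filters:
          - name: envoy.filters.http.router
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
          route_config:
            name: local_route
            virtual_hosts:
            - name: backend
              domains: ["*"]
              routes:
              - match: { prefix: "/" }
                route: { cluster: service_backend }
  clusters:
  - name: service_backend        # hypothetical upstream service
    type: STRICT_DNS
    lb_policy: ROUND_ROBIN
    load_assignment:
      cluster_name: service_backend
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address: { address: backend.local, port_value: 8080 }
```

Additional network filters (e.g., for MongoDB or Redis) or HTTP filters (e.g., rate limiting) slot into the same chains without application changes.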
Matt Klein explains why Lyft developed Envoy, focusing primarily on the operational agility that the burgeoning service mesh paradigm provides, with a particular focus on microservice networking observability.
Interview
QCon: You created Envoy. How did you come up with the idea for Envoy?
Matt: I've been working on Internet-scale networking for the last 10 years at places like Amazon, Twitter, and Lyft.
The migration of technology stacks from a single language to a more polyglot mix over the last five to seven years has made it clear that people are embracing microservices architectures. A stack with many different languages brings a lot of different problems. For example, you have hugely heterogeneous environments across different types of architectures, and even across different on-prem and cloud providers. We realized that networking and unobservable behavior were quickly becoming the largest impediments to scale: things like advanced load balancing, timeouts, retries, circuit breakers, tracing, and logging.
Looking around the ecosystem, you see a lot of great tooling around the JVM (things like Finagle from Twitter or Hystrix from Netflix). But when you look at the polyglot environment, there really was no cohesive set of technologies that let people deploy distributed-system best practices, particularly around networking and observability.
So when I came to Lyft, the company had a mostly PHP monolithic environment. They had some services in Python and were looking to add more services in Go. We were facing a lot of the same problems any company faces with this type of architecture. When it came to choosing between solving those problems with yet another per-language library or with an out-of-process proxy, it became clear that an extensible, high-performance proxy with best-in-class load balancing and observability would be compelling and would help improve Lyft's architecture. In addition, if you could use the same proxy for internal services and for traffic at the edge, that's a pretty great thing from an operational perspective. So we felt there was a really great opportunity to help Lyft scale. That solution became Envoy.
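The resilience features Matt lists (timeouts, retries, circuit breakers) are exactly the things the out-of-process proxy moves out of per-language libraries and into declarative configuration. A sketch of how they look in Envoy's v3 config; the `payments_service` cluster, hostnames, and threshold values are hypothetical:

```yaml
# Route-level resilience: a per-request timeout and a retry policy,
# applied by the proxy rather than by application code.
route_config:
  name: local_route
  virtual_hosts:
  - name: payments
    domains: ["*"]
    routes:
    - match: { prefix: "/" }
      route:
        cluster: payments_service
        timeout: 2s
        retry_policy:
          retry_on: "5xx,connect-failure"   # retry transient upstream failures
          num_retries: 3

# Cluster-level circuit breaking: bound the concurrent work Envoy will
# send to the upstream before it starts fast-failing requests.
clusters:
- name: payments_service
  type: STRICT_DNS
  lb_policy: ROUND_ROBIN
  circuit_breakers:
    thresholds:
    - max_connections: 100
      max_pending_requests: 50
      max_retries: 3
  load_assignment:
    cluster_name: payments_service
    endpoints:
    - lb_endpoints:
      - endpoint:
          address:
            socket_address: { address: payments.internal, port_value: 8080 }
```

Because these policies live in the mesh rather than in each service, every language in a polyglot stack gets the same behavior for free.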
QCon: What's the focus of the QConNYC talk?
Matt: We're going to dig deep into what Lyft's problems were before Envoy existed. We'll set the stage for why Lyft was rolling out a microservice architecture, and I'll discuss what we were hoping to gain from it and the operational problems we were actually having. Then we'll dig into the main design points of Envoy and how it helped fix those problems. We'll probably spend a considerable amount of time on the operational aspects of those problems. I'll show a lot of the internal dashboards we use, and I'll talk a little bit about the alerting, the tracing, and the logging, to give people a good understanding of how, from an operational perspective, Envoy and the service mesh actually help people scale their microservices architectures.
QCon: Why do you think this is an important story today?
Matt: I think deploying microservice architectures is obviously all the rage right now. There are very good reasons for organizations to do that, but at the same time, the current state of the industry is such that organizations undertake microservice migrations without fully understanding all the operational complexity.
I think many organizations get stuck, and Lyft was in that position. We wanted to unlock the people agility of microservices but faced major operational concerns, particularly around networking and observability. That's where Envoy comes in and helps bridge the gap: how do you allow people to build microservice architectures and scale them in such a way that they don't spend all their time debugging?
QCon: Who are you talking to in this talk?
Matt: I'm talking to two different types of people. The first are people building infrastructure: the foundational systems that application developers run their business logic on. The second are application developers. A lot of application developers spend much of their time dealing with infrastructure problems rather than focusing on business logic. For that audience, my goal is to help them understand that there is a better way: if the infrastructure is mature enough (and provides enough abstractions), they can spend more time on business logic and less on debugging random problems.
Tracks
- Microservices: Patterns & Practices
  Evolving, observing, persisting, and building modern microservices
- Developer Experience: Level up Your Engineering Effectiveness
  Improving the end-to-end developer experience - design, dev, test, deploy, operate/understand. Tools, techniques, and trends.
- Modern Java Reloaded
  Modern, modular, fast, and effective Java. Pushing the boundaries of JDK 9 and beyond.
- Modern User Interfaces: Screens and Beyond
  Zero UI, voice, mobile: interfaces pushing the boundary of what we consider to be the interface
- Practical Machine Learning
  Applied machine learning lessons for SWEs, including tech around TensorFlow, TPUs, Keras, Caffe, and more
- Ethics in Computing
  Inclusive technology; the ethics and politics of technology; considering bias; society's relationship with tech; and today's privacy problems (e.g., GDPR, the right to be forgotten)
- Architectures You've Always Wondered About
  Next-gen architectures from the most admired companies in software, such as Netflix, Google, Facebook, Twitter, and Goldman Sachs
- Modern CS in the Real World
  Thoughts pushing software forward, including consensus, CRDTs, formal methods, and probabilistic programming
- Container and Orchestration Platforms in Action
  Runtime containers, libraries, and services that power microservices
- Finding the Serverless Sweetspot
  Stories about the pains and gains of migrating to serverless.
- Chaos, Complexity, and Resilience
  Lessons building resilient systems and the war stories that drove their adoption
- Real World Security
  Practical lessons building, maintaining, and deploying secure systems
- Blockchain Enabled
  Exploring smart contracts, oracles, sidechains, and what can and cannot be done with blockchain today.
- 21st Century Languages
  Lessons learned from languages like Rust, Go, Swift, Kotlin, and more.
- Empowered Teams
  Safely running inclusive teams that are autonomous and self-correcting