Lyft's Envoy: Embracing a Service Mesh
QCon: You created Envoy. How did you come up with the idea for Envoy?
Matt: I've been working on Internet-scale networking for the last 10 years at places like Amazon, Twitter, and Lyft. The migration from single-language technology stacks to more polyglot stacks over the last five to seven years has made it clear that people are embracing microservices architectures. Embracing a stack with many different languages brings a lot of different problems with it. For example, you have hugely heterogeneous environments across different types of architectures and even across different on-prem and cloud providers. We realized that networking and a lack of observability are quickly becoming the largest impediments to scale. These are things like advanced load balancing, timeouts, retries, circuit breakers, tracing, and logging.

Looking around the ecosystem, you see a lot of great tooling around the JVM (things like Finagle from Twitter or Hystrix from Netflix). But when you start looking at the polyglot environment, there really did not exist any cohesive set of technologies that allows people to deploy distributed-system best practices, particularly around networking and observability.

When I came to Lyft, the company had a monolith environment, mostly in PHP. They had some services in Python and were looking to add more services in Go, and we were facing a lot of the same problems any company would face with this type of architecture. Rather than solving these problems with yet another per-language library, it became clear that an out-of-process proxy that was extensible and high performance, with best-in-class load balancing and observability, would be something compelling that would help improve Lyft's architecture. In addition, if you can use the same proxy for internal services and for traffic at the edge, that's a pretty great thing from an operational perspective. So we felt there was a really great opportunity to help Lyft scale. That solution became Envoy.
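To make the "out-of-process proxy" idea concrete, here is a minimal sketch of how those resilience features look in an Envoy configuration. This uses Envoy's current v3 config API (the API has evolved since this interview), and the listener port, cluster name `service_b`, and the specific thresholds are illustrative assumptions, not anything from Lyft's actual setup. The point is that timeouts, retries, and circuit breaking are declared once in the proxy rather than reimplemented in every application language:

```yaml
# Hypothetical sidecar config: route traffic to an upstream service
# ("service_b" is an assumed name) with resilience handled by the proxy.
static_resources:
  listeners:
  - name: ingress
    address:
      socket_address: { address: 0.0.0.0, port_value: 10000 }
    filter_chains:
    - filters:
      - name: envoy.filters.network.http_connection_manager
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
          stat_prefix: ingress_http
          route_config:
            virtual_hosts:
            - name: service_b
              domains: ["*"]
              routes:
              - match: { prefix: "/" }
                route:
                  cluster: service_b
                  timeout: 1s              # per-request timeout
                  retry_policy:            # automatic retries on failure
                    retry_on: "5xx,connect-failure"
                    num_retries: 2
          http_filters:
          - name: envoy.filters.http.router
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
  clusters:
  - name: service_b
    connect_timeout: 0.25s
    circuit_breakers:                      # shed load before the upstream melts down
      thresholds:
      - max_connections: 1024
        max_pending_requests: 256
```

Because every application talks through a local proxy configured like this, a Python, PHP, or Go service gets identical timeout, retry, and circuit-breaking behavior, plus consistent stats, without any per-language library.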
QCon: What's the focus of the QConNYC talk?
Matt: We're going to dig deep into what Lyft's problems were prior to Envoy existing. We'll set the stage for why Lyft was rolling out a microservices architecture, and I'll discuss what we were hoping to gain from it and what operational problems we were actually having. Then we're going to dig into the main design points of Envoy and how it helped fix those problems. We'll probably spend a considerable amount of time talking about the operational aspects of those problems: I'll show a lot of the internal dashboards that we use, talk a little bit about the alerting, the tracing, and the logging, and try to give people a good understanding of how, from an operational perspective, Envoy and the service mesh actually help people scale their microservices architectures.