
Track: Architectures You've Always Wondered About

Location: Broadway Ballroom North, 6th fl.

Day of week:

Have you ever wondered how well-known tech companies are able to seamlessly deliver an exceptional user experience, while supporting millions of users and billions of transactions? Behind every great site is an architecture that enables it to function, innovate and scale. Join the Architectures You’ve Always Wondered About track to hear about these next-gen architectures, patterns and anti-patterns, best practices, challenges, and interesting war stories.

Track Host: Karen Casella

Engineering Leader @Netflix, previously leading architecture teams @eBay & @Sun

Karen is currently an engineering leader @Netflix, where she is responsible for the teams that build the server-side infrastructure to enable a secure viewing experience for our members. Karen previously led engineering and architecture teams @eBay & @Sun, before taking a ten-year detour to start-up land. Karen is passionate about diversity and inclusion in technology organizations and about how to encourage young people from under-represented groups to start their journey towards our incredible world of ever-changing technology.

10:35am - 11:25am

Lyft's Envoy: Embracing a Service Mesh

Over the past several years, facing considerable operational difficulties with its initial microservice deployment, primarily rooted in networking and observability, Lyft migrated to a sophisticated service mesh powered by Envoy, a high-performance distributed proxy that aims to make the network transparent to applications. Envoy’s out-of-process architecture allows it to be used alongside any language or runtime.

At its core, Envoy is an L4 proxy with a pluggable filter chain model. It includes a full HTTP stack with a parallel pluggable L7 filter chain. This programming model allows Envoy to be used for a variety of different scenarios, including HTTP/2 gRPC, MongoDB, Redis, rate limiting, etc. Envoy provides advanced load balancing support, including eventually consistent service discovery, circuit breakers, retries, and zone-aware load balancing. Envoy also has best-in-class observability, using statistics, logging, and distributed tracing.
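As a rough illustration of what a pluggable filter chain means, the sketch below passes a request through an ordered list of filters, any of which may modify or reject it. It is a minimal sketch only: the `Request` type, the filter names, and the `run_chain` helper are hypothetical and are not Envoy's actual API or configuration model.

```python
from typing import Callable, List, Optional

# Hypothetical types for illustration; this is not Envoy's actual API.
Request = dict
Filter = Callable[[Request], Optional[Request]]


def add_trace_header(request: Request) -> Optional[Request]:
    """Tag the request so downstream services can correlate it."""
    request.setdefault("headers", {})["x-request-id"] = "demo-id"
    return request


def make_rate_limiter(limit: int) -> Filter:
    """Return a filter that rejects requests once `limit` have passed."""
    seen = 0

    def rate_limit(request: Request) -> Optional[Request]:
        nonlocal seen
        seen += 1
        # Returning None stops the chain: the request is rejected.
        return request if seen <= limit else None

    return rate_limit


def run_chain(filters: List[Filter], request: Request) -> Optional[Request]:
    """Pass the request through each filter in order; stop on rejection."""
    for f in filters:
        result = f(request)
        if result is None:
            return None
        request = result
    return request


# Filters are plugged in simply by adding them to the list.
chain = [add_trace_header, make_rate_limiter(limit=2)]
results = [run_chain(chain, {"path": "/ride"}) for _ in range(3)]
```

With a limit of two, the first two requests come back with the added header and the third is rejected, which shows how behavior such as rate limiting can be added without touching the application itself.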

Matt Klein explains why Lyft developed Envoy, focusing primarily on the operational agility that the burgeoning service mesh paradigm provides, with a particular focus on microservice networking observability.


Matt Klein, Creator of Envoy & Software Engineer @Lyft

11:50am - 12:40pm

Canopy: Scalable Distributed Tracing & Analysis @Facebook

How do you understand the performance of a request that is executed in a large-scale system, potentially fanning out across thousands of machines and services? To answer this question at Facebook, we built a distributed tracing framework, Canopy, which has provided visibility into an otherwise intractable problem. 

In this talk we present Canopy, Facebook’s performance and efficiency tracing infrastructure. Canopy records causally related events across the end-to-end execution path of requests, including from browsers, mobile applications, and backend services. Canopy processes traces in near real-time, derives user-specified features, and outputs to datasets that aggregate across billions of requests. At Facebook, Canopy is used to query and analyze performance and efficiency data in real-time.
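The pipeline described above (record events, group them into traces, derive features, aggregate) can be sketched in a few lines. This is only a toy illustration under stated assumptions: the event tuples, the `end_to_end_latency` feature, and the function names are hypothetical and do not reflect Canopy's actual data model or API.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical events: (trace_id, component, timestamp_ms).
events = [
    ("t1", "browser", 0), ("t1", "backend", 40), ("t1", "browser", 120),
    ("t2", "mobile", 5), ("t2", "backend", 30), ("t2", "mobile", 65),
]


def group_by_trace(events):
    """Collect the events that belong to the same request."""
    traces = defaultdict(list)
    for trace_id, component, ts in events:
        traces[trace_id].append((component, ts))
    return traces


def end_to_end_latency(trace):
    """Example derived feature: time from first to last event."""
    timestamps = [ts for _, ts in trace]
    return max(timestamps) - min(timestamps)


traces = group_by_trace(events)
# One derived feature value per trace.
latencies = {tid: end_to_end_latency(t) for tid, t in traces.items()}
# Aggregate the feature across all traces.
mean_latency = mean(latencies.values())
```

Here each trace yields one feature value, and the aggregate is computed over those values rather than over raw events.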

Canopy addresses three challenges we have encountered: (1) supporting the range of execution and performance models used by different components of the Facebook stack; (2) supporting interactive ad-hoc and real-time analysis of trace data; and (3) operating at massive scale - Canopy currently records and processes over 1 billion traces per day.   

We conclude by discussing lessons learned from applying Canopy to a wide range of use cases at Facebook and present case studies of its use in solving various performance and efficiency challenges.

Haozhe Gao, Software Engineer @Facebook
Joe O’Neill, Software Engineer @Facebook

1:40pm - 2:30pm

Scaling Push Messaging for Millions of Devices @Netflix

How do you efficiently serve the latest personalized movie recommendations to millions of Netflix members, as soon as they are ready?

Netflix recently rolled out Zuul Push - a massively scalable push notification service that handles millions of "always-on" persistent connections from all those Netflix apps running out there. It proactively pushes new data - like personalized movie recommendations - from cloud to devices instead of devices having to poll the server periodically. This has helped reduce data delivery latency and cloud footprint by eliminating wasteful polling requests. It also opens up a whole new set of interesting possibilities like initiating on-demand telemetry of detailed debug data from misbehaving devices in the field.

Zuul Push is a high-performance async service based on Netty. It supports WebSocket and SSE protocols for push notifications. It handles more than 5.5 million connected clients at peak today and is rapidly growing. We will cover the design of the Zuul Push server and its globally replicated client registry that makes it possible for Netflix to scale to millions of concurrent persistent connections and deliver push notifications globally across the AWS regions. We will also review the design details of the backend message routing infrastructure that lets any Netflix microservice push notifications to any connected client.
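To make the role of a client registry concrete, the sketch below records which server holds each client's connection and uses that record to route a message. It is a simplified illustration, not Zuul Push's implementation: the `PushServer` class, the `push` function, and the device names are hypothetical, and a plain in-process dictionary stands in for the globally replicated registry.

```python
from typing import Dict, List


class PushServer:
    """Stands in for one server holding persistent client connections."""

    def __init__(self, name: str) -> None:
        self.name = name
        # client_id -> messages delivered over that client's connection
        self.connections: Dict[str, List[str]] = {}

    def connect(self, client_id: str, registry: Dict[str, "PushServer"]) -> None:
        """Accept a connection and record where the client is connected."""
        self.connections[client_id] = []
        registry[client_id] = self

    def deliver(self, client_id: str, message: str) -> bool:
        """Send a message over an existing connection, if there is one."""
        if client_id not in self.connections:
            return False
        self.connections[client_id].append(message)
        return True


def push(registry: Dict[str, PushServer], client_id: str, message: str) -> bool:
    """Look up the server holding the client's connection and route to it."""
    server = registry.get(client_id)
    if server is None:
        return False  # client is not connected anywhere
    return server.deliver(client_id, message)


# Assumption: a plain dict replaces the replicated registry for this demo.
registry: Dict[str, PushServer] = {}
us_east = PushServer("us-east")
eu_west = PushServer("eu-west")
us_east.connect("tv-123", registry)
eu_west.connect("phone-456", registry)

delivered = push(registry, "phone-456", "new recommendations ready")
missed = push(registry, "unknown-device", "hello")
```

The sender only needs a client identifier; the registry lookup decides which server receives the message, and a client with no registry entry is simply reported as undeliverable.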

Key takeaways include:

  • How push messaging can be used to add new capabilities to your existing application.
  • How to scale to a large number of persistent connections using Netty and async I/O.
  • Differences between operating this type of service versus traditional request/response style stateless REST services.

Susheel Aroskar, Software Engineer @Netflix

2:55pm - 3:45pm

Closer to the Wire: Real-time News Alerting @Bloomberg

What is it worth to be the first person to read a news story? For a trader, a second is enough to make or break a portfolio. Bloomberg clients leverage our highly customizable real-time news alerts to make informed trading decisions. But how do you build such a flexible alerting system that manages large-scale subscriptions and high-volume story flow whilst having sub-second latency requirements? Join Katerina to hear about how her team built the Bloomberg real-time alerting platform using open source search technology, explore the challenges that arise at that scale, and learn about the Bloomberg News Search ecosystem.

Katerina Domenikou, Senior Software Engineer @Bloomberg

4:10pm - 5:00pm

Skype's Journey From P2P: It's Not Just About the Services

Skype is known for P2P but today runs its third calling architecture, fourth contacts service, and is about to deploy its fourth chat architecture.

Client changes have been as dramatic as service changes. Bruce addresses why Skype moved away from P2P and the strategies used to make the migrations successful.  Key to safe migrations was robust online experimentation, used to evaluate the impact of both client and service migrations. 

The initial P2P architecture allowed Skype to launch in 2003 and become the de facto standard for Internet voice and video calling. Over time, however, servers became cheaper, clients began running on phones, and Skype began a gradual evolution from P2P to a service-based architecture. But the story isn't as simple as a transition from P2P to services. The P2P architecture always had crucial services, but those services weren't designed for the demands of supporting lightweight clients.

Experimentation to support the transition from P2P to a service-based architecture required addressing the nature of Skype’s client population. With thousands of combinations of client releases and hardware platforms in use, robust scorecards and support for the entire client lifecycle are essential. Building clients based on configuration by focusing on “what” instead of “why” allows clients to continue to work as the overall service evolves to support scenarios unanticipated when the client was designed.

Bruce will discuss the evolution of Skype's architecture and the design tradeoffs made along the way. He will discuss lessons learned and the improvements that are still in progress as Skype continues to evolve.

Bruce Lowekamp, Principal Architect for Skype's Cloud Infrastructure Portfolio @Microsoft

5:25pm - 6:15pm

Large Scale Architectures Panel

Join Karen Casella of Netflix as she explores architectural issues with a panel of experts from some of the world's largest architectures.

Karen Casella, Engineering Leader @Netflix, previously leading architecture teams @eBay & @Sun
Matt Klein, Creator of Envoy & Software Engineer @Lyft
Bruce Lowekamp, Principal Architect for Skype's Cloud Infrastructure Portfolio @Microsoft
Haozhe Gao, Software Engineer @Facebook
Joe O’Neill, Software Engineer @Facebook
Susheel Aroskar, Software Engineer @Netflix
Katerina Domenikou, Senior Software Engineer @Bloomberg

