Presentation: Scaling Push Messaging for Millions of Devices @Netflix

Track: Architectures You've Always Wondered About

Location: Broadway Ballroom North, 6th fl.

Duration: 1:40pm - 2:30pm

Level: Intermediate - Advanced

Persona: Architect, Developer, Developer, JVM

What You’ll Learn

  • Learn how Zuul Push was developed to enable push messaging between a client and server using WebSockets.

  • Hear how Netflix solved issues like thundering herds, stateful connections, and message routing with Zuul Push.

  • Learn about the use cases where Zuul Push was able to improve the user experience for Netflix customers and how engineers are considering using Zuul Push in the future.


How do you efficiently serve the latest personalized movie recommendations to millions of Netflix members, as soon as they are ready?

Netflix recently rolled out Zuul Push - a massively scalable push notification service that handles millions of "always-on" persistent connections from all those Netflix apps running out there. It proactively pushes new data - like personalized movie recommendations - from cloud to devices instead of devices having to poll the server periodically. This has helped reduce data delivery latency and cloud footprint by eliminating wasteful polling requests. It also opens up a whole new set of interesting possibilities like initiating on-demand telemetry of detailed debug data from misbehaving devices in the field.

Zuul Push is a high-performance async service based on Netty. It supports the WebSocket and SSE protocols for push notifications. It handles more than 5.5 million connected clients at peak today and is growing rapidly. We will cover the design of the Zuul Push server and its globally replicated client registry, which makes it possible for Netflix to scale to millions of concurrent persistent connections and deliver push notifications globally across the AWS regions. We will also review the design details of the backend message routing infrastructure that lets any Netflix microservice push notifications to any connected client.

Key takeaways include:

  • How push messaging can be used to add new capabilities to your existing application.
  • How to scale to a large number of persistent connections using Netty and async I/O.
  • Differences between operating this type of service versus traditional request/response style stateless REST services.

Zuul is your API gateway, right? What is the relationship between Zuul and Zuul Push?


Yes, Zuul is Netflix's API gateway (all of the Netflix HTTP API traffic passes through Zuul). We took the Zuul code base and grafted Zuul Push onto it, so it's a different offering, but the two share 90% of the same code base.

With Zuul Push, your client connects directly to the Zuul Push cluster (which is in Amazon's cloud), and the server uses WebSockets to send things like movie recommendations to the client.


Can you give me an example of what we can expect to see in your talk?


There are many areas where push notifications from Zuul Push are handy. I will show an example using the List of Lists of Movies (what we call LoLoMo). LoLoMo is the grid of shows and movies that Netflix shows you when you log in.

The grid is actually personalized to each customer's taste. It is not a static grid. For each member, we try to show the movies and shows that are most likely to interest them. The typical experience on Netflix (at least for me) is that once you log in, you keep browsing for at least a few minutes (maybe for tens of minutes) to pick the best show to watch. While you are browsing, there is a possibility that the server will come up with better show recommendations based on personalized session data.

Now, once you know the user's taste, you get the best engagement if better recommendations generated in the cloud are actually shown to the user. So if a better recommendation is generated, you want to let the client know sooner rather than later. That's what push helps us with.

In our previous system, the application would poll for new recommendations. That worked, but it was wasteful and had some limitations, such as the freshness of the UI being limited by the polling interval. If the interval is too short, you strain the server's capacity. If the interval is too long, you compromise UI freshness. So push is a really good way to address this. If new recommendations are available and the client is online, it just gets the new recommendations via a push message without having to poll the server periodically.
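The polling trade-off described above can be made concrete with a little arithmetic. The sketch below is only an illustration: the client count is the peak figure quoted in the abstract, while the polling intervals and the function name polling_load are hypothetical, and it assumes polls are spread evenly over time.

```python
from typing import Tuple


def polling_load(clients: int, interval_s: float) -> Tuple[float, float]:
    """Return (requests per second reaching the server, worst-case staleness in seconds).

    Assumes every client polls once per interval and that polls are
    spread evenly over time, which is an idealization.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return clients / interval_s, interval_s


CLIENTS = 5_500_000  # peak connected clients quoted in the abstract

if __name__ == "__main__":
    for interval in (10, 60, 600):  # hypothetical polling intervals, in seconds
        rps, staleness = polling_load(CLIENTS, interval)
        print(f"poll every {interval}s: {rps:,.0f} req/s, UI up to {staleness:.0f}s stale")
```

Shortening the interval buys freshness only by multiplying requests, most of which return nothing new. With push, traffic is proportional to the number of actual updates instead.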


I understand WebSockets, but I am not familiar with SSE (Server-Sent Event) protocols for push notifications. What's that all about?


SSE actually existed before WebSockets were widely supported. WebSockets are duplex. That means both the server and the client can send messages to each other. SSE is a half-duplex connection. Using SSE, only the server can send messages to the client. To initiate SSE, the client sends a simple HTTP GET request indicating that it can handle an SSE event stream (Accept: text/event-stream). The web server then responds in a manner that keeps the connection open even after the initial 200 OK response is returned. After this initial exchange, the server can send chunks of JSON down the stream whenever needed in the future.

Every time the server sends a chunk down, an event fires in the browser saying you got a new chunk. In our case, the half-duplex connection is fine because we don't really have a use case where a client needs to send a message to the server.
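To make the exchange above more tangible, here is a minimal sketch of the SSE wire format. It is not Zuul Push code: the helper names and the sample payload (lolomo_refresh) are hypothetical. It only assumes the standard text/event-stream framing, where each event is a group of "field: value" lines ended by a blank line and lines starting with ":" are comments.

```python
import json

# Handshake, for orientation (standard SSE, not specific to Zuul Push):
#   client -> server:  GET <path> with header  Accept: text/event-stream
#   server -> client:  200 OK with header      Content-Type: text/event-stream
# The response body then stays open and carries events framed as below.


def format_sse_event(payload, event=None, event_id=None):
    """Serialize one event: optional id/event fields, JSON in data, blank-line terminator."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    if event is not None:
        lines.append(f"event: {event}")
    # json.dumps emits a single line by default, so one data field is enough.
    lines.append(f"data: {json.dumps(payload)}")
    return "\n".join(lines) + "\n\n"


def parse_sse_stream(text):
    """Split a received stream into events; each event is a dict of its fields."""
    events = []
    for block in text.split("\n\n"):
        fields, data_lines = {}, []
        for line in block.split("\n"):
            if not line or line.startswith(":"):  # blank or comment (keep-alive) line
                continue
            name, _, value = line.partition(":")
            value = value[1:] if value.startswith(" ") else value
            if name == "data":
                data_lines.append(value)
            else:
                fields[name] = value
        if not data_lines:  # blocks without data are not delivered as events
            continue
        fields["data"] = "\n".join(data_lines)
        events.append(fields)
    return events
```

In a browser, the parsing half is done for you by the EventSource API, which is what fires the event for each chunk.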

SSE is not as widely supported in browsers as WebSockets. It is not supported by IE (Internet Explorer), which is the biggest gap, but Chrome and Safari do support SSE.


So what kind of challenges did you run into using WebSockets at this scale?


The biggest challenge was that WebSockets are persistent, long-lived connections. That automatically makes your service more like a stateful service than a stateless REST service, which means you can't do quick deploys or a quick rollback. Really, that was the biggest challenge of working with WebSockets.

Another issue we had to tackle was how to support a large number of open connections efficiently on a single box. This is the classic C10K problem. While handling high-volume RPS (requests/responses per second) is a well-understood problem, there is no continuous RPS with the push cluster. You make a connection and then wait for the server to send you something. That means CPU and memory utilization is low, but you are holding a really high number of mostly idle connections. So being able to support a very high number of open connections per box was a major problem. We'll talk about this and some of our strategies for dealing with things like thundering herds in the talk.
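The "many mostly idle connections" pattern can be sketched with an event loop. The example below uses Python's asyncio purely as an illustration of async I/O. Zuul Push itself is built on Netty, and nothing here is its actual code. Each simulated connection is a coroutine parked on a queue that stands in for a socket, and the PushRegistry class is a hypothetical in-process stand-in for a client registry.

```python
import asyncio


class PushRegistry:
    """Hypothetical registry mapping client id -> outbound message queue."""

    def __init__(self):
        self._queues = {}

    def register(self, client_id):
        queue = asyncio.Queue()
        self._queues[client_id] = queue
        return queue

    def unregister(self, client_id):
        self._queues.pop(client_id, None)

    def push(self, client_id, message):
        """Queue a message for one client; return False if it is not connected."""
        queue = self._queues.get(client_id)
        if queue is None:
            return False
        queue.put_nowait(message)
        return True

    def __len__(self):
        return len(self._queues)


async def connection(registry, client_id, delivered):
    """One simulated persistent connection: idle until something is pushed."""
    queue = registry.register(client_id)
    try:
        while True:
            message = await queue.get()  # parked here; uses no CPU while idle
            if message is None:  # sentinel used by this demo to close the connection
                break
            delivered.append((client_id, message))
    finally:
        registry.unregister(client_id)


async def demo(n_clients=10_000):
    registry, delivered = PushRegistry(), []
    tasks = [asyncio.create_task(connection(registry, i, delivered)) for i in range(n_clients)]
    while len(registry) < n_clients:  # wait until every connection is registered
        await asyncio.sleep(0)
    open_connections = len(registry)
    registry.push(n_clients // 2, "new recommendations")  # wakes exactly one coroutine
    while not delivered:
        await asyncio.sleep(0)
    for i in range(n_clients):  # shut everything down
        registry.push(i, None)
    await asyncio.gather(*tasks)
    return open_connections, delivered, len(registry)


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

A single thread holds all the connections because an idle connection is just a suspended coroutine, not a blocked thread. That is the property that makes a high connection count per box feasible in this style of server.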

These were the two major challenges.  

Speaker: Susheel Aroskar

Software Engineer @Netflix

Susheel works as a software engineer on the Cloud Gateway team at Netflix, which develops and operates Zuul - an API gateway that fronts all of the Netflix cloud traffic and handles more than 100 billion requests a day. Prior to Zuul, Susheel worked on Netflix CDN's control plane in the cloud, which is responsible for steering more than a third of all North American peak evening internet traffic. He started at Netflix just as Netflix was taking its first steps towards migrating from datacenter to cloud and he still has scars from those early days to prove it.
