Using Traffic Modeling to Load-Balance Netflix Traffic at Global Scale

Netflix infrastructure supports a personalized UI and streaming experience for 230M+ members around the world. Because that infrastructure is spread across multiple locations, it is important to have predictability and control over how user traffic is distributed across them, in order to balance latency, infrastructure costs, and availability risk.

This talk tells the story of how Netflix shifted from geo-based DNS load balancing to a latency-based approach, relying on real-user measurements and building a global data model of Netflix traffic to reduce costs while also reducing latency and outage risk. We will cover the challenges of integrating the solution into the Cloud and CDN components of Netflix infrastructure, and the trade-offs between model accuracy and traffic model complexity. The talk also demonstrates how the data-driven approach was applied to influence future infrastructure decisions by simulating the impact of potential infrastructure changes with precision and minimal engineering effort.
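As a rough illustration of the general idea of latency-based steering from real-user measurements (not the method presented in the talk), the sketch below takes latency samples, aggregates them per client group and region, and maps each group to the region with the lowest median latency. The group names, region names, sample values, and function names are all hypothetical, and a real system would also have to account for capacity, cost, and failover.

```python
from collections import defaultdict
from statistics import median

# Hypothetical real-user measurements: (client_group, region, latency_ms).
# All names and values are made up for illustration.
SAMPLES = [
    ("group-a", "region-1", 40), ("group-a", "region-1", 60), ("group-a", "region-1", 50),
    ("group-a", "region-2", 90), ("group-a", "region-2", 110),
    ("group-b", "region-1", 120), ("group-b", "region-1", 100),
    ("group-b", "region-2", 30), ("group-b", "region-2", 50),
]


def median_latency(samples):
    """Return the median latency for each (client_group, region) pair."""
    buckets = defaultdict(list)
    for group, region, latency_ms in samples:
        buckets[(group, region)].append(latency_ms)
    return {key: median(values) for key, values in buckets.items()}


def steer_by_latency(samples):
    """Map each client group to the region with the lowest median latency."""
    best = {}  # group -> (region, median latency)
    for (group, region), latency in median_latency(samples).items():
        if group not in best or latency < best[group][1]:
            best[group] = (region, latency)
    return {group: region for group, (region, _) in best.items()}


steering = steer_by_latency(SAMPLES)
print(steering)
```

A geo-based scheme would assign each group to a region by location alone; here the assignment follows measured latency, so it can differ from the geographically closest choice.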


Speaker

Niosha Behnam

Staff Software Engineer @Netflix

Niosha is a Staff Software Engineer on the Compute Abstractions Team at Netflix. During his tenure, he was a founding member of the Traffic & Chaos Team, where he helped build the software that powers cloud traffic management, regional failover, and resilience. Most recently, in addition to exploring opportunities for expanding Netflix’s global cloud footprint, Niosha has been working on improved traffic steering visibility to minimize cloud cost while optimizing user experience.

Prior to Netflix, Niosha built custom IaaS offerings for specialized private clouds and contributed to R&D leveraging big data approaches to ingest, analyze, and visualize large volumes of relational data.


Speaker

Sergey Fedorov

Director of Engineering @Netflix

Sergey is a hands-on engineering leader with over 15 years of experience in global infrastructure and distributed systems. At Netflix he has worked on a range of components in the Content Delivery domain, such as building a monitoring system for the Open Connect CDN infrastructure, launching the FAST.com speed test, accelerating device-AWS requests, and improving Netflix traffic management. Today Sergey is leading the engineering effort to enable Live Streaming functionality for Netflix users.

Sergey is a vocal advocate of an observability approach to engineering and making data-driven decisions. Finding actionable signals in loosely controlled environments is what keeps him awake, much better than caffeine. This might also explain why outside of work Sergey can be seen playing ice hockey, brewing beer, or exploring exotic travel destinations.


Date

Tuesday Jun 13 / 10:35AM EDT (50 minutes)

Location

Salon A-C

Topics

Architecture, Platform Engineering, Data Analytics, Traffic Management, Capacity Management

From the same track

Session Architecture

Global Capacity Management through Strategic Demand Allocation

Tuesday Jun 13 / 01:40PM EDT

Meta currently operates in more than 15 data center regions around the world. This rapidly expanding global data center footprint poses new challenges for service owners as well as our infrastructure management systems.

Ranjith Kumar S

Software Engineer @Meta

Session Architecture

From Open Source to SaaS: The Journey of ClickHouse

Tuesday Jun 13 / 05:25PM EDT

Have you ever wondered what it takes to go from an open-source project to a fully fledged SaaS product? How about doing that in only one year? If the answer is yes, then this talk is for you. You’ll hear straight from the experts who worked on the design and execution of this huge project.

Sichen Zhao

Senior Software Engineer @ClickHouse

Shane Andrade

Principal Software Engineer @ClickHouse

Session

Several Components are Rendering: Client Performance at Slack-Scale

Tuesday Jun 13 / 02:55PM EDT

Our users expect the interactions in our applications and websites to be fast, no matter how complicated they are under the hood. In this talk, we’ll explore some frontend performance issues encountered by Slack as they continue to grow and evolve the desktop app.

Jenna Zeigen

Staff Engineer @Slack

Session Platform

Building Sub-Second Latency Video Infrastructure at Cloudflare

Tuesday Jun 13 / 04:10PM EDT

Cloudflare has deployed a sub-second latency live streaming system at scale over the last few years. In this talk, we’ll provide insight into how this works under the hood, focusing specifically on the protocols that Cloudflare Stream uses: HLS, DASH, RTMPS, SRT, and WebRTC.

Renan Dincer

Systems Engineer @Cloudflare

Session Architecture

Unconference: Architectures You've Always Wondered About

Tuesday Jun 13 / 11:50AM EDT

What is an unconference? An unconference is a participant-driven meeting. Attendees come together, bringing their challenges and relying on the experience and know-how of their peers for solutions.

Ben Linders

Independent Consultant in Agile, Lean, Quality and Continuous Improvement