Presentation: Handling Streaming Data in Spotify Using the Cloud

Duration: 5:25pm - 6:15pm

Key Takeaways

  • Hear how Spotify’s data delivery pipeline delivers events for 100 million monthly active users to a central location, where they are available to all the other developers in the company.
  • Hear practical lessons from Spotify around large-scale, global deployments of Kafka.
  • Learn about Spotify’s multi-datacenter implementation for streaming data, including how they handle failure, replication, and reliability.

Abstract

Spotify’s data is increasing at a rate of 60 billion events per day. The current event delivery system, which is based on Kafka 0.7, is slowly but surely reaching its limitations. To be able to seamlessly scale the event delivery system with Spotify’s growth, we decided to base the new event delivery system on Google Cloud Pub/Sub and Google Cloud Dataflow.

Spotify’s event delivery system is one of the foundational pieces of Spotify’s data infrastructure. Its key requirement is to deliver complete data with predictable latency and make it available to Spotify developers via a well-defined interface. The delivered data is then used to produce Discover Weekly, Spotify Party, Year in Music, and many other Spotify features.

This talk is going to cover the evolution of Spotify’s event delivery system, focusing on the lessons learned from operating it and the reasons behind the decision to move it into the cloud.

This talk will also cover Scio, which we developed to make the Dataflow SDK easier to use. Scio is a high-level Scala API for the Dataflow SDK which brings Dataflow closer to popular data processing frameworks.
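To give a feel for the API, here is a minimal word-count sketch in Scio (illustrative only, not code from the talk; the input and output arguments are placeholders, and the exact API may differ between Scio versions):

```scala
import com.spotify.scio._

// Minimal Scio pipeline sketch: read lines of text, count word
// frequencies, and write the results. "input" and "output" are
// placeholder arguments.
object WordCount {
  def main(cmdlineArgs: Array[String]): Unit = {
    val (sc, args) = ContextAndArgs(cmdlineArgs)
    sc.textFile(args("input"))
      .flatMap(_.split("\\s+").filter(_.nonEmpty))
      .countByValue
      .map { case (word, count) => s"$word: $count" }
      .saveAsTextFile(args("output"))
    sc.close()
  }
}
```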

Interview

Question: 
What is your role today?
Answer: 
I am a software engineer working on Spotify’s data infrastructure. For the past year, we have been working on moving event delivery from the legacy system to the promised land.
Question: 
What does the promised land's architecture look like?
Answer: 
The legacy architecture is built on top of Kafka 0.7, with a lot of additional services added on top of Kafka. Those services were added to cope with the limitations of the Kafka version we were using at the time.
Kafka 0.7 didn’t have replication or the reliability we needed, and it didn’t have a really good way of working across datacenters.
When we were adding services on top of Kafka, we didn’t really think about the future. In the end it turned out to be a mess, with a lot of operational legacy today.
Today, we are looking at simplifying the architecture. We’re designing all the services that we’re building today with an operational mindset, so that the operational burden is as small as possible.
When we deploy code, we want to know exactly which version is running. If there is a problem, we can pinpoint it immediately, because we have enough monitoring around it. We want to be the ones noticing the problem before we get paged by our users because of a large delay in the data delivery.
I guess that is the promised land for us.
Question: 
Your focus isn’t on code. It is on how you are building this promised land?
Answer: 
We focus on both. The code needs to be well tested before it goes into production. Right now, most of the focus goes into having all the components of the new system under continuous integration and continuous deployment.
Everything should be trackable: we need to know when something was deployed and what version is running, to ensure that everything is correct.
Question: 
Are you going to talk about this data delivery pipeline?
Answer: 
Half of the talk is going to be dedicated to it: where our event delivery system was, and where we want it to be. I am doing a joint talk with Neville. Neville is going to cover what happens to the data once it’s delivered, and the tooling his team is building to crunch the delivered data. They are basing their tooling on Google Cloud Dataflow.
Question: 
How many events are going through your event delivery system?
Answer: 
It’s 1 million events per second globally at peak time. Right now we have four datacenters, but if we also include Google Cloud as a new datacenter, then we can say five.
Question: 
Regarding datacenters, can you talk about failure across these datacenters and how you deal with replication?
Answer: 
In the current system there is no state replication between our datacenters. All the events are delivered to our Hadoop cluster, which is located in a single datacenter. To cope with network downtime between the Hadoop datacenter and all our other datacenters, we buffer all events in syslog files on service machines before sending them through our event delivery system. In the new event delivery system we’re introducing replicated state between all our service machines and Hadoop. To keep this replicated state we’re going to use Cloud Pub/Sub.
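For a sense of what the Pub/Sub side involves, here is a minimal sketch of publishing a single event with the Google Cloud Pub/Sub Java client, written in Scala. This is not Spotify’s actual producer code; the project name, topic name, and event payload are made up for illustration.

```scala
import com.google.cloud.pubsub.v1.Publisher
import com.google.protobuf.ByteString
import com.google.pubsub.v1.{PubsubMessage, TopicName}

object EventPublisher {
  def main(args: Array[String]): Unit = {
    // Hypothetical project and topic names, for illustration only.
    val topic = TopicName.of("my-gcp-project", "events")
    val publisher = Publisher.newBuilder(topic).build()
    try {
      val message = PubsubMessage.newBuilder()
        .setData(ByteString.copyFromUtf8("""{"event":"track-played"}"""))
        .build()
      // publish() is asynchronous; block on the future here for simplicity.
      val messageId = publisher.publish(message).get()
      println(s"Published message $messageId")
    } finally {
      publisher.shutdown()
    }
  }
}
```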
Question: 
How would you rate this talk: Beginner, Intermediate, or Advanced?
Answer: 
It's intermediate. I try to make it accessible, even to people who don't know all the technologies involved. It helps if you have some experience with the problem domain, though.
Question: 
What do you feel is the most disruptive tech in IT right now?
Answer: 
I like containers, and I think they are a game changer. At Spotify, we started moving toward containers in the last two years, and I can see so many benefits in the whole continuous integration and deployment model that we have now. It has allowed us to iterate a lot faster.

Speaker: Neville Li

Software Engineer @Spotify

Neville is a software engineer at Spotify who works mainly on data infrastructure for machine learning and advanced analytics. In the past few years he has been driving the adoption of Scala and new data tools for music recommendation, including Scalding, Spark, Storm and Parquet. Before that he worked on search quality at Yahoo! and old school distributed systems like MPI.

Speaker: Igor Maravić

Software Engineer @Spotify

As part of the band, he worked on developing and maintaining Spotify's gateways, migrating mobile clients from a custom TLV protocol to HTTP, designing and developing continuous delivery infrastructure, stress testing services, and more. Currently he's living and breathing event delivery.
