Engineering Systems for Real-Time Predictions @DoorDash | Software Development Conference QCon New York

What You’ll Learn

Understand the most common problems that come with using machine learning in practice.
Gain a better understanding of moving from algorithms to real-world products.
Learn some of the tools and techniques DoorDash uses to overcome these problems and ship its prediction service.

Abstract

Today, applying machine learning to drive business value in a company requires a lot more than figuring out the right algorithm to use; it requires tools and systems to manage the entire machine learning product lifecycle. For instance, we need systems to manage data pipelines, to monitor model performance and detect degradations, to analyze data quality and ensure consistency between training and prediction environments, to experiment with different versions of models, and to periodically retrain models and automatically deploy them.

At DoorDash, an on-demand logistics company, we fulfill deliveries on a dynamic marketplace, which requires extensive use of real-time predictions. Through many iterations of applying machine learning in our products, we identified solutions to address the above problems and built these into our machine learning platform. This has dramatically reduced the cost of integrating machine learning into our products, saved us weeks of development time, and allowed us to use ML in new product areas.

In this talk, we will present our thoughts on how to structure machine learning systems in production to enable robust and wide-scale deployment of machine learning and share best practices in designing engineering tooling around machine learning.

Question:

QCon: Can you describe the machine learning platform you leverage at DoorDash?

Answer:

Raghav: We built our system around common open-source machine learning libraries in Python, such as scikit-learn, LightGBM, and Keras. We have a microservices architecture, also built in Python, which includes a prediction service that handles all predictions and a features service. All the services are hosted on AWS.

Question:

QCon: Can you briefly describe your real-time prediction system?

Answer:

Raghav: Our prediction system responds to HTTP/RPC requests. It accesses a model store to fetch the right model to use and obtains features from a features service.

Two types of features are used for predictions:

Real-time features about a delivery, such as how many items the delivery contains or the current time and day of the week. These features are calculated for the delivery and passed into the system.
Batch aggregate features, which are precalculated and exposed through the features service.

So, for example, to predict ETAs, we make an HTTP/RPC request to the prediction service for every delivery. The service knows how to fetch the model, use these features, and make the prediction.
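
As an illustration only (this is not DoorDash's code), the request flow described above could be sketched roughly as follows. The class name `PredictionService`, the in-memory dictionaries standing in for the model store and the features service, the feature names, and the toy ETA formula are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Assumption: a "model" is any callable that maps a feature dict to a number.
Model = Callable[[Dict[str, float]], float]


@dataclass
class PredictionService:
    """Hypothetical in-process stand-in for a prediction service."""
    model_store: Dict[str, Model]               # model name -> model
    feature_store: Dict[str, Dict[str, float]]  # entity id -> batch aggregate features

    def predict(self, model_name: str, entity_id: str,
                realtime_features: Dict[str, float]) -> float:
        # 1. Fetch the right model from the model store.
        model = self.model_store[model_name]
        # 2. Look up precalculated batch aggregate features.
        batch_features = self.feature_store.get(entity_id, {})
        # 3. Merge them with the real-time features passed in by the caller
        #    (real-time values win if a name appears in both).
        features = {**batch_features, **realtime_features}
        # 4. Make the prediction.
        return model(features)


def toy_eta_model(features: Dict[str, float]) -> float:
    # Illustrative formula only, not a trained model.
    return features.get("avg_prep_minutes", 15.0) + 2.0 * features.get("num_items", 0.0)


service = PredictionService(
    model_store={"eta": toy_eta_model},
    feature_store={"store_42": {"avg_prep_minutes": 12.0}},
)
eta = service.predict("eta", "store_42", {"num_items": 3.0})
print(eta)
```

In a real deployment the two dictionaries would be network calls to separate services; the sketch only shows how the two feature types meet at prediction time.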

Question:

QCon: In your abstract, you talk about going through iterations of models. How do you go about testing and comparing your models at DoorDash?

Answer:

Raghav: We use two layers of testing.

Before launching a model, we use a shadow setup, in which the model does not change the product. Instead, we measure its predictions against those of the current model that is running. This helps us determine the accuracy of the model being tested in production. This is the first layer of testing.
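
A minimal sketch of such a shadow setup, under stated assumptions: the `ShadowRunner` class, its method names, and the toy models are hypothetical and not taken from the talk. The point it shows is that both models score every request, only the live prediction is returned to the product, and both are later compared against the observed outcome.

```python
from statistics import mean
from typing import Callable, Dict, List, Tuple

Model = Callable[[Dict[str, float]], float]


class ShadowRunner:
    """Hypothetical helper: serve the live model, score the shadow model silently."""

    def __init__(self, live: Model, shadow: Model) -> None:
        self.live = live
        self.shadow = shadow
        self.pending: Dict[str, Tuple[float, float]] = {}  # request id -> (live, shadow)
        self.errors: List[Tuple[float, float]] = []        # (live abs error, shadow abs error)

    def predict(self, request_id: str, features: Dict[str, float]) -> float:
        live_pred = self.live(features)
        shadow_pred = self.shadow(features)
        self.pending[request_id] = (live_pred, shadow_pred)
        return live_pred  # only the live prediction reaches the product

    def record_actual(self, request_id: str, actual: float) -> None:
        # Called once the true outcome (e.g. the real delivery time) is known.
        live_pred, shadow_pred = self.pending.pop(request_id)
        self.errors.append((abs(live_pred - actual), abs(shadow_pred - actual)))

    def mean_absolute_errors(self) -> Tuple[float, float]:
        return (mean(e[0] for e in self.errors), mean(e[1] for e in self.errors))


runner = ShadowRunner(
    live=lambda f: 30.0,                           # toy current model
    shadow=lambda f: 20.0 + 2.0 * f["num_items"],  # toy candidate model
)
served = runner.predict("r1", {"num_items": 3.0})
runner.record_actual("r1", 25.0)
runner.predict("r2", {"num_items": 5.0})
runner.record_actual("r2", 32.0)
live_mae, shadow_mae = runner.mean_absolute_errors()
```

Mean absolute error is used here only as an example metric; any accuracy measure appropriate to the prediction task could take its place.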

The second layer of testing is an A/B test that chooses among the multiple models available. Here we start using the model in the actual product. We measure its performance and also look at overall product metrics, for example engagement or other user metrics.
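
The talk page does not say how traffic is split among models in the A/B test. One common approach, shown here purely as an assumption, is deterministic hash bucketing: each unit (a delivery or a user, say) is hashed together with an experiment name so that it always lands on the same model variant. The function name, experiment name, and variant labels are hypothetical.

```python
import hashlib
from collections import Counter
from typing import List


def assign_variant(unit_id: str, experiment: str, variants: List[str]) -> str:
    """Deterministically map a unit to one of the variants (illustrative only)."""
    key = f"{experiment}:{unit_id}".encode("utf-8")
    bucket = int(hashlib.sha256(key).hexdigest(), 16)
    return variants[bucket % len(variants)]


VARIANTS = ["model_a", "model_b"]

# The same unit always gets the same variant for a given experiment.
first = assign_variant("delivery_1", "eta_model_test", VARIANTS)
second = assign_variant("delivery_1", "eta_model_test", VARIANTS)

# Over many units the split is close to even.
counts = Counter(
    assign_variant(f"delivery_{i}", "eta_model_test", VARIANTS) for i in range(10_000)
)
print(counts)
```

Including the experiment name in the hash keeps assignments independent across experiments, so a unit placed in one group for this test is not systematically placed in the same group for the next one.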

Question:

QCon: What do you want the audience to take away from your talk?

Answer:

Raghav: The biggest takeaway would be an understanding of the common problems encountered when implementing machine learning in real-world products. I also plan to discuss a few ideas on designing systems to overcome these problems and thereby ship more machine learning models in practice.

An example of a common problem is a discrepancy between the training and production environments. Models are often trained offline, and when you use them in production, the feature distributions in the two environments can differ, which affects the accuracy of the predictions. I will go through how the systems we built help us solve these issues.
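
The page does not describe how such discrepancies are detected. As one possible illustration, the sketch below compares a feature's training and production distributions with the Population Stability Index (PSI), a technique swapped in here as an assumption rather than the method presented in the talk. The function name, bin count, sample data, and the 0.25 alert threshold (a commonly cited rule of thumb) are likewise illustrative.

```python
import math
from typing import List, Sequence


def population_stability_index(expected: Sequence[float],
                               actual: Sequence[float],
                               bins: int = 10) -> float:
    """PSI between a training sample (expected) and a production sample (actual)."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if all training values are equal

    def proportions(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), bins - 1)  # clamp out-of-range values to the edge bins
            counts[idx] += 1
        # A small floor avoids log(0) and division by zero for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    e = proportions(expected)
    a = proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Toy data: the production values are the training values shifted upward.
training = [float(v) for v in range(100)]
production = [v + 50.0 for v in training]

same = population_stability_index(training, training)
shifted = population_stability_index(training, production)
drift_detected = shifted > 0.25  # assumed alert threshold
```

A check like this could run per feature on a schedule, flagging features whose production distribution has moved away from what the model saw during training.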

Speaker: Raghav Ramesh

Engineering systems for real-time predictions @DoorDash

Raghav Ramesh is a machine learning engineer at DoorDash working on its core logistics engine, where he focuses on AI problems: vehicle routing, Dasher assignments, delivery time predictions, demand forecasting, and pricing. Previously, Raghav worked on various data products at Twitter, including recommendation systems, trends ranking, and growth analytics. He holds an MS from Stanford University, where he focused on artificial intelligence and operations research.



Schedule

Track: Practical Machine Learning

Location: Empire Complex, 7th fl.

Duration: 1:40pm - 2:30pm

Day of week: Wednesday

Level: Intermediate - Advanced

Persona: CTO/CIO/Leadership, Data Scientist, Developer
