Building a Large Scale Real-Time Ad Events Processing System

Two years ago, we embarked on building DoorDash's ad platform from the ground up. Today, our platform handles over 2 trillion events every day. Our advertising business has grown significantly in recent years and has become a key area of focus for the company. To generate ad metrics and analytics in real time, we built an ad event tracking, attribution, and analysis pipeline on top of Apache Flink, Apache Kafka, Apache Pinot, and our in-house real-time event processing system. This combination enables us to manage a large number of active ad campaigns with reliable ad delivery and timely attribution. It also allows us to share ad metrics with advertisers in real time.

During this session, we will start by introducing the core concepts of an online advertising system to provide a better understanding of the crucial role played by the ad event processing pipeline. We will then present an overview of our end-to-end pipeline and delve into specific challenges we encountered. These include:

  • The evolution of our core attribution job to address data races and the lessons we learned from it

  • Our approach to ensuring fault tolerance across the entire pipeline

  • Best practices for designing and developing large-scale Flink streaming pipelines for production environments

This session will provide you with insights and practical knowledge to help you build robust and efficient streaming pipelines for your ad platforms. By attending, you will gain a deeper understanding of the key challenges involved in building a scalable and fault-tolerant ad event processing pipeline, including data ingestion, real-time processing, attribution, and reporting.


Speaker

Chao Chu

Software Engineer @DoorDash

Chao Chu is a backend engineer at DoorDash. He works on the ads foundation team, focusing on the ad event pipeline and the ad exchange service. Previously, he worked at Morgan Stanley, where he helped build the Fixed Income Risk Infrastructure platform using Scala. He is passionate about building large-scale distributed systems.

Date

Tuesday Jun 13 / 02:55PM EDT (50 minutes)

Location

Salon E

Topics

Data Architecture, Streaming, Real-Time

From the same track

Session Streaming

Laying the Foundations for a Kappa Architecture - The Yellow Brick Road

Tuesday Jun 13 / 10:35AM EDT

In the ever-changing landscape of big data, focus is slowly moving away from batch and towards real-time analytics. Data Science workflows are evolving to adapt to this changing landscape.

Sherin Thomas

Staff Software Engineer @Chime

Session Serverless

The Rise of the Serverless Data Architectures

Tuesday Jun 13 / 01:40PM EDT

For a while, it looked like Serverless was just a convenient way to run stateless functions in the cloud. But in the last year we’ve seen a rapid rise in serverless data stores.

Gwen Shapira

Founder @Nile, PMC Member @Kafka

Session Stream Processing

Streaming from Apache Iceberg - Building Low-Latency and Cost-Effective Data Pipelines

Tuesday Jun 13 / 11:50AM EDT

Apache Flink is a very popular stream processing engine featuring sophisticated state management, event-time semantics, and exactly-once state consistency. For low latency processing, Flink jobs typically consume data from streaming sources like Apache Kafka.

Steven Wu

Software Engineer @Apple and Apache Iceberg PMC

Session Architecture

Enabling Remote Query Execution Through DuckDB Extensions

Tuesday Jun 13 / 04:10PM EDT

DuckDB is a high-performance, embeddable analytical database system that has gained massive popularity in the last few years.

Stephanie Wang

Founding Engineer @MotherDuck

Session

Unconference: Modern Data Architecture & Engineering

Tuesday Jun 13 / 05:25PM EDT

What is an unconference? An unconference is a participant-driven meeting. Attendees come together, bringing their challenges and relying on the experience and know-how of their peers for solutions.

Ben Linders

Independent Consultant in Agile, Lean, Quality and Continuous Improvement