Presentation: Using Luigi to build data pipelines that won’t wake you at 3am

1:30pm - 2:20pm

Datadog collects hundreds of billions of data points from our customers’ infrastructure every single day. In addition to our real-time systems, we run a significant number of offline batch jobs to crunch this data. These jobs form complicated graphs of dependencies that run across multiple distributed systems.

In this environment, failures can and do happen, often in the middle of the night. To prevent (most) failures from waking up humans, Datadog uses Luigi, a framework for crafting complex batch data pipelines.

In this talk, we’ll discuss:

  • How to craft data pipelines with Luigi
  • How to make pipelines idempotent for easy restart and failure recovery
  • Plenty of examples of how this works for us in practice
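
To make the idempotency point concrete, here is a minimal sketch in plain Python. It is not Luigi's API and not Datadog's code: the class names IdempotentTask and DailyCount are hypothetical, and the sketch assumes that a task counts as done when its output file exists. The output is written to a temporary file and renamed into place, so a job that dies halfway leaves no partial output and can simply be run again.

```python
import os
import tempfile


class IdempotentTask:
    """Hypothetical stand-in for a pipeline task (not Luigi's API)."""

    def __init__(self, output_path):
        self.output_path = output_path

    def complete(self):
        # Assumption: the task is done if and only if its output exists.
        return os.path.exists(self.output_path)

    def compute(self):
        # Subclasses return the text to store in the output file.
        raise NotImplementedError

    def run(self):
        # Re-running a finished task is a no-op, which makes restarts safe.
        if self.complete():
            return False
        directory = os.path.dirname(self.output_path) or "."
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        try:
            with os.fdopen(fd, "w") as handle:
                handle.write(self.compute())
            # The rename is atomic, so readers never see a half-written file.
            os.replace(tmp_path, self.output_path)
        except BaseException:
            # Clean up the temporary file so a failed run leaves no trace.
            if os.path.exists(tmp_path):
                os.remove(tmp_path)
            raise
        return True


class DailyCount(IdempotentTask):
    """Hypothetical example task: sums a list of values."""

    def __init__(self, output_path, values):
        super().__init__(output_path)
        self.values = values

    def compute(self):
        return str(sum(self.values))
```

Calling run() on a DailyCount twice does the work once: the first call writes the output and returns True, and the second sees the existing file and returns False. A scheduler can therefore retry a whole pipeline after a failure without redoing finished steps.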

Matthew Williams

