You are viewing content from a past/completed QCon

Presentation: PID Loops and the Art of Keeping Systems Stable

Track: Modern CS in the Real World

Location: Broadway Ballroom South, 6th fl.

Duration: 1:40pm - 2:30pm

Day of week: Monday

What You’ll Learn

  1. Find out why and how AWS is using PID Loops.
  2. Learn how to verify and enforce system stability with PID Loops.


Building ultra-reliable large-scale services is an incredible challenge. Systems often exhibit emergent properties and network effects that can be beyond the practical limits of testing. How do we keep things stable even when the unpredictable happens? Control theory, a branch of engineering that has existed for over a hundred years, has a lot to offer us. Systems of all sizes can be analyzed and stabilized with PID control loops: often simple algorithms that contain Proportional, Integral, and Derivative components. But how? This session will show what PID loops look like in the context of modern systems, and how exponential backoff, flow control, and other techniques can be wielded to build self-healing systems.
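
As a rough illustration of the Proportional, Integral, and Derivative components named in the abstract, here is a minimal Python sketch. The class name `PID`, the gain values, and the toy system in `simulate` are assumptions made for illustration; none of them come from the talk or from any AWS system.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PID:
    """Textbook PID controller (illustrative sketch, not a production design)."""
    kp: float  # proportional gain: reacts to the current error
    ki: float  # integral gain: reacts to accumulated past error
    kd: float  # derivative gain: reacts to how fast the error is changing
    integral: float = 0.0
    prev_error: Optional[float] = None

    def update(self, setpoint: float, measured: float, dt: float) -> float:
        """Return the correction to apply, given the desired and measured values."""
        error = setpoint - measured
        self.integral += error * dt
        # No previous sample on the first call, so treat the derivative as zero.
        if self.prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def simulate(steps: int, setpoint: float = 10.0) -> float:
    """Drive a hypothetical system whose value simply absorbs each correction."""
    controller = PID(kp=0.5, ki=0.1, kd=0.05)  # gains chosen by hand for this toy
    value = 0.0
    for _ in range(steps):
        value += controller.update(setpoint, value, dt=1.0)
    return value
```

With these hand-picked gains the toy system settles on the setpoint within a few dozen steps. Real systems have delays and noise, so gains that are stable here would not necessarily be stable elsewhere.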


Tell us a bit about some of the stuff that you've worked on.


I've been working at Amazon Web Services for eleven years and I've gotten to work on a lot of things. Right now I work on EC2, but I've also gotten to work on platforms, Route 53, S3, ELB, and a few more in between.


What can a software engineer learn about PID Loops?


I think probably the biggest thing to learn about PID loops is the loop part: we can build stable systems by measuring those systems, seeing what state they're in, and then driving them to the state we want them to be in. Taking that approach, measuring things first and then applying any corrections we need, turns out to be an incisive, deep, and powerful way to build systems, and one that's not intuitive.


Do you have to be massive scale to be able to use a PID loop effectively?


It works even for very simple systems. Say you have a system of one or two boxes, and you're just trying to get some very simple configuration data to that box, user settings or something like that. Nine times out of ten, people have solved that problem by just sending the settings to that box, and it'll work most of the time. But occasionally the settings won't get there, because maybe there's a network problem or a system crash or something. Even in a very simple case like that, a controller with a loop will fix it: it will detect that the box is not in the state it should be and repair it.
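
The measure, compare, and repair cycle described in this answer can be sketched in a few lines of Python. Everything named here (`reconcile`, `converge`, and the `FlakyBox` stand-in that silently loses its first few updates) is hypothetical and exists only to make the idea concrete.

```python
class FlakyBox:
    """Hypothetical box that silently drops its first `failures` updates."""

    def __init__(self, failures: int) -> None:
        self.settings: dict = {}
        self.failures = failures

    def read(self) -> dict:
        return dict(self.settings)

    def apply(self, changes: dict) -> None:
        if self.failures > 0:
            self.failures -= 1  # simulate a lost message or a crash
            return
        self.settings.update(changes)


def reconcile(desired: dict, read_actual, apply) -> list:
    """One pass of the loop: measure, compare with desired state, repair drift."""
    actual = read_actual()
    drift = {k: v for k, v in desired.items() if actual.get(k) != v}
    if drift:
        apply(drift)
    return sorted(drift)  # keys that needed repair on this pass


def converge(desired: dict, box: FlakyBox, max_passes: int = 10) -> int:
    """Repeat reconcile until a pass finds nothing to fix; return the pass count."""
    for n in range(1, max_passes + 1):
        if not reconcile(desired, box.read, box.apply):
            return n
    raise RuntimeError("box did not converge")
```

A one-shot push to `FlakyBox(failures=1)` would leave the box unconfigured. The loop notices on its next pass that the settings never arrived and sends them again, without needing to know why the first attempt failed.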


What do you want an individual contributor or architect to leave your talk with?


To be able to walk away and look at control systems that distribute settings or configuration, and just be able to tell whether they're stable or likely to be stable.  

Speaker: Colm MacCárthaigh

Senior Principal Engineer @awscloud

Colm is an engineer at Amazon Web Services. For just over ten years, Colm has been building some of the largest services at AWS, including Amazon EC2, S3, ELB, CloudFront, and Route 53. Colm is also an active Open Source contributor and is the main author of Amazon s2n, AWS's Open Source implementation of TLS/SSL, as well as a member of the Apache Software Foundation and a core contributor to Apache httpd and apr. On evenings and weekends, Colm is an Irish folk musician and singer who regularly tours, produces and records albums, and enjoys teaching workshops.
