QCon New York 2020 has been canceled.

Resilience Engineering

Past Presentations

Using Chaos To Build Resilient Systems

There are those of us who are motivated to build resilient systems, improve uptime, move fast, and keep systems reliable. Then there are those of us who feel overwhelmed by our to-do lists and the features or projects we feel we need to get out the door. The world needs more resilient...

Tammy Butow Principal Site Reliability Engineer @Gremlin

UNBREAKABLE: Learning to Bend but Not Break at Netflix

How do you gain confidence that a system is behaving as designed and identify vulnerabilities before they become outages? You may have thought about using chaos engineering for this purpose, but it’s not always clear what that means or if it’s a good fit for your system and team. My...

Haley Tucker Senior Software Engineer, Chaos Engineering @Netflix

Have You Tried Turning It Off and On Again?

Would you jump on this train of thought for a moment and see if you agree? Let’s say you have some number of computers. It could be three, it could be kerjillions, the number probably doesn’t matter too much for this thought experiment. Now let’s say you have a number of people, probably...

David Blank-Edelman Senior Cloud Ops Advocate @Microsoft

The History of Fire Escapes

When a datacenter goes offline, a server gets overloaded, or a binary hits a crashing bug, we usually have a contingency plan. We reduce damage, redirect traffic, page someone, drop low-priority requests, follow documented procedures. But why do many failures still come as a surprise? In this...

Tanya Reilly Principal Engineer @squarespace

Heretical Resilience: To Repair is Human

Resilient architecture is often thought of solely in terms of its technical aspects - with the right distributed system or automated failover or fancy new orchestration software, we want to believe we can avoid the inevitability of failure. While it is certainly true that we can design our...

Ryn Daniels Staff Infrastructure Engineer @travisci

Interviews

Tammy Butow Principal Site Reliability Engineer @Gremlin

Using Chaos To Build Resilient Systems

What do you want someone to leave your talk with?

Everyone who comes to this talk will leave with an understanding of how they can start seeing massive benefits from practicing Chaos Engineering within three months. To me, Chaos Engineering is the fastest, most efficient way to take a giant leap forward in the resilience of your systems and team.

Haley Tucker Senior Software Engineer, Chaos Engineering @Netflix

UNBREAKABLE: Learning to Bend but Not Break at Netflix

Tell me about your talk.

I’m going to share my personal journey at Netflix learning to build and operate distributed systems, both as a service owner and as a chaos engineer. As a service owner, I’ll provide examples of how I used chaos engineering to build better systems, even for non-critical services. As a chaos engineer, I’ll cover some of the...

Ryn Daniels Staff Infrastructure Engineer @travisci

Heretical Resilience: To Repair is Human

Tell me a bit about the work that you do today.

I'm currently working at Travis CI, where I'm the lead of the build environment team. This team works on the environment that allows our customers to run their builds: making sure that we can create, test, and update the environments where customer builds run in a reliable manner, so that customers can continue to test as new...

