Resilience Engineering - Culture as a System Requirement

Learn how organizations remain resilient across changing socio-technical systems. Come hear about how SREs and Ops engineers make change happen and how they respond to outages and learn from incidents.


From this track

Session

Two Years of Incidents at 6 Different Companies: How a Culture of Resilience Can Help You Accomplish Your Goals

Thursday Jun 15 / 10:35AM EDT

Incidents and outages are expensive: they impact engineering productivity, business goals, and your company’s reputation. In this talk I will describe how we can apply resilience throughout the incident lifecycle to turn incidents into opportunities.

Vanessa Huerta Granda

Solutions Engineer @Jeli.io

Session Resilience

Comparing Apples and Volkswagens: The Problem With Aggregate Incident Metrics

Thursday Jun 15 / 11:50AM EDT

This talk presents data from the Verica Open Incident Database (VOID) to conclusively demonstrate how aggregate incident metrics (MTTR, severity, # of incidents/time) aren't representative of your systems' resilience.

Courtney Nash

Internet Incident Librarian & Senior Research Analyst at Verica, previously @Holloway @Fastly @O’Reilly Media @Microsoft & @Amazon

Session Resilience Engineering

Resilience Hides in Plain Sight

Thursday Jun 15 / 01:40PM EDT

Think of the most out-of-nowhere and surprising incident you've experienced.

John Allspaw

Founder and Principal @Adaptive Capacity Labs

Session Resilience Engineering

Embrace Complexity; Tighten Your Feedback Loops

Thursday Jun 15 / 02:55PM EDT

When dealing with an environment that feels chaotic and unreliable, a common tendency is to look for ways to reduce variability and bring things back under control through procedures, hierarchy, metrics, and standardization.

Fred Hebert

Staff SRE @Honeycombio

Session Resilience Engineering

5 Strategies to Resiliently Handle Uncertainty, Time Pressure & Change

Thursday Jun 15 / 04:10PM EDT

As an engineer tasked with keeping large-scale software systems running under changing priorities and time pressure, you need resilience capabilities that are both technical and organizational to successfully navigate modern software engineering work.

Dr. Laura Maguire

Cognitive Systems Engineer & Researcher

Date

Thursday Jun 15 / 10:30AM EDT

Track Host

Vanessa Huerta Granda

Solutions Engineer @Jeli.io

Vanessa is a Solutions Engineer at Jeli.io, helping companies make the most of their incidents. Previously, she led Resilience Engineering at Enova and has spent the last decade focusing on production incident processes, learning from incidents, and handling major incidents as Incident Commander. She has spoken and written on incident metrics and sharing learnings, and in 2021 she co-authored Jeli’s Howie: The Post-Incident Guide.

She is passionate about continuous improvement, getting teams to talk to each other, and sharing incident findings.
