This talk presents data from the Verica Open Incident Database (VOID) to conclusively demonstrate that aggregate incident metrics (MTTR, severity, number of incidents over time) aren't representative of your systems' resilience. I then pair those data with observations from actual incident reports to show what kinds of useful information can be gleaned from incident analysis, and suggest alternative things you can measure to demonstrate learning from incidents in your organization.
Internet Incident Librarian & Senior Research Analyst at Verica, previously @Holloway @Fastly @O’Reilly Media @Microsoft & @Amazon
Courtney Nash is a researcher focused on system safety and failures in complex sociotechnical systems. An erstwhile cognitive neuroscientist, she has always been fascinated by how people learn, and the ways memory influences how they solve problems. Over the past two decades, she’s held a variety of editorial, program management, research, and management roles at Holloway, Fastly, O’Reilly Media, Microsoft, and Amazon. She lives in the mountains where she skis, rides bikes, and herds dogs and kids.