Conference: Jun 13-15, 2016
Tutorials: Jun 16-17, 2016
Presentation: Too Big to Fail: Lessons from Google and healthcare.gov
Failure is a fact of life, so we design our systems to be fault-tolerant at all levels. In practice, however, some components almost never fail. As the product grows, these components are stressed in new and different ways, and when they finally do fail, they cause outages we are unprepared for. We thought we were designing for failure, but the design didn't account for failures at this level. At Google, some of our most exciting production snafus involve large and unpredictable network-level failures; at healthcare.gov in late 2013, just about every component fell into this category on a daily basis.
Through stories of large-scale Google outages and smaller-scale healthcare.gov outages, we’ll illustrate situations in which we’re often flying blind, and draw lessons from them about how to expose unknown weak points in our systems. We’ll discuss the importance of being able to model systems ahead of time and to visualize solutions in real time (including during an outage). Attendees will learn a practical framework for anticipating potential large-scale outages, along with specific ways to increase systemic robustness, such as “practicing disaster.” Failure -- even large failure -- is a fact of life; outages don’t have to be.
Nori Heikkinen
Tracks
Wednesday Jun 10
- Applied Data Science and Machine Learning: Putting your data to use. The latest production methods for deriving novel insights
- Engineer Your Culture: Building and scaling a compelling engineering culture
- Modern Advances in Java Technology: Tips, techniques and technologies at the cutting edge of modern Java
- Monoliths to Microservices: How to evolve beyond a monolithic system -- successful migration and implementation stories
- The Art of Software Design: Software architecture as a craft, scenario-based examples and general guidance
- Sponsored Solutions Track I
Thursday Jun 11
- Emerging Technologies in Front-end Development: The state of the art in client-side web development
- Fraud Detection and Hack Prevention: Businesses are built around trust in systems and data. Securing those systems and fighting fraud in the data they hold.
- Reactive Architecture Tactics: The how of the Reactive movement: Release It! techniques, Rx, Failure Concepts, Throughput, Availability
- Architecting for Failure: War stories and lessons learned from building highly robust and resilient systems
- High Performance Streaming Data: Scalable architectures and high-performance frameworks for immediate data over persistent connections
- Sponsored Solutions Track II
Friday Jun 12
- Architectures You've Always Wondered About: Learn from the architectures powering some of the most popular applications and sites
- Continuously Deploying Containers in Production: Production-ready patterns for growing containerization in your environment
- Mobile and IoT at Scale: Users, Usage and Microservices
- Modern Computer Science in the Real World: How modern CS tackles problems in the real world
- Optimizing Yourself: Maximizing your impact as an engineer, as a leader, and as a person
- Sponsored Solutions Track III