Location: Salon D

Day of week: Thursday

Complex systems fail in spectacular ways. Failure isn’t a question of if, but when. Resilient systems recover from failure; robust systems resist failure. In this track we’ll hear from experts who have designed systems that shifted from fragility to resilience and robustness in the face of failure. Attendees will learn which architectural patterns and approaches worked and which did not, with takeaways they can apply to their own systems.

10:35am - 11:25am

by Jon Moore
Senior Fellow, Comcast Cable

Partial Failures in a Microservices Jungle: Survival Tips from Comcast

Comcast’s TV products serve tens of millions of customers and are powered by a suite of dozens of services that are continuously developed and operated by hundreds of technical staff. While we have enjoyed many of the touted benefits of a microservice architecture (looser coupling between teams, independent deployments), we have also encountered the corresponding reliability challenges. Delivering business value in this environment can seem like hacking your way through the wilderness at...

11:50am - 12:40pm

by Nori Heikkinen
Google Site Reliability Engineering Expert

Too Big to Fail: Lessons from Google and healthcare.gov

Failure is a fact of life, so we design our system to be fault-tolerant at all levels. In practice, however, some components almost never fail. As the product grows, these components are increasingly stressed in new and different ways; when they ultimately do fail they create outages for which we are unprepared. We thought we were designing for failure, but the design didn't include failures at this level. At Google, some of our most exciting production snafus involve large and unpredictable...

1:40pm - 2:30pm

by Kolton Andrus
Chaos Engineer at Netflix

Breaking Bad at Netflix: Building Failure as a Service

Netflix’s 57M members watch over 2 billion hours of content per month, and their streaming accounts for one-third of Internet traffic in some parts of the world. The Edge platform, which thousands of devices rely on to access the streaming experience, guards the front door to Netflix, where any major issue results in a Twitter storm.

In order to harden our systems, we designed “Failure as a Service” to allow anyone to test and validate how our systems handle failure. Purposefully injecting...

2:55pm - 3:45pm

by Tom Limoncelli
Author, SRE @ Stack Exchange

Fail Better: Radical Ideas from the Practice of Cloud Computing

Distributed or "cloud" computing involves many moving parts, any of which can break or fail. Succeeding in this environment requires embracing failure, not running or hiding from it. Doing so requires challenging our instincts with radical ideas. Tom will highlight some of the most radical advice from the new book “The Practice of Cloud System Administration”.

Topics will include: create resiliency at the most economic level, do risky procedures often, and create a blameless culture...

4:10pm - 5:00pm

Open Space

Architecting for Failure Open Space

5:25pm - 6:15pm

by Joe Stein
Founder, Principal Consultant at Big Data Open Source Security LLC

Making Distributed Data Persistent Services Elastic (Without Losing All Your Data)

Building and deploying elastic, distributed, data-centric systems that can fail without losing data and without sacrificing elasticity has traditionally been challenging. With Apache Mesos, an open source project that is the kernel for your data center, we can now create fully elastic end-to-end compute environments. With Mesos, distributed data persistent services can run durably and elastically. Kafka, HDFS, Cassandra, MySQL and more data-centric systems run on Mesos.

We will talk...

Host: Philip Fisher-Ogden, Director of Engineering at Netflix

Tracks

Wednesday Jun 10

Applied Data Science and Machine Learning

Putting your data to use. The latest production methods for deriving novel insights

Engineer Your Culture

Building and scaling a compelling engineering culture

Modern Advances in Java Technology

Tips, techniques and technologies at the cutting edge of modern Java

Monoliths to Microservices

How to evolve beyond a monolithic system: successful migration and implementation stories

The Art of Software Design

Software architecture as a craft: scenario-based examples and general guidance

Sponsored Solutions Track I

Thursday Jun 11

Emerging Technologies in Front-end Development

The state of the art in client-side web development

Fraud Detection and Hack Prevention

Businesses are built around trust in systems and data. Securing systems and fighting fraud throughout the data in them.

Reactive Architecture Tactics

The how of the Reactive movement: Release It! techniques, Rx, Failure Concepts, Throughput, Availability

Architecting for Failure

War stories and lessons learned from building highly robust and resilient systems

High Performance Streaming Data

Scalable architectures and high-performance frameworks for immediate data over persistent connections

Sponsored Solutions Track II

Friday Jun 12

Architectures You've Always Wondered About

Learn from the architectures powering some of the most popular applications and sites

Continuously Deploying Containers in Production

Production-ready patterns for growing containerization in your environment

Mobile and IoT at Scale

Users, Usage and Microservices

Modern Computer Science in the Real World

How modern CS tackles problems in the real world

Optimizing Yourself

Maximizing your impact as an engineer, as a leader, and as a person

Sponsored Solutions Track III

Conference for Professional Software Developers

Track: Architecting for Failure
