Conference: Jun 26-28, 2017
Workshops: Jun 29-30, 2017
Track: Better than Resilient: Antifragile
Location:
- Salon D
Day of week:
- Tuesday
“Antifragility is beyond resilience or robustness. The resilient resists shocks and stays the same; the antifragile gets better.”
Failure and change are constants in Internet-scale companies. Uptime is a battle, with tales of glory and heartache. How can we do more than withstand failure, and instead improve with each step? Learn from industry leaders how they proactively prepare for the inevitable. Find out how these techniques help them to weather production storms and have confidence in the behavior of their complex systems.

by Theo Schlossnagle
Founder and CEO @Circonus, Editorial board of ACM's ‘Queue’
In this presentation, I'll talk about lessons learned in building an always-on distributed time-series database with aggressive quality-of-service guarantees. As any distributed systems engineer knows, coping with a failed machine is an easy problem compared to coping with an underperforming one. When SLAs are tight, underperforming is effectively Byzantine behavior. I will talk about both macro and micro techniques used in our system to cope with bad machines, bad actors...
by Luke Kosewski
Founding Member of Netflix Chaos and Traffic Team
The Netflix control plane handles a third of peak Internet traffic. That's an awful lot of customers we need to keep safe from any service outages. Netflix developed "Flow" to wage war against these outages. Flow coordinates recovery from localized disruptions and enables periodic verification through production experimentation called “Chaos Kong.”
Flow endows all services within Netflix with the capability to withstand regional...
by Michalis Zervos
Service Resilience Software Engineer @Microsoft
For any company to run on the cloud, it needs assurances that its workloads, services, and data will always be available and secure. To provide such guarantees, application developers and cloud providers need to perform extensive verification across a number of distributed services. Traditional testing tools were not designed to verify the resiliency of such systems.
At Microsoft, we actively develop and use fault...
by Richard Kasperowski
Author of The Core Protocols: A Guide to Greatness
Open Space
by Abel Mathew
Co-founder & CEO of Backtrace I/O
Resilience for many of us comes from our ability to restart applications in the face of failure. As debuggers and operators, we are often forced to go back and analyze the clues left behind in assets like logs, heap dumps, or even core dumps to tease out the root cause. As our systems grow and become more distributed, these one-off investigations become less tenable, and a scalable way to analyze incidents after the fact is needed. In this talk, we'll explore examples...
by Thomissa Comellas
Technical Project Manager @Dropbox
by Tammy Butow
SRE Manager @Dropbox
Thomissa joined the Dropbox Infrastructure team 100 days ago. This presentation will share her experiences developing and rolling out new Disaster Recovery Testing techniques at Dropbox. Tammy will join Thomissa to share how her team runs DRTs and has implemented the techniques Thomissa has evangelized.
Dropbox was founded by engineers, and the ethos of technical innovation is fundamental to our culture. We’ve grown enormously...
Tracks
Monday, 13 June
- Architectures You've Always Wondered About
  Case studies from: Google, LinkedIn, Alibaba, Twitter, and more...
- Stream Processing @ Scale
  Technologies and techniques to handle ever-increasing data streams
- Culture As Differentiator
  Stories of companies and teams for whom engineering culture is a differentiator: in delivering faster, in attracting better talent, and in making their businesses more successful.
- Practical DevOps for Cloud Architectures
  Real-world lessons and practices that enable the DevOps nirvana of operating what you build
- Incredible Power of an Open-Sourced .NET
  .NET is more than you may think. From Rx to C# 7 designed in the open, learn more about the power of open-source .NET
- Sponsored Solutions Track 1
Tuesday, 14 June
- Better than Resilient: Antifragile
  Failure is a constant in production systems; learn how to wield it to your advantage to build more robust systems.
- Innovations in Java and the Java Ecosystem
  Cutting-edge Java innovations for the real world
- Modern CS in the Real World
  Real-world industry adoption of modern CS ideas
- Containers: From Dev to Prod
  Beyond the buzz and into the how and why of running containers in production
- Security War Stories
  Expert-level security track led by well-known and respected leaders in the field
- Sponsored Solutions Track 2
Wednesday, 15 June
- Microservices and Monoliths
  Practical lessons on services. Asks the question: when should you, and when should you NOT, go with microservices?
- Modern API Architecture - Tools, Methods, Tactics
  API-based application development, and the tooling and techniques that support working effectively with APIs, in the small or at scale, whether internal or external.
- Commoditized Machine Learning
  Barriers to entry for applied ML are lower than ever before; jumpstart your journey.
- Full Stack JavaScript
  Browser, server, devices: JavaScript is everywhere
- Optimizing Yourself
  Keeping life in balance is always a challenge. Learn some lifehacks.
- Sponsored Solutions Track 3