Conference: Jun 26-28, 2017
Workshops: Jun 29-30, 2017
Presentation: Improving Resilience by Creating Storms in the Cloud
Location:
- Salon D
Day of week:
- Tuesday
Level:
- Intermediate
Persona:
- Architect
Key Takeaways
- Understand that fault injection is critical to building and maintaining highly available cloud services. It’s also a great tool for training on-call engineers.
- Learn the basic principles of designing, developing, and using a fault injection system.
- Discover problems in a timely manner and in a controlled environment, without affecting customers.
Abstract
Any company running on the cloud needs assurance that its workloads, services, and data will always be available and secure. To provide such guarantees, application developers and cloud providers need to perform extensive verification across a number of distributed services. Traditional testing tools were not designed to verify the resiliency of such systems.
At Microsoft, we actively develop and use fault injection to test and break our services. By doing this we identify failure points, design better detection, and build mitigations which allow us to auto-heal when real issues arise. Fault injection can span the whole stack: from applications to hardware and the network, from VMs to datacenters.
Developing a fault injection system is tricky, but using it effectively is an order of magnitude harder. It’s important for engineers to embrace the fault injection culture and be trained to leverage it in all phases of development and maintenance. Only then can we significantly reduce the TTD/TTM (time-to-detect/time-to-mitigate). We will present what we have learned about designing and using fault injection systems to maintain highly available cloud-scale applications, and the cultural change necessary to enable them.
Tracks
Monday, 13 June
-
Architectures You've Always Wondered About
Case studies from: Google, LinkedIn, Alibaba, Twitter, and more...
-
Stream Processing @ Scale
Technologies and techniques to handle ever increasing data streams
-
Culture As Differentiator
Stories of companies and teams for whom engineering culture is a differentiator - in delivering faster, in attracting better talent, and in making their businesses more successful.
-
Practical DevOps for Cloud Architectures
Real-world lessons and practices that enable the DevOps nirvana of operating what you build
-
Incredible Power of an Open-Sourced .NET
.NET is more than you may think. From Rx to C# 7 designed in the open, learn more about the power of open source .NET
-
Sponsored Solutions Track 1
Tuesday, 14 June
-
Better than Resilient: Antifragile
Failure is a constant in production systems. Learn how to wield it to your advantage to build more robust systems.
-
Innovations in Java and the Java Ecosystem
Cutting Edge Java Innovations for the Real World
-
Modern CS in the Real World
Real-world industry adoption of modern CS ideas
-
Containers: From Dev to Prod
Beyond the buzz and into the how and why of running containers in production
-
Security War Stories
Expert-level security track led by well known and respected leaders in the field
-
Sponsored Solutions Track 2
Wednesday, 15 June
-
Microservices and Monoliths
Practical lessons on services. When should you go with microservices, and when should you not?
-
Modern API Architecture - Tools, Methods, Tactics
API-based application development, and the tooling and techniques for working effectively with internal and external APIs, in the small or at scale
-
Commoditized Machine Learning
Barriers to entry for applied ML are lower than ever before. Jumpstart your journey
-
Full Stack JavaScript
Browser, server, devices - JavaScript is everywhere
-
Optimizing Yourself
Keeping life in balance is always a challenge. Learning lifehacks
-
Sponsored Solutions Track 3