Canary Analyze All The Things: How we learned to Keep Calm and Release Often
Releasing to production can be nerve-racking for any conscientious developer, especially when the product you're releasing is responsible for entertaining 48 million customers. Practically everyone who pushes to production spends some time afterward monitoring it, sometimes with a certain degree of trepidation.
In the last year or so, we've taken our most critical application, the system responsible for the Netflix API, and increased its deployment cadence from semi-monthly to daily. We did so while lowering the effort required of developers, increasing availability for our customers, and building up our trust that a deployment into production is safe, predictable, and good for our customers. When a deployment is not, we'll know it, we'll know it quickly, and we'll automatically revert the changes. We've done this by investing in our real-time analytics capabilities and building an automated canary analysis system.
In this talk, we'll discuss canary analysis deployment and observability patterns we believe are generally useful, and talk about the difference between manual and automated canary analysis. The talk is partly aspirational and partly utilitarian: our goal is to provide a useful way to think about canary analysis that will be applicable in most cloud-based engineering environments. We'll also discuss cloud-specific considerations (and opportunities) for canary analysis, as well as the next steps for Netflix's canary analysis system.