Facilitating High Availability Via Systematic Capacity Planning

Location:

Grand Ballroom - Salon D

Track:

Scalability, Availability, and Performance: The Dark Arts

Abstract:

High availability is one of the key features of web-scale distributed systems. It has become even more important as mobile computing grows increasingly ubiquitous and end users' tolerance for latency keeps decreasing. One of the primary aspects of delivering high availability is systematic and rigorous capacity planning. This is non-trivial: underestimating capacity requirements adversely impacts the end-user experience, while overestimating them results in ballooning operational costs; either way, the business suffers. Further, for services such as Twitter, the event-driven nature of the service (where event occurrence is not known a priori) makes capacity planning very challenging. To this end, at Twitter, we developed a systematic and statistically rigorous approach to capacity planning. In particular, we derived insights from historical time series to estimate, for example, traffic for upcoming events. In the talk, we shall walk through a concrete example of how we went about capacity planning for Super Bowl 2013. In spite of the blackout at Super Bowl 2013, the capacity deployed, based on the approach we developed, seamlessly handled the 'additional' traffic. We validated our capacity projections after Super Bowl 2013.

Arun Kejariwal

Arun Kejariwal is currently a Staff Capacity Engineer at Twitter, where he works on research and development of novel techniques to improve the accuracy of capacity models and demand forecasts. Prior to joining Twitter, @arun_kejariwal worked on research and development of practical and statistically rigorous methodologies to deliver high performance, availability, and scalability in large-scale distributed clusters. Some of the techniques developed have been published in peer-reviewed international conferences and journals.

Bryce Yan

Bryce Yan is currently a Staff Capacity Engineer at Twitter, where he works on various techniques to improve the accuracy of Twitter's capacity models and demand forecasts. Prior to joining Twitter, @bryce_yan managed or worked in software development, performance testing, QA, database engineering, and ops at many companies, large and small, including Marketo, Salesforce.com, and Siebel Systems. @bryce_yan received his Bachelor's degree in physics at Berkeley and did his PhD work in quantum field theories at UCLA.