Track:

Location:

Salon D

Duration:

10:35am - 11:25am

Day of week:

Tuesday

Level:

Advanced

Persona:

Developer

Key Takeaways

Learn concrete approaches to discovering, analyzing, and optimizing poorly performing or latent nodes in a cluster.
Hear about tools and techniques that can be used to capture and model behavior.
Understand lessons from building applications to model performance on distributed systems.

Abstract

In this presentation, I'll talk about lessons learned in building an always-on distributed time-series database with aggressive quality-of-service guarantees. As any distributed systems engineer knows, coping with a failed machine is an easy problem compared to coping with an underperforming one. When SLAs are tight, underperformance is effectively Byzantine behavior. I will talk about both macro and micro techniques used in our system to cope with bad machines, bad actors, and other poorly qualified badness. Most are adaptive techniques backed by both local and cluster-wide statistical analysis of observed behavior.

Interview

Question:

What’s the motivation for your talk?

Answer:

At Circonus, we have some innovative approaches to managing a distributed system’s performance by leveraging the new resiliency models available on those systems. They are somewhat novel, and they are pretty accessible to people who run general distributed systems (like Cassandra, Riak, etc.). I will be talking about our proprietary system, but the approaches apply to all distributed systems. I think people will come away with some interesting ideas on how they can manage their large distributed databases.

Question:

Can you describe one of the techniques you will go into?

Answer:

There are some pretty traditional methods of doing active feedback between nodes to manage the performance of a distributed system cluster. These are approaches you might use when you notice another node is slow, latent, or just not up to date. For example, you might take it out of rotation or change your parameters for node-to-node interaction. But the talk concludes with a different idea: measuring a node’s resource performance at a highly granular level, or actually turning off nodes that have different performance profiles to see whether performance across the whole cluster improves when you do.

I will go over some standard techniques for getting performance characteristics out of replication systems, and I will also talk about measuring per-transaction latency on low-level system resources: for example, measuring the latency of every I/O operation on every spindle on every node in your cluster, then modeling that and electing to immobilize machines based on bad behavior.
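As an illustration of the idea in this answer, the sketch below flags nodes whose slowest disk is a latency outlier compared with the rest of the cluster. It is not Circonus's implementation, which the interview does not describe in detail. The function names, the `floor_ms` and `k` parameters, the sample data, and the choice of a median/MAD threshold are all assumptions made for the example.

```python
from statistics import median, quantiles


def p99(samples):
    """99th-percentile latency of one disk's samples (needs >= 2 samples)."""
    return quantiles(samples, n=100, method="inclusive")[98]


def latent_nodes(latencies_ms, k=5.0, floor_ms=0.5):
    """Return the nodes whose worst-disk p99 is a cluster-wide outlier.

    latencies_ms maps node -> disk -> list of per-I/O latencies in ms.
    k and floor_ms are hypothetical tuning knobs, not values from the talk.
    """
    # Local view: summarize each node by its slowest disk.
    worst = {
        node: max(p99(samples) for samples in disks.values())
        for node, disks in latencies_ms.items()
    }
    # Cluster-wide view: median and median absolute deviation (MAD).
    center = median(worst.values())
    mad = median(abs(v - center) for v in worst.values())
    # The floor keeps a very uniform cluster from flagging tiny differences.
    threshold = center + k * max(mad, floor_ms)
    return sorted(node for node, v in worst.items() if v > threshold)


# Made-up measurements: node "n3" has one disk with a long latency tail.
samples = {
    "n1": {"sda": [1.0, 1.2, 1.1, 1.3], "sdb": [1.1, 1.0, 1.2, 1.4]},
    "n2": {"sda": [1.3, 1.1, 1.0, 1.2], "sdb": [1.2, 1.5, 1.1, 1.0]},
    "n3": {"sda": [1.0, 1.1, 1.2, 1.1], "sdb": [1.0, 9.0, 40.0, 55.0]},
    "n4": {"sda": [1.4, 1.2, 1.1, 1.0], "sdb": [1.1, 1.3, 1.2, 1.0]},
}
print(latent_nodes(samples))  # ['n3']
```

Median and MAD are used here because a single bad node barely moves them, whereas a mean and standard deviation would be dragged toward the outlier. A real system would also have to decide how many nodes it can afford to take out of rotation at once.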

Question:

When you say model latency on every IOPS on a spindle, are you going to be talking about a specific tool to help you model or general ideas and approaches? Assuming you’ll discuss tooling, is that tool open source and available for people to use?

Answer:

I am going to talk about the general idea and the general outcome because I think it is applicable to a lot of people, but I will describe exactly how we do it. When I describe how we do it, I plan to discuss the tools that we use.

The tooling that we use to collect all of that information is open source, and the models that we use to detect the behavior are all open. But we built a monitoring tool, so we pump the data through our own product to do the actual modeling. With that said, the missing parts that are closed source are very small, and the techniques are rather simple concepts that others can build on.

Question:

How would you rate this talk: Beginner, Intermediate, or Advanced?

Answer:

I think that beginners might be a little overwhelmed, but intermediate and advanced attendees will really understand the concepts and the approaches. I think advanced users will likely leave with enough information to implement something like this in their own environment.

Speaker: Theo Schlossnagle

Founder and CEO @Circonus, Editorial board of ACM's ‘Queue’

Theo founded Circonus in 2010, and continues to be its principal architect. After earning undergraduate and graduate degrees from Johns Hopkins University in computer science with a focus on graphics and randomized algorithms in distributed systems, he went on to research resource allocation techniques in distributed systems during four years of post-graduate work. A widely respected industry thought leader, Theo is the author of Scalable Internet Architectures (Sams) and a frequent speaker at worldwide IT conferences. Theo is a computer scientist in every respect. Theo is a member of the IEEE and a senior member of the ACM. He serves on the editorial board of the ACM's Queue Magazine.

Find Theo Schlossnagle at

Speaker page

@postwait

Similar Talks

Learnings from a Culture First Startup

CTO @Buffer

Sunil Sadasivan

Becoming an Outlier

Software Architect @VinSolutions, Author @pluralsight

Cory House

ESPN Next Generation APIs Powering Web, Mobile, TV

Senior Director of Distribution Platforms @ESPN

Manny Pelarinos

The Human Side of Microservices

Tech Lead @Yelp

John Billings

The Seven (More) Deadly Sins of Microservices

Chief Scientist @OpenCredo

Daniel Bryant

Lessons Learned on Uber's Journey into Microservices

Software Engineer @Uber

Emily Reinhold

What They Don’t Tell You About Microservices…

CTO @Yodle

Daniel Rolnick

Algorithms for Animation

Partner & Tech Lead @CarbonFive

Courtney Hemphill

Machine Learning Fast and Slow

Lead Data Scientist @betaworks

Suman Deb Roy

Tracks

Monday, 13 June

Architectures You've Always Wondered About

Case studies from: Google, LinkedIn, Alibaba, Twitter, and more...

Stream Processing @ Scale

Technologies and techniques to handle ever-increasing data streams

Culture As Differentiator

Stories of companies and teams for whom engineering culture is a differentiator - in delivering faster, in attracting better talent, and in making their businesses more successful.

Practical DevOps for Cloud Architectures

Real-world lessons and practices that enable the devops nirvana of operating what you build

Incredible Power of an Open-Sourced .NET

.NET is more than you may think. From Rx to C# 7 designed in the open, learn more about the power of open source .NET

Sponsored Solutions Track 1

Tuesday, 14 June

Better than Resilient: Antifragile

Failure is a constant in production systems, learn how to wield it to your advantage to build more robust systems.

Innovations in Java and the Java Ecosystem

Cutting Edge Java Innovations for the Real World

Modern CS in the Real World

Real-world Industry adoption of modern CS ideas

Containers: From Dev to Prod

Beyond the buzz and into the how and why of running containers in production

Security War Stories

Expert-level security track led by well known and respected leaders in the field

Sponsored Solutions Track 2

Wednesday, 15 June

Microservices and Monoliths

Practical lessons on services. Asks the question: when should you go with microservices, and when should you NOT?

Modern API Architecture - Tools, Methods, Tactics

API-based application development, and the tooling and techniques to support effectively working with APIs in the small or at scale. Using internal and external APIs

Commoditized Machine Learning

Barriers to entry for applied ML are lower than ever before, jumpstart your journey

Full Stack JavaScript

Browser, server, devices - JavaScript is everywhere

Optimizing Yourself

Keeping life in balance is always a challenge. Learning lifehacks

Sponsored Solutions Track 3

Conference for Professional Software Developers

Presentation: Adaptive Availability for Quality of Service
