Track:

Location:

Salon E

Duration

Duration:

1:40pm - 2:30pm

Day of week:

Tuesday

Level:

Advanced

Persona:

Developer

Key Takeaways

Hear practical applications of academic research with a large scale distributed system.
Discuss membership and dissemination protocols and understand their application in practice.
Learn practical lessons applying membership and dissemination protocols to a large scale distributed system.

Abstract

We are building an instrumentation platform that runs across dozens of datacenters to provide operational visibility for internal systems and applications. This platform must remain up as much as possible and allow support and operations staff to understand and diagnose problems quickly. They must be able to ask questions like "what machines and applications are publishing metrics?", "what systems appear to be offline?", "what order did these errors occur in?", all without consulting every datacenter. Furthermore, they must be able to change configuration quickly, with confidence that every affected system will receive and act upon it.

To help with these problems, we are implementing several recently developed protocols for cluster membership, epidemic broadcast, and monotonic time. Respectively, these protocols allow us to know what nodes are peers, to disseminate configuration and status information, and to agree on roughly relative orders of events. Best of all, they are all synchronization-free, meaning we can achieve our goals while remaining highly available. In this talk, we'll discuss the protocols we chose, challenges to implementing them, and some preliminary results from deploying the protocols across our infrastructure.

Interview

Question:

You recently moved from Basho to Comcast. Can you tell me about your role at Comcast?

Answer:

I am a software engineer at Comcast, and I am starting a new team as the tech lead on that team. For the last year, I have been working on an API management proxy that we wrote in-house, in Nginx and Lua. I am going to use some of my previous experience with distributed systems at Basho in this new project. It’s going to be on the Erlang OTP platform, and we are going to be using some of the technologies that I am covering in my talk.

Question:

Lua, Erlang OTP... Sounds like quite a polyglot environment. Tell me more about what you’re doing there?

Answer:

It is predominantly JVM stuff inside Comcast, but I also work on a team that is underneath Jon Moore. His teams are focusing on building services that are used by the rest of the company, the products behind the products. His focus there is to leverage open source, and build our own where we can’t use open source or where it is not sufficient, to try to reduce risk, increase value, and reduce costs.

For the most part, we are building managed services that are consumed by other teams within Comcast to build the products. We’ve got a DynamoDB clone. We have the API management proxy I mentioned. We have a DNS-as-a-service that is spinning up and then my new project.

Question:

Will you talk us through the motivation for this talk?

Answer:

The track is Modern CS in the Real World and I want to talk about the research that we are applying in my new project. Specifically, I will be covering getting information out from a big cluster spread across multiple data centers. There are three aspects to that, that I am intending to talk about.

The first aspect is membership, what machines are part of the cluster. That’s an interesting problem: because in very large clusters if you let every node know what every node is you will get a lot of churn, especially if they come and go.

As long as there is a path from one node to every other node, you don’t have to have a full view of membership. So we are looking at some protocols that use partial views of the entire membership, and they communicate only with their peers that are in their list, and those peers know other peers that they don’t know about. That’s how you get a fully routable mesh to the cluster.

Question:

What do you consider to be the right persona for your talk?

Answer:

I will be talking primarily to architects, people who are managing or building large scale services that span multiple data centers, especially where they need to have high availability requirements.

If you don’t need to have high availability, there are lots of other things that you can just pull off the shelf and use, rather than trying to implement one of these protocols.

Question:

How would you class this, as an intermediate or advanced talk?

Answer:

It’s intermediate to advanced, leaning toward advanced. I am going to make sure that it is approachable, but if you are not experienced, it might be pretty hard to motivate this sort of exploration of research. This is partly because the problems you try to solve haven’t been as big, or because you haven’t had to run a huge system before.

It’s targeted at a lot narrower group than maybe some of the other talks that are more general.

Question:

Tell me about how you are going to approach the talk?

Answer:

In each of the three areas, I am going to talk briefly about the research involved, without going into too much detail. Then I will talk about implementation and some of the pitfalls we ran into.

Question:

Is there some open source code the people can use for this protocol?

Answer:

There are some implementations in the wild that are open source. Recently, Mesosphere released their DCOS, a high level management tool for Mesos and containers. One of the components that they released is called Lashup, and it implements the membership protocol or a variation of the membership protocol that I am looking at.

I don’t think it implements the dissemination protocol, which is the second piece.

The dissemination protocol was implemented first at Basho, and then extracted by the folks at Helium, who have an IoT platform that is using it. It is called Plumtree. We are going to do a variation on that.

Question:

What are the main takeaways from your talk?

Answer:

The first is the usefulness of research, applying it to distributed systems. Then it’s a discussion on the membership and dissemination protocols, and the open solutions that exist out there. I’ll add some of the lessons we learned going through research, and applying these protocols in production.

Speaker: Sean Cribbs

Software Engineer @Comcast

Sean Cribbs is a distributed systems and web architecture enthusiast, currently building innovative cloud and infrastructure software at Comcast Cable. Previously, Sean spent five years with Basho Technologies contributing to nearly every part of Riak including client libraries, CRDTs and tools. In his free time, he has ported Basho’s Webmachine HTTP server toolkit from Erlang to Ruby, created a popular parser-generator for Erlang, and has contributed to many other open-source projects, including Chef, Homebrew, and Radiant CMS.

Find Sean Cribbs at

Speaker page

@seancribbs

Software Engineer

Similar Talks

Learnings from a Culture First Startup

CTO @Buffer

Sunil Sadasivan

Becoming an Outlier

Software Architect @VinSolutions, Author @pluralsight

Cory House

ESPN Next Generation APIs Powering Web, Mobile, TV

Senior Director of Distribution Platforms @ESPN

Manny Pelarinos

The Human Side of Microservices

Tech Lead @Yelp

John Billings

The Seven (More) Deadly Sins of Microservices

Chief Scientist @OpenCredo

Daniel Bryant

Lessons Learned on Uber's Journey into Microservices

Software Engineer @Uber

Emily Reinhold

What They Don’t Tell You About Microservices…

CTO @Yodle

Daniel Rolnick

Algorithms for Animation

Partner & Tech Lead @CarbonFive

Courtney Hemphill

Machine Learning Fast and Slow

Lead Data Scientist @betaworks

Suman Deb Roy

Tracks

Monday, 13 June

Architectures You've Always Wondered About

Case studies from: Google, Linkedin, Alibaba, Twitter, and more...
Stream Processing @ Scale

Technologies and techniques to handle ever increasing data streams
Culture As Differentiator

Stories of companies and team for whom engineering culture is a differentiator - in delivering faster, in attracting better talent, and in making their businesses more successful.
Practical DevOps for Cloud Architectures

Real-world lessons and practices that enable the devops nirvana of operating what you build
Incredible Power of an Open-Sourced .NET

.NET is more than you may think. From Rx to C# 7 designed in the open, learn more about the power of open source .NET
Sponsored Solutions Track 1

Tuesday, 14 June

Better than Resilient: Antifragile

Failure is a constant in production systems, learn how to wield it to your advantage to build more robust systems.
Innovations in Java and the Java Ecosystem

Cutting Edge Java Innovations for the Real World
Modern CS in the Real World

Real-world Industry adoption of modern CS ideas
Containers: From Dev to Prod

Beyond the buzz and into the how and why of running containers in production
Security War Stories

Expert-level security track led by well known and respected leaders in the field
Sponsored Solutions Track 2

Wednesday, 15 June

Microservices and Monoliths

Practical lessons on services. Asks the question when and when to NOT go with Microservices?
Modern API Architecture - Tools, Methods, Tactics

API-based application development, and the tooling and techniques to support effectively working with APIs in the small or at scale. Using internal and external APIs
Commoditized Machine Learning

Barriers to entry for applied ML are lower than ever before, jumpstart your journey
Full Stack JavaScript

Browser, server, devices - JavaScript is everywhere
Optimizing Yourself

Keeping life in balance is always a challenge. Learning lifehacks
Sponsored Solutions Track 3

See the Full Schedule

Location:

Duration

Day of week:

Level:

Persona:

Key Takeaways

Abstract

Interview

Find Sean Cribbs at

Similar Talks

Tracks

Monday, 13 June

Tuesday, 14 June

Wednesday, 15 June

Conference for Professional Software Developers

Follow QCon

Contact

Menu

QCons around the World

Presentation: Membership, Dissemination and Population Protocols

Location:

Duration

Day of week:

Level:

Persona:

More talks on:

Key Takeaways

Abstract

Interview

Find Sean Cribbs at

Similar Talks

Tracks

Monday, 13 June

Tuesday, 14 June

Wednesday, 15 June

Conference for Professional Software Developers

Follow QCon

Contact

Menu

QCons around the World