What You’ll Learn

Learn how to create a literate interface for stream queries.
Understand how Elixir can be used with React to build interactive interfaces.
Discover how stream processing can be applied to operational visibility problems.

Abstract

In the midst of building a multi-datacenter, multi-tenant instrumentation and visibility system, we arrived at stream processing as an alternative to storing, forwarding, and post-processing metrics as traditional systems do. However, the streaming paradigm is alien to many engineers and sysadmins who are used to working with "wall-of-graphs" dashboards, predefined aggregates, and point-and-click alert configuration.

Taking inspiration from REPLs, literate programming, and DevOps practices, we've designed an interface to our instrumentation system that focuses on interactive feedback, note-taking, and team communication. An engineer can both experiment with new flows at low risk, and codify longer-term practices into runbooks that embed live visualizations of instrumentation data. As a result, we can start to free our users from understanding the mechanics of the stream processor and instead focus on the domain of instrumentation.

In this talk, we will discuss how the interface described above works, how the stream processor manages flows on behalf of the user, and some tradeoffs we have encountered while preparing the system to roll out into our organization.

Interview

Question:

QCon: What is the difference between stream processing and aggregation?

Answer:

Cribbs: With my background working on distributed databases I realized writing to disk first and then trying to process it afterwards is really expensive. It's fine for generating business intelligence afterwards, or if you want to aggregate ahead of time before you write to disk it's fine. But if you're trying to look at an individual, like a single customer who has had a bad experience, aggregates don't help you. You just want to look at the very focused events that are related to that person's bad experience. I think that stream processing is a really good fit for that sort of use case, finding the really tiny needles in a haystack.

That said, people don't necessarily know how to write incremental computations over instrumentation data. How do we make it easier for people to get into this, not just to get started but to keep developing their understanding of their own data as it goes through the stream processing system?

So you have all this data flowing in: what do you do with it? Do you even know what's there? In some cases you do, because you wrote the instrumentation for the application and it's emitting data like a request span with a latency measurement, but I think there is still a barrier to entry. Instead of thinking of it procedurally, you have to think of it as a stream, and do things with windows and across different dimensions. It's an unfamiliar problem to a lot of people.

So we wanted to create a really guided interface with a lot of interactivity and contextual help, and the ability to write down what you learned when you wrote that little program, and to provide context for the next person who picks it up.
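
To make the contrast in this answer concrete, here is a minimal Python sketch (not the Elixir system described in the talk) of two incremental computations over a stream of request spans: one that follows a single customer's events, and one that averages latency per service over tumbling windows. The event fields (`ts`, `service`, `customer_id`, `latency_ms`) and both function names are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, Iterator, Tuple

# Hypothetical event shape: a request span with a latency measurement.
Event = Dict[str, Any]


def spans_for_customer(events: Iterable[Event], customer_id: str) -> Iterator[Event]:
    """Pass through only the events for one customer, one at a time."""
    for event in events:
        if event["customer_id"] == customer_id:
            yield event


def mean_latency_by_window(
    events: Iterable[Event], window_seconds: int
) -> Iterator[Tuple[int, str, float]]:
    """Emit (window_start, service, mean latency) for each tumbling window.

    Assumes events arrive in timestamp order; a window is flushed as soon
    as an event belonging to a later window arrives.
    """
    current_window = None
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)

    for event in events:
        window = int(event["ts"]) // window_seconds * window_seconds
        if current_window is not None and window != current_window:
            for service in sorted(totals):
                yield current_window, service, totals[service] / counts[service]
            totals.clear()
            counts.clear()
        current_window = window
        totals[event["service"]] += float(event["latency_ms"])
        counts[event["service"]] += 1

    # Flush the final, possibly partial, window.
    if current_window is not None:
        for service in sorted(totals):
            yield current_window, service, totals[service] / counts[service]


events = [
    {"ts": 0, "service": "api", "customer_id": "c1", "latency_ms": 100.0},
    {"ts": 5, "service": "api", "customer_id": "c2", "latency_ms": 300.0},
    {"ts": 12, "service": "db", "customer_id": "c1", "latency_ms": 40.0},
]

customer_events = list(spans_for_customer(events, "c1"))
windows = list(mean_latency_by_window(events, window_seconds=10))
```

Neither function stores the full stream: the filter holds nothing, and the windowed mean holds only per-service running totals for the current window. The windowed mean is the kind of aggregate a dashboard would show, while the filter keeps the individual events that an aggregate hides.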

Question:

QCon: Your abstract says you take inspiration from REPLs, and you also mention runbooks. Are you going to be talking about interacting with the data?

Answer:

Cribbs: So it's a web user interface. You write a document with markup in the middle of it: you have code blocks which represent individual stream pipelines. At the end of the pipeline something says display graph or show a table of the last 10 events that match this criterion. As you edit it you can run it and it will show you the matching live data stream.
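
As a rough illustration of the document model described here, the following Python sketch splits a runbook into prose segments and pipeline ("flow") segments. The interview does not specify the markup or the pipeline syntax, so the `~~~` delimiter, the `split_document` function, and the example pipeline text are all hypothetical.

```python
from typing import List, Tuple

# Hypothetical delimiter; the real markup is not described in the interview.
FENCE = "~~~"


def split_document(text: str) -> List[Tuple[str, str]]:
    """Split a literate document into ("prose", ...) and ("flow", ...) segments."""
    segments: List[Tuple[str, str]] = []
    buffer: List[str] = []
    in_flow = False

    def flush(kind: str) -> None:
        # Store the buffered lines as one segment, skipping empty ones.
        body = "\n".join(buffer).strip()
        if body:
            segments.append((kind, body))
        buffer.clear()

    for line in text.splitlines():
        if line.strip() == FENCE:
            flush("flow" if in_flow else "prose")
            in_flow = not in_flow
        else:
            buffer.append(line)
    flush("flow" if in_flow else "prose")
    return segments


# The pipeline line below uses an invented syntax, for illustration only.
runbook = """\
Check slow checkout requests before paging anyone.
~~~
spans | where service == "checkout" | last 10 | table
~~~
If the table is empty, the alert was probably a blip.
"""

segments = split_document(runbook)
```

In an interface like the one described, each "flow" segment would be handed to the stream processor and its live output rendered in place, while the prose segments carry the notes for the next reader.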

Question:

QCon: It's like a run book - is it Python or a DSL that you're running?

Answer:

Cribbs: We have built up a lot of infrastructure in Elixir - it's the stuff that we're using under the hood, but we've built a pretty interesting DSL on top because we know that we can focus that on operational instrumentation data. At every stage of the stream we have a schema, so we can optimize around that.

Two of our engineers are really familiar with the Elixir ecosystem, myself one of them. But there's also some dynamism to it that makes it compelling: one of the great things that we can do is take that stream processing program in and then manipulate the AST and rewrite it and then put it into action.
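
The idea of taking a program in, rewriting its AST, and then running it can be sketched in Python with the standard `ast` module, which stands in here for Elixir's quoted expressions. This is an analogy under stated assumptions, not the team's DSL: the filter syntax, the `FieldRewriter` class, and `compile_predicate` are hypothetical, and the code assumes Python 3.9 or later.

```python
import ast
from typing import Any, Callable, Dict


class FieldRewriter(ast.NodeTransformer):
    """Rewrite bare names such as `latency_ms` into `event["latency_ms"]`."""

    def visit_Name(self, node: ast.Name) -> ast.AST:
        if isinstance(node.ctx, ast.Load):
            return ast.copy_location(
                ast.Subscript(
                    value=ast.Name(id="event", ctx=ast.Load()),
                    slice=ast.Constant(value=node.id),
                    ctx=ast.Load(),
                ),
                node,
            )
        return node


def compile_predicate(source: str) -> Callable[[Dict[str, Any]], Any]:
    """Parse a filter expression, rewrite its AST, and compile it to a function."""
    tree = ast.parse(source, mode="eval")
    tree = ast.fix_missing_locations(FieldRewriter().visit(tree))
    code = compile(tree, filename="<flow>", mode="eval")
    # Emptying the builtins keeps the example small; it is not a security sandbox.
    return lambda event: eval(code, {"__builtins__": {}}, {"event": event})


is_slow_api = compile_predicate('latency_ms > 500 and service == "api"')
slow = is_slow_api({"latency_ms": 900, "service": "api"})
fast = is_slow_api({"latency_ms": 100, "service": "api"})
```

Because the user's expression is available as a tree before it runs, the system can validate field names against a schema or substitute optimized operations at this point, which is the kind of leverage the answer alludes to.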

Question:

QCon: What about the front end; what are you using on the front end?

Answer:

Cribbs: We're using some of the tools that come with Elixir, including a web framework called Phoenix, but a lot of it is React. So much of our stack is so different from the rest of the company, being based on Elixir. If we want people to come in and work on our project, Elixir is enough of a barrier, so we decided to pass on things like Elm or ClojureScript - let's just have plain JavaScript.

Our department does rotations regularly, every month basically. Senior level folks are not going to have a problem with new front end languages, but if we have not-so-senior people rotating in, it's easier to do JavaScript.

Question:

QCon: What's the level and who is the primary persona for your talk?

Answer:

Cribbs: This is an intermediate level talk. There will be some deep concepts, and I would say a lot of it will be about how we designed the user interface. The message is that stream processing systems are awesome, but you need to think about the end users of those systems and make them accessible.

Speaker: Sean Cribbs

Software Engineer @Comcast

Sean Cribbs is a distributed systems and web architecture enthusiast, currently building innovative cloud and infrastructure software at Comcast Cable. Previously, Sean spent five years with Basho Technologies contributing to nearly every part of Riak including client libraries, CRDTs and tools. In his free time, he has ported Basho’s Webmachine HTTP server toolkit from Erlang to Ruby, created a popular parser-generator for Erlang, and has contributed to many other open-source projects, including Chef, Homebrew, and Radiant CMS.

Find Sean Cribbs at

Speaker page

@seancribbs

Similar Talks

The Effective Remote Developer

Director of Engineering

David Copeland

Evaluating Machine Learning Models: A Case Study

Data Scientist @Opendoor

Nelson Ray

Drinking from the Elixir Fountain of Resilience

Senior Software Engineer @Comcast

Jearvon Dharrie

I Have A NoSQL toaster

Developer Advocate @Couchbase

Matthew Groves

Engineer Innovation Through Rapid Prototyping

Principal Software Engineer @ Vistaprint

Ramon Harrington

Production - Designing for Testability

Co-Founder & CTO @Flow.io, previously Co-Founder & CTO @Gilt

Michael Bryzek

Nonconformist Resilience: DB-Backed Job Queues

VP Architecture @Betterment

John Mileham

Managing Millions of Data Services @Heroku

Senior Infrastructure Engineer @Heroku

Gabriel Enslein

Building Microservices @Squarespace

Director of Engineering @ Squarespace

Franklin Angulo

Tracks

Monday, 26 June

Microservices: Patterns & Practices

Practical experiences and lessons with Microservices.

Java - Propelling the Ecosystem Forward

Lessons from Java 8, prepping for Java 9, and looking ahead at Java 10. Innovators in Java.

High Velocity Dev Teams

Working Smarter as a team. Improving value delivery of engineers. Lean and Agile principles.

Modern Browser-Based Apps

Reactive, cross platform, progressive - webapp tech today.

Innovations in Fintech

Technology, tools and techniques supporting modern financial services.

Tuesday, 27 June

Architectures You've Always Wondered About

Case studies from the most relevant names in software.

Developer Experience: Level up Your Engineering Effectiveness

Trends, tools and projects that we're using to maximally empower your developers.

Chaos & Resilience

Failures, edge cases and how we're embracing them.

Stream Processing at Large

Rapidly moving data at scale.

Building Security Infrastructure

How our industry is being attacked and what you can do about it.

Wednesday, 28 June

Next Gen APIs: Designs, Protocols, and Evolution

Practical deep-dives into public and internal API design, tooling and techniques for evolving them, and binary and graph-based protocols.

Immutable Infrastructures: Orchestration, Serverless, and More

What's next in infrastructure. How cloud functions like Lambda are making their way into production.

Machine Learning 2.0

Machine Learning 2.0, Deep Learning & Deep Learning Datasets.

Modern CS in the Real World

Applied, practical, & real-world dive into industry adoption of modern CS.

Optimizing Yourself

Maximizing your impact as an engineer, as a leader, and as a person.

Ask Me Anything (AMA)

Track: Stream Processing at Large

Location: Liberty, 8th fl.

Duration: 11:50am - 12:40pm

Day of week: Tuesday

Level: Intermediate

Persona: Data Scientist

Presentation: Adopting Stream Processing for Instrumentation
