Track: Stream Processing at Large

Location: Liberty, 8th fl.

Day of week: Tuesday

The software industry has learned that the world’s data can be represented as unbounded queues of changes. It can be sliced into sliding windows. It can be aggregated, rolled up, and analyzed. We can choose a number of ways to do this work such as using Kafka Streams or Spark Streaming. We can opt for Apache Beam, Storm, Samza, Flume, or Flink. We have a large pool of options on which we can build powerful systems, but there is accidental complexity lurking in any of the choices:

  • What if I need to rebuild all the data?
  • How do I know when my system is not healthy?
  • How do I reason about time in this system?
  • What if things arrive out of order?
  • How do I know things have arrived?
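As a rough illustration of the questions about time, ordering, and arrival, here is a minimal sketch in plain Python (not tied to any of the frameworks named above) of tumbling event-time windows with a watermark. The names `WINDOW_SIZE`, `ALLOWED_LATENESS`, and `process` are hypothetical, and the watermark rule (largest event time seen minus a fixed lateness) is only one simple assumption among many possible strategies.

```python
from collections import defaultdict

WINDOW_SIZE = 10        # seconds per tumbling window (illustrative value)
ALLOWED_LATENESS = 5    # how far the watermark trails the max event time seen


def window_start(event_time):
    """Map an event timestamp to the start of its tumbling window."""
    return event_time - (event_time % WINDOW_SIZE)


def process(events):
    """Count events per event-time window, tolerating out-of-order arrival.

    `events` is an iterable of (event_time, key) pairs in arrival order.
    Returns (closed_windows, dropped): closed_windows maps a window start
    to its event count; dropped lists events that arrived after their
    window had already been closed by the watermark.
    """
    open_windows = defaultdict(int)
    closed_windows = {}
    dropped = []
    watermark = float("-inf")

    for event_time, key in events:
        # Assumption: nothing older than (max event time - lateness) arrives.
        watermark = max(watermark, event_time - ALLOWED_LATENESS)

        start = window_start(event_time)
        if start + WINDOW_SIZE <= watermark:
            # The window for this event is already closed: too late.
            dropped.append((event_time, key))
        else:
            open_windows[start] += 1

        # Close every open window whose end is at or before the watermark.
        for s in sorted(open_windows):
            if s + WINDOW_SIZE <= watermark:
                closed_windows[s] = open_windows.pop(s)

    # End of input: flush whatever is still open.
    for s in sorted(open_windows):
        closed_windows[s] = open_windows.pop(s)
    return closed_windows, dropped
```

In this sketch an event that arrives out of order is still counted as long as its window is open, while one that arrives after the watermark has passed the end of its window is reported as dropped rather than silently lost. Choosing the lateness bound is a trade-off between result latency and completeness.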

This track walks through uses of streaming technologies at large, the problems encountered, and how teams are coping with the state of this new world. As streaming systems approach maturity, the companies using them are growing ecosystems and best practices around building and operating them. They are discovering new ways to reason about monitoring, testing, performance, and failure. This track is an opportunity to learn from their experiences.

Track Host:
Michelle Brush
Engineering Director @Cerner

Michelle Brush is a math geek turned computer geek with 15 years of software development experience. She has developed algorithms and data structures for pathfinding, search, compression, and data mining in embedded as well as distributed systems. In her current role as an Engineering Director for Cerner Corporation, she is responsible for the data ingestion and processing platform for Cerner’s Population Health solutions. She also leads several engineering education programs and culture initiatives including Cerner’s software architect development program and internal developer conference. Outside of Cerner, she is the chapter leader for the Kansas City chapter of Girl Develop It and one of the conference organizers for Midwest.io.

Trackhost Interview

Question: 
QCon: So who is the audience for the track that you're hosting?
Answer: 

Michelle: This track is for someone who has some exposure to streaming systems, either because they're working with one, have buyer's remorse, and are asking themselves whether they picked the right streaming engine, or because they're an architect or software engineer who is currently dealing with a batch system and thinking about moving to streaming. This track will give attendees an idea of what's in store when it comes to streaming.

Question: 
QCon: What do you hope someone walks away from this track with?
Answer: 

Michelle: First, that they need a schema registry! Besides that, getting an awareness of how much complexity is still left after you've made the initial decision of which framework or engine you're going to pick. Streaming as a whole is not in a state yet that's anywhere near consumable or usable, which is evidenced by the constant explosion of new, thrilling frameworks still being built out right now. It's getting much better, but it's still far from being solved.

Question: 
QCon: What questions will your track answer for attendees?
Answer: 

Michelle: Streaming is the direction that I see most architectures going in the future. If you're in microservices, you're probably going to move to streams with a combination of “functions as a service.” If you're in batch, demands for latency are going to increase, and people are going to want faster feedback on their algorithms, analytics, and machine learning.

Whether or not you’re ready, you need to start thinking about the problems in this space and prepare. And again, it's not a solved space. There are challenges with back pressure, with out-of-order updates, and so on. You need to have a good reasoning model for how to deal with those challenges.

10:35am - 11:25am

by Anton Gorshkov
Managing Director @GoldmanSachs

How good is your streaming framework at failure? Does it die gracefully, telling you exactly at which point it died? Does it tell you why it died? Does it pick up where it left off afterwards? Can it easily skip the "erroneous" portions of the stream? Do you always know what was processed and what wasn't? Does it even have to die when a process, host, or data center fails?

In this talk we focus on "what if" scenarios and how to evaluate and architect a streaming platform that has high level...

11:50am - 12:40pm

by Sean Cribbs
Software Engineer @Comcast

In the midst of building a multi-datacenter, multi-tenant instrumentation and visibility system, we arrived at stream processing as an alternative to storing, forwarding, and post-processing metrics as traditional systems do. However, the streaming paradigm is alien to many engineers and sysadmins who are used to working with "wall-of-graphs" dashboards, predefined aggregates, and point-and-click alert configuration.

Taking inspiration from REPLs, literate programming, and DevOps...

1:40pm - 2:30pm

by Gwen Shapira
System Architect @Confluent, PMC Member @Kafka, & Committer Apache Sqoop

In a world of microservices that communicate via unbounded streams of events, schemas are the contracts between the services. Having an agreed contract allows the teams developing those services to move fast by reducing the risk involved in making changes. However, delivering events with schema change in mind isn’t common practice yet.

In this presentation, we’ll discuss patterns of schema design, schema storage and schema evolution that help development teams build better contracts...

2:55pm - 3:45pm

Open Space

4:10pm - 5:00pm

by Shriya Arora
Senior Data Engineer @Netflix

Streaming applications have historically been complex to design and implement because of the significant infrastructure investment. However, recent active developments in various streaming platforms provide an easy transition to stream processing, and enable analytics applications/experiments to consume near real-time data without massive development cycles.

This talk will cover the experiences Netflix’s Personalization Data team had in stream processing unbounded datasets. The...

5:25pm - 6:15pm

by Michael Hansen
Principal Data Engineer @hbcdigital

“Perfect is the enemy of good” - Voltaire

On the journey through life, we learn and adapt via trial and error; software development is no different. We realize and accept that we won’t build the perfect solution the first time around; it takes many iterations. At Gilt.com, now part of HBC Digital, we started processing and streaming event data nearly 5 years ago. Our initial solution was dramatically different from our current solution - and will likely be different from our...

Tracks

Monday, 26 June

Tuesday, 27 June

Wednesday, 28 June