Track: Stream Processing @ Scale

Stream processing has become the most dynamic, fastest-growing area of the big data space. As streaming applications become more mainstream, the technologies for building them become more mature and feature-rich. For example, Apache Spark 2.0 introduced better support for streaming use cases, Google donated its Dataflow API to the Apache Software Foundation, and Apache Flink was built from the ground up to support streaming.

Is it possible to process unbounded streams of data with near real-time latency while also providing the accuracy, consistency and flexibility that modern businesses demand? This track focuses on the latest developments in processing unbounded streams of data and on how to build robust, scalable data pipelines in practice with the latest tools, technologies and ideas.

Track Host:
Eugene Dvorkin
Technical Architect & NYC Storm User Group Organizer
Eugene Dvorkin is a software architect, developer and mentor specializing in streaming data. He is an active member of the engineering community in New York City and organizer of the Storm User Group meetup. Eugene has experience in areas such as microservice architecture, containers, continuous delivery, and big data. He currently leads a team of data engineers building a data pipeline at ADP.

10:35am - 11:25am

by Andrew Psaltis
IoT/Data Architect working with Streaming Systems @Hortonworks

Our needs for real-time data are growing at an unprecedented rate; it is only a matter of time before you are faced with building a real-time streaming pipeline. Often a key decision you need to make quickly is which stream-processing framework to use. What if you could instead use a unified API that lets you express complex data processing workflows, including advanced windowing, event timing, and aggregate computations? Apache Beam aims to provide this...

11:50am - 12:40pm

by Ted Malaska
Committer to Flume, Avro, Pig, YARN & Architect @Cloudera

by Pat Patterson
Community Champion @StreamSets & Lecturer California State University, Monterey Bay

A lot has changed and a lot has stayed the same with Ingest and Stream Processing over the years. But today there are more options than ever for Ingest and Stream Processing, so many that one may wonder why to choose one solution over another. The problem is that in this space one size does not fit all, and that makes it all the more confusing. This talk aims to give the audience a direction for choosing among Ingest and Stream Processing solutions.

...

1:40pm - 2:30pm

by Richard Kasperowski
Author of The Core Protocols: A Guide to Greatness

Open Space

2:55pm - 3:45pm

by Sean T. Allen
VP Engineering @Sendence

How Did I Get Here? Building Confidence in a Distributed Stream Processor

When we build a distributed application, how do we have confidence that our results are correct? We can test our business logic over and over, but if the engine executing it isn't trustworthy, we can't trust our results.

How can we build trust in our execution engines? We need to test them. It's hard...

4:10pm - 5:00pm

by Neha Narkhede
Co-Creator Apache Kafka/Co-Founder & Head of Engineering @Confluent

Most applications continuously transform streams of inputs into streams of outputs. Yet the idea of directly modeling stream processing in applications is just coming into its own after a few decades on the periphery.

This talk will cover the basic challenges of reliable, distributed, stateful stream processing. It will cover how Apache Kafka was designed to support capturing and processing distributed data streams by building...

5:25pm - 6:15pm

by Igor Maravić
Software Engineer @Spotify

by Neville Li
Software Engineer @Spotify

Spotify’s data is increasing at a rate of 60 billion events per day. The current event delivery system, which is based on Kafka 0.7, is slowly but surely reaching its limits. To be able to seamlessly scale the event delivery system with Spotify’s growth, we decided to base the new event delivery system on Google Cloud Pub/Sub and Google Cloud Dataflow.

Spotify’s event delivery system is one of the foundational pieces of...
