Level: Intermediate

Time: 9:00am - 12:00pm

Prerequisites

Participants should be comfortable with Java but need no prior experience with Kafka. A laptop with VirtualBox or VMware is recommended for working through the example application; VM images with all the required software pre-installed will be provided.

Tutorial: Capturing and processing streaming data with Apache Kafka

This half-day tutorial will describe how to use Apache Kafka to store and process streaming data -- anything from user activity and app metrics to device instrumentation and logs. You'll learn Kafka's core abstractions and how to interact with Kafka through its two clients, the producer and the consumer. With these tools, it's easy to design applications as a set of loosely coupled services that exchange data via Kafka.
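To give a flavor of the producer API ahead of the tutorial, here is a minimal sketch in Java. The broker address, topic name, key, and event payload are illustrative assumptions, not part of the tutorial materials:

    import java.util.Properties;

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class ActivityProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            // Assumed broker address for illustration; any reachable Kafka broker works.
            props.put("bootstrap.servers", "localhost:9092");
            props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // Keying by user id keeps each user's events in one partition,
                // so they are delivered in order to downstream consumers.
                producer.send(new ProducerRecord<>("user-activity", "user-42", "page_view:/home"));
            }
        }
    }

Keying records by user id is the design choice that later lets a consumer process each user's activity stream in order.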

The introduction will describe Kafka's unifying abstraction, a partitioned and replicated low-latency commit log, and how it can be applied to several types of applications. The majority of the tutorial will focus on building an end-to-end application that performs simple anomaly detection on user activity data. A front-end application will be instrumented with a Kafka producer to report activity data to a Kafka topic. We'll discuss how schemas, and Avro specifically, ensure that downstream consumers agree on the data format and allow that format to evolve safely and robustly. Then, to detect anomalies, we'll build a distributed, fault-tolerant service using Kafka's consumer group abstraction to process the data.
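As a sketch of the consumer side, the snippet below shows the consumer group abstraction the anomaly-detection service builds on: every instance started with the same group.id is automatically assigned a share of the topic's partitions, and partitions are reassigned if an instance fails. The group id, topic name, and the stand-in "detection" logic here are assumptions for illustration, not the tutorial's actual code:

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;

    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class AnomalyDetector {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            // Instances sharing this group id divide the topic's partitions among themselves.
            props.put("group.id", "anomaly-detector");
            props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singletonList("user-activity"));
                while (true) {
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                    for (ConsumerRecord<String, String> record : records) {
                        // Stand-in for real anomaly detection, which the tutorial develops.
                        System.out.printf("user=%s event=%s%n", record.key(), record.value());
                    }
                }
            }
        }
    }

Running a second copy of this program with the same group id is all it takes to scale out or tolerate a failure: Kafka rebalances the partitions across the live instances.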
