Workshop: Apache Spark, Kafka-based Recommendation Pipeline

Location:

Level: 
Intermediate
9:00am - 4:00pm

Date:

Fri, 17 Jun

Prerequisites

  • Basic familiarity with Unix/Linux commands
  • Experience in SQL, Java, Scala, Python, or R
  • Basic familiarity with linear algebra concepts (dot product)
  • Laptop with ssh client and a modern browser

The goal of this workshop is to build an end-to-end, streaming data analytics and recommendations pipeline on your local machine using Docker and the latest streaming analytics tools.

First, we create a data pipeline to interactively analyze, approximate, and visualize streaming data using modern tools such as Apache Spark, Kafka, Zeppelin, IPython, and Elasticsearch.

Next, we extend our pipeline to use streaming data to generate personalized recommendation models using popular machine learning, graph, and natural language processing techniques such as collaborative filtering, clustering, and topic modeling.
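As a taste of where the dot-product prerequisite comes in, here is a minimal sketch of how a collaborative-filtering recommender might score items. It is not the workshop's actual code: the factor vectors, the user and item names, and the `recommend` helper are all made up for illustration, and a real model would learn the vectors from user ratings rather than hard-coding them.

```python
# Hypothetical latent-factor vectors (assumed values, not workshop data).
# A collaborative-filtering model such as matrix factorization would
# normally learn these from observed user/item interactions.
user_factors = {
    "alice": [1.0, 0.0],
    "bob": [0.0, 1.0],
}
item_factors = {
    "item_a": [0.9, 0.1],
    "item_b": [0.2, 0.8],
    "item_c": [0.5, 0.5],
}


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def recommend(user, seen, k=2):
    """Return up to k unseen items, ranked by user-item dot product."""
    scores = {
        item: dot(user_factors[user], vec)
        for item, vec in item_factors.items()
        if item not in seen
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]


if __name__ == "__main__":
    # alice has already seen item_a, so only item_b and item_c are ranked.
    print(recommend("alice", seen={"item_a"}))
```

A higher dot product means the user's and the item's factor vectors point in a more similar direction, which the model reads as a stronger predicted preference.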

Lastly, we productionize our pipeline and serve live recommendations to our users!

Attendees will learn how to:

  • Create a complete, end-to-end streaming data analytics pipeline
  • Interactively analyze, approximate, and visualize streaming data
  • Generate machine learning, graph & NLP recommendation models
  • Productionize your ML models to serve real-time recommendations
  • Perform a hybrid on-premise and cloud deployment using Docker
  • Customize this workshop environment to your specific use cases

Speaker: Chris Fregly

Principal Data Solutions Engineer, Apache Spark Contributor, Netflix Open Source Committer & the Original Developer of the SMACK Stack

Chris Fregly is a Principal Data Solutions Engineer for the newly-formed IBM Spark Technology Center, an Apache Spark Contributor, a Netflix Open Source Committer, and the Original Developer of the SMACK Stack. Chris is also the founder of the global Advanced Spark and TensorFlow Meetup and the author of the upcoming book Advanced Spark (advancedspark.com). Previously, Chris was a Data Solutions Engineer at Databricks and a Streaming Data Engineer at Netflix. When Chris isn’t contributing to Spark and other Open Source projects, he’s creating book chapters, slides, and demos to share with his peers through meetups, webinars, workshops, and conferences throughout the world.
