Workshop: Apache Spark, Kafka-based Recommendation Pipeline

Location:

Level:

Intermediate

Time:

9:00am - 4:00pm

Date:

Fri, 17 Jun

Prerequisites

  • Basic familiarity with Unix/Linux commands
  • Experience in SQL, Java, Scala, Python, or R
  • Basic familiarity with linear algebra concepts (dot product)
  • Laptop with ssh client and a modern browser

The goal of this workshop is to build an end-to-end, streaming data analytics and recommendations pipeline on your local machine using Docker and the latest streaming analytics tools.

First, we create a data pipeline to interactively analyze, approximate, and visualize streaming data using modern tools such as Apache Spark, Kafka, Zeppelin, IPython, and Elasticsearch.
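As one illustration of what "approximating" a stream can mean, here is a minimal sketch of reservoir sampling, which keeps a fixed-size uniform random sample of an unbounded stream. This is an assumption about the kind of technique involved, not the workshop's actual material; the function name `reservoir_sample` and the `seed` parameter are hypothetical, and the code is plain Python rather than Spark or Kafka.

```python
import random


def reservoir_sample(stream, k, seed=0):
    """Keep a uniform random sample of at most k events from a stream.

    Hypothetical helper for illustration only (Algorithm R).
    """
    rng = random.Random(seed)  # seeded so repeated runs pick the same sample
    sample = []
    for i, event in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k events.
            sample.append(event)
        else:
            # Replace a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < k:
                sample[j] = event
    return sample


# Example: sample 10 events from a stream of 1000 without storing the stream.
events = reservoir_sample(range(1000), 10)
```

Statistics computed on the sample (a mean, a histogram) can then stand in for the same statistics on the full stream, at the cost of some sampling error.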

Next, we extend our pipeline to use streaming data to generate personalized recommendation models using popular machine learning, graph, and natural language processing techniques such as collaborative filtering, clustering, and topic modeling.
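To show why the dot product appears in the prerequisites, here is a minimal sketch of how a collaborative-filtering model can score items once user and item factor vectors exist. The factor values, names, and the `recommend` helper are all hypothetical; in practice the factors would be learned from interaction data, and this plain-Python example is not the workshop's implementation.

```python
# Hypothetical latent factors. Real ones would be learned from user/item
# interaction data, for example by matrix factorization.
user_factors = {
    "alice": [0.9, 0.1],
    "bob": [0.2, 0.8],
}
item_factors = {
    "action_movie": [1.0, 0.0],
    "romance_movie": [0.0, 1.0],
    "rom_com": [0.3, 0.7],
}


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def recommend(user, k=2):
    """Return the k items whose factors align best with the user's factors."""
    scores = {
        item: dot(user_factors[user], factors)
        for item, factors in item_factors.items()
    }
    # Highest predicted affinity first.
    return sorted(scores, key=scores.get, reverse=True)[:k]


top_for_alice = recommend("alice")
top_for_bob = recommend("bob")
```

Each score is a predicted affinity: the larger the dot product between a user's vector and an item's vector, the higher that item ranks for that user.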

Lastly, we productionize our pipeline and serve live recommendations to our users!

Attendees will learn how to:

  • Create a complete, end-to-end streaming data analytics pipeline
  • Interactively analyze, approximate, and visualize streaming data
  • Generate machine learning, graph & NLP recommendation models
  • Productionize your ML models to serve real-time recommendations
  • Perform a hybrid on-premise and cloud deployment using Docker
  • Customize this workshop environment to your specific use cases

Speaker: Chris Fregly

Principal Data Solutions Engineer, Apache Spark Contributor, Netflix Open Source Committer & the Original Developer of the SMACK Stack

Chris Fregly is a Principal Data Solutions Engineer for the newly formed IBM Spark Technology Center, an Apache Spark Contributor, a Netflix Open Source Committer, and the Original Developer of the SMACK Stack. Chris is also the founder of the global Advanced Spark and TensorFlow Meetup and the author of the upcoming book, Advanced Spark (advancedspark.com). Previously, Chris was a Data Solutions Engineer @ Databricks and a Streaming Data Engineer @ Netflix. When Chris isn’t contributing to Spark and other Open Source projects, he’s creating book chapters, slides, and demos to share with his peers through meetups, webinars, workshops, and conferences throughout the world.

Tracks

  • Architectures You've Always Wondered about

    Case studies from the most relevant names in software

  • Developer Experience: Toolchain, Continuous Delivery, & More

    Trends, tools and projects that we're using to maximally empower your developers.

  • DevOps & Site Reliability

    Failures, edge cases and how we're embracing them.

  • High Velocity Dev Teams

    Working smarter as a team. Improving the value delivery of engineers. Lean and Agile principles.

  • Immutable Infrastructures: Orchestration, Serverless, and More

    What's next in infrastructure. How cloud functions like Lambda are making their way into production.

  • Innovations in Fintech

    Technology, tools and techniques supporting modern financial services

  • Machine Learning 2.0

    Machine Learning 2.0, Deep Learning & Deep Learning Datasets

  • Microservices: Patterns & Practices

    Practical experiences and lessons with Microservices

  • Modern Clientside Apps

    Reactive, cross platform, progressive - webapp tech today

  • Modern CS in the Real World

    Applied, practical, & real-world dive into industry adoption of modern CS

  • Next Gen APIs

    Tooling, techniques, & practices building APIs today

  • Optimizing Yourself

    Maximizing your impact as an engineer, as a leader, and as a person

  • Security War Stories

    How our industry is being attacked and what you can do about it.

  • Stream Processing in Practice

    Rapidly moving data at scale.

  • Today's Java

    Lessons from 8, prepping for 9, and peeking ahead at 10. Innovators in Java.