Presentation: "Storm: Distributed and Fault-Tolerant Realtime Computation"

Time: Wednesday 15:30 - 16:30

Location: Salon D

Abstract:
Storm makes it easy to write and scale complex realtime computations on a cluster of computers, doing for realtime processing what Hadoop did for batch processing. Storm guarantees that every message will be processed. It’s also fast: you can process millions of messages per second with a small cluster. Best of all, you can write Storm topologies using any programming language. Storm was open-sourced by Twitter in September of 2011 and has since been adopted by numerous companies around the world.

Storm provides a small set of simple, easy-to-understand primitives. These primitives can be used to solve a stunning number of realtime computation problems, from stream processing to continuous computation to distributed RPC. In this talk you’ll learn:

- The concepts of Storm: streams, spouts, bolts, and topologies
- Developing and testing topologies using Storm’s local mode
- Deploying topologies on Storm clusters
- How Storm achieves fault tolerance and guarantees data processing
- Computing compute-intensive functions on the fly, in parallel, using Distributed RPC
- Making realtime computations idempotent using transactional topologies
- Examples of production usage of Storm

Nathan Marz, Lead Engineer, BackType @ Twitter


Nathan Marz is the lead engineer on Twitter's Publisher Analytics team. Previously, Nathan was the lead engineer of BackType, which was acquired by Twitter in July of 2011. He is a strong believer in the power of open source and has authored several significant open-source projects, including Cascalog, ElephantDB, and Storm. He writes a blog at http://nathanmarz.com.