In the ever-changing landscape of big data, the focus is slowly moving away from batch and towards realtime analytics, and data science workflows are evolving to adapt. Realtime analytics is limited only by the underlying architecture, which must enable low-latency ingestion, processing and serving of high-throughput data. Lambda architecture unified batch and realtime to provide low-latency computations with eventual correctness. However, the challenges of maintaining two different codebases (among other things) made operations hard.
One of Kafka's creators described a streaming-first Kappa architecture in a 2014 article: a single-path solution that can handle realtime processing as well as reprocessing and backfills. Kappa architecture makes sense; however, as of 2023 it still has a long way to go before full-scale adoption. A paradigm shift is required in how we design data infrastructure. Instead of worrying about two codebases, we now need to worry about bootstrapping from a stream, backfills from history, idempotent sinks to handle reprocessing, etc.
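To make the idea of an idempotent sink concrete, here is a minimal sketch in Python. It is not taken from the talk: the `Event` and `IdempotentSink` names, the in-memory dictionary standing in for a real store, and the sample log are all assumptions made for illustration. The point it shows is that when writes are keyed by a stable event ID, replaying the same stream during reprocessing leaves the sink in the same state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    # Hypothetical event shape; event_id is assumed to be stable across replays.
    event_id: str
    account: str
    amount: int


class IdempotentSink:
    """Toy sink: an in-memory dict stands in for a keyed store (assumption)."""

    def __init__(self):
        self.rows = {}

    def write(self, event):
        # Upsert by event_id: writing the same event twice overwrites
        # the existing row instead of adding a duplicate.
        self.rows[event.event_id] = event

    def balance(self, account):
        return sum(e.amount for e in self.rows.values() if e.account == account)


def process(stream, sink):
    # The same code path serves live processing and reprocessing.
    for event in stream:
        sink.write(event)


log = [
    Event("e1", "alice", 50),
    Event("e2", "alice", -20),
    Event("e3", "bob", 10),
]

sink = IdempotentSink()
process(log, sink)  # initial run
process(log, sink)  # replay of the same log, e.g. a backfill

print(sink.balance("alice"))  # 30, not 60: the replay did not double-count
```

An append-only sink would have counted every event twice after the replay; keying on the event ID is one simple way to avoid that, though real systems may rely on transactions or deduplication windows instead.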
In this presentation, I will talk about strategies for evolving your data infrastructure to enable Kappa architecture in your organization, including an iterative roadmap for moving away from Lambda while ensuring minimum disruption to end users. I will use real-world examples from tech companies as case studies. By the end of this presentation, you will walk away with a concrete roadmap for designing a data platform built on Kappa architecture.
Speaker
Sherin Thomas
Staff Software Engineer @Chime
Sherin is a Software Engineer with over 12 years of experience at companies like Google, Twitter, Lyft, Netflix and Chime. She works in the field of Big Data, Streaming, ML/AI and Distributed Systems. Currently, she's building a shiny new data platform at Chime. Sherin has presented on the topic of ML and Streaming at various reputable conferences, including a keynote address, and has served as a judge for awards such as the SXSW Innovation Awards and CES.
Recently she advised NASA's SpaceML program and helped build a platform for processing petabytes of satellite imagery for detecting weather patterns and labelling raw data for climate science related AI research. She also writes a blog where she shares her thoughts on technology, work and career.
When she's not doing technical stuff, she enjoys painting, reading, perusing the art and fashion section of the New York Times and spending time with her husband and toddler.