Track: Stream Processing at Large

Location: Broadway Ballroom South Center, 6th fl.

Duration: 1:40pm - 2:30pm

Day of week: Tuesday

Level: Intermediate - Advanced

Persona: Data Scientist, Developer

What You’ll Learn

Understand the need and role of defining types in an Enterprise Application.
Learn techniques and approaches to manage these schemas.
Hear war stories and lessons on what can happen if you don’t follow clean practices when defining your types.

Abstract

In a world of microservices that communicate via unbounded streams of events, schemas are the contracts between the services. Having an agreed contract allows the teams developing those services to move fast, by reducing the risk involved in making changes. Yet delivering events with schema change in mind is not yet common practice.

In this presentation, we’ll discuss patterns of schema design, schema storage and schema evolution that help development teams build better contracts through better collaboration - and deliver resilient applications faster. We’ll look at how schemas were used in the past, how their meaning has changed over the years and why they gained particular importance with the rise of stream processing.

Interview

Question:

QCon: What’s the motivation for your talk?

Answer:

Gwen: I’ve been concerned about the way companies manage their metadata since… 2012, probably, when I first moved from managing relational databases to managing Hadoop clusters. DBAs take metadata, especially schemas, for granted. And then suddenly Hadoop was this wild west: people just dump data and no one knows how to use it. You create all those crazy dependencies between teams because whoever writes the data makes decisions that affect everyone and can break downstream apps at any time. This is even more difficult with stream processing because of the real-time and microservices nature of the applications.

I spent the last 5 years working with customers on solving this problem with different tools and environments. I feel like I have quite a lot to share.

Question:

QCon: How would you describe the persona of the target audience of this talk?

Answer:

Gwen: The relevant role is usually “enterprise architect”, because they have overall responsibility for how different applications communicate and play together, although I hope that many responsible engineers care as well. My target audience is usually from medium-sized to huge companies - you need to be of a certain size before questions of compatibility become important.

Question:

QCon: How are you going to address these things?

Answer:

Gwen: I'm going to spend part of my time just telling horror stories of what happens if you don't manage your schemas (I have four years' worth of horror stories to share). Then I'm going to talk about how it's a general problem. It's not about whether you use Avro or JSON or something else. It doesn't even matter if you do stream processing at all; it's a very generic problem of how components, services and teams communicate.

Then I'm going to go into some solutions, including the Confluent Schema Registry. It's important to note, though, that there are lots of other solutions that you can use too.

I want to end the talk with a few examples of the potential in implementing this kind of centralized stream and schema management. In addition to the immediate compatibility benefits, a centralized metadata store can be used for data discovery and for governance. I hope to share some examples of what forward-looking enterprise architects in some organizations are currently exploring.

Question:

QCon: QCon targets advanced architects and senior development leads. What actionable takeaways do you feel that type of persona will walk away from your talk with?

Answer:

Gwen: If you use events to communicate between applications (and this includes all stream processing apps) - you absolutely need to figure out a way to detect and prevent schema compatibility issues early in the development process. You also need reasonable ways to allow schemas to change without breaking things. My talk is full of suggestions on how to do both.
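As an editorial illustration of what detecting compatibility issues early might look like, here is a minimal Python sketch of a backward-compatibility check that could run in a build or code review step. It is not taken from the talk and does not use the Confluent Schema Registry. The dictionary-based schema model, the function name `backward_incompatibilities` and the `order_*` schemas are all hypothetical, and the rules are a deliberate simplification of what formats such as Avro define.

```python
from typing import Any, Dict, List

# Hypothetical, simplified schema model (an assumption for this sketch):
# field name -> {"type": <type name>, optional "default": <value>}
Schema = Dict[str, Dict[str, Any]]


def backward_incompatibilities(old: Schema, new: Schema) -> List[str]:
    """List reasons a consumer using `new` could not read events written with `old`."""
    problems: List[str] = []
    for name, spec in new.items():
        if name not in old:
            # Old producers never wrote this field, so the reader needs a default.
            if "default" not in spec:
                problems.append(f"new field '{name}' has no default")
        elif old[name]["type"] != spec["type"]:
            problems.append(
                f"field '{name}' changed type from {old[name]['type']} to {spec['type']}"
            )
    # Fields dropped from `new` are simply ignored by the reader in this model.
    return problems


order_v1: Schema = {"order_id": {"type": "string"}, "amount": {"type": "double"}}

# Adding a field with a default keeps old events readable.
order_v2_ok: Schema = {**order_v1, "currency": {"type": "string", "default": "USD"}}

# Changing a type and adding a required field both break old events.
order_v2_bad: Schema = {
    "order_id": {"type": "long"},
    "amount": {"type": "double"},
    "currency": {"type": "string"},
}

if __name__ == "__main__":
    print(backward_incompatibilities(order_v1, order_v2_ok))
    print(backward_incompatibilities(order_v1, order_v2_bad))
```

A check like this, run before a schema change is merged, reports the breaking change to the team making it instead of to the downstream consumers at runtime.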

Question:

QCon: What do you feel is the most important thing/practice/tech/technique for a developer/leader in your space to be focused on today?

Answer:

Gwen: The transition from both request-response processing and batch processing to stream processing.

Every business has many applications that are either request-response or batch due to historical reasons - but the real business process they model is a continuous stream of events. Using new technologies to model the business process more accurately in the applications will help make the entire process more efficient and more timely.

I am typically wary of cutting-edge technologies and prefer to use proven systems (like Kafka!), but one of the technologies I am currently most curious about is Lyft’s Envoy. I hope to learn more about it at QCon NYC.

Speaker: Gwen Shapira

System Architect @Confluent, PMC Member @Kafka, & Committer Apache Sqoop

Gwen is a product manager at Confluent. She has 15 years of experience working with code and customers to build scalable data architectures, integrating relational and big data technologies. Gwen is the author of “Kafka: The Definitive Guide” and “Hadoop Application Architectures”, and a frequent presenter at industry conferences. Gwen is a PMC member on the Apache Kafka project and a committer on Apache Sqoop. When Gwen isn't building data pipelines or thinking up new features, you can find her pedaling on her bike, exploring the roads and trails of California and beyond.

Find Gwen Shapira at

Speaker page

@gwenshap

Similar Talks

The Effective Remote Developer

Director of Engineering

David Copeland

Evaluating Machine Learning Models: A Case Study

Data Scientist @Opendoor

Nelson Ray

When Microservices Meet Event Sourcing

Software Developer @ThoughtWorks

Vinicius Gomes

I Have A NoSQL toaster

Developer Advocate @Couchbase

Matthew Groves

Engineer Innovation Through Rapid Prototyping

Principal Software Engineer @ Vistaprint

Ramon Harrington

Nonconformist Resilience: DB-Backed Job Queues

VP Architecture @Betterment

John Mileham

Managing Millions of Data Services @Heroku

Senior Infrastructure Engineer @Heroku

Gabriel Enslein

Building Microservices @Squarespace

Director of Engineering @ Squarespace

Franklin Angulo

Refactor Frontend APIs & Accounting for Tech Debt

Software Engineer @Indiegogo

Julia Nguyen

Tracks

Monday, 26 June

Microservices: Patterns & Practices

Practical experiences and lessons with Microservices.
Java - Propelling the Ecosystem Forward

Lessons from Java 8, prepping for Java 9, and looking ahead at Java 10. Innovators in Java.
High Velocity Dev Teams

Working Smarter as a team. Improving value delivery of engineers. Lean and Agile principles.
Modern Browser-Based Apps

Reactive, cross platform, progressive - webapp tech today.
Innovations in Fintech

Technology, tools and techniques supporting modern financial services.

Tuesday, 27 June

Architectures You've Always Wondered About

Case studies from the most relevant names in software.
Developer Experience: Level up Your Engineering Effectiveness

Trends, tools and projects that we're using to maximally empower your developers.
Chaos & Resilience

Failures, edge cases and how we're embracing them.
Stream Processing at Large

Rapidly moving data at scale.
Building Security Infrastructure

How our industry is being attacked and what you can do about it.

Wednesday, 28 June

Next Gen APIs: Designs, Protocols, and Evolution

Practical deep-dives into public and internal API design, tooling and techniques for evolving them, and binary and graph-based protocols.
Immutable Infrastructures: Orchestration, Serverless, and More

What's next in infrastructure. How cloud functions like Lambda are making their way into production.
Machine Learning 2.0

Machine Learning 2.0, Deep Learning & Deep Learning Datasets.
Modern CS in the Real World

Applied, practical, & real-world dive into industry adoption of modern CS.
Optimizing Yourself

Maximizing your impact as an engineer, as a leader, and as a person.
Ask Me Anything (AMA)

Presentation: Streaming Microservices: Contracts & Compatibility
