Reliable Architectures Through Observability

We want our systems to be reliable, but testing alone isn't enough. In a complex, multi-service system, it's impossible to test your way to correctness. That's why we need observability. Observability is the ability to see what our code is doing, in production and in development. It allows us to build systems that are easier to write, debug, and maintain.


In this talk, I'll draw on my 40 years of software development experience to show you how to apply observability techniques to your own systems. We'll talk about:

  • Why observability should be built in as part of the high-level architecture
  • How to instrument your code for observability
  • How to use observability tools to debug and optimize your systems

Attendees will leave with an overview of observability tools and techniques, and specific recommendations for how to fit observability into their system designs and day-to-day development process.

What's the focus of your work these days?

I'm a staff engineer at Honeycomb.io, and the lead engineer working on Refinery, which is Honeycomb's tail-sampling proxy. Honeycomb makes observability tools used in monitoring for some of the largest cloud-based systems in the world, and Refinery helps those customers manage the volume of their observability data.

What's the motivation for your talk at QCon New York 2023?

Too many people think of observability as something you do once you're deployed at scale. I want to show how, if you start early, you can benefit from observability long before a customer ever touches your product.

How would you describe your main persona and target audience for this session?

Engineering leaders who are actively making decisions about what to build and how to build it. You don't have to be writing code to benefit from this talk, but if you are, you should learn a few things you can act on.

Is there anything specific that you'd like people to walk away with after watching your session?

The knowledge that, whatever the state of their project, from first prototype to longtime legacy, they can find a way to leverage observability today and see some benefits immediately.


Speaker

Kent Quirk

Staff Engineer @Honeycomb.io

Kent Quirk has 40 years of software engineering experience all over the industry. He has worked on low-level embedded systems and device drivers, spent 15 years making games, and built backend systems for internet-scale SaaS. He’s a 3-time entrepreneur who has also worked for a few industry giants. He has held various engineering management positions like CTO and VP Engineering, but never stepped away from code and is currently a Staff Engineer on the Collection team at Honeycomb.io, focused on developer experience.

Date

Wednesday Jun 14 / 02:55PM EDT (50 minutes)

Location

Salon A-C

Topics

Architecture, Observability, How-to, OpenTelemetry

From the same track

Session Kafka

How to Build a Reliable Kafka Data Processing Pipeline, Focusing on Contention, Uptime and Latency

Wednesday Jun 14 / 10:35AM EDT

Shifting workloads from synchronous to asynchronous can simplify the operational cost of high-throughput HTTP services. But understanding the evolution of performance metrics in the world of complex, high-concurrency, asynchronous distributed systems can be quite challenging.

Lily Mara

Engineering Manager @OneSignal

Session Architecture

Building an Architecture to Predict Customer Behavior in a Revenue-Critical System

Wednesday Jun 14 / 01:40PM EDT

At Neon digital bank in Brazil, we strive to make revenue-impacting predictions based on customer behavior. Building a low latency and high availability distributed system that meets this requirement becomes especially challenging.

Yves Junqueira

Distinguished Software Engineer @Neon

Session Developer Environment

Architecting a Production Development Environment for Reliability

Wednesday Jun 14 / 04:10PM EDT

At Meta, developers use a combination of development servers, including virtual machines and physical hosts, as well as on-demand containers to perform their daily software engineering work.

Henrique Andrade

Production Engineer @Meta

Session Cloud Architecture

Survival Strategies for the Noisy Neighbor Apocalypse

Wednesday Jun 14 / 05:25PM EDT

Noisy neighbor issues are a common challenge for multi-tenant platforms, leading to resource contention, performance degradation, and costly downtime for other tenants sharing the same resources.

Meenakshi Jindal

Staff Software Engineer @Netflix

Session

Unconference: Designing Modern Reliable Architectures

Wednesday Jun 14 / 11:50AM EDT

What is an unconference? An unconference is a participant-driven meeting. Attendees come together, bringing their challenges and relying on the experience and know-how of their peers for solutions.