The design of reliable architectures has evolved considerably in the past few years, and assumptions that held for consumer services are no longer sufficient to deliver quality performance for enterprise customers. In the world of globally distributed systems, the most reliable services are those that offer improved observability, allowing for performance troubleshooting, quick outage investigation and fast mitigation. Revenue-critical applications must find the right balance between processing resource cost, latency and service availability, often exploring multi-cloud solutions while migrating from a traditional on-prem architecture to a distributed microservice model in the cloud.
In this track, you will see a diverse collection of reliability strategies covering finance applications, gaming platforms and other real-world segments of the tech industry. Join us to hear reliability practitioners from big tech companies and startups explain how they have improved the architectures of their products to meet their customers' availability, consistency and performance needs.
From this track
How to Build a Reliable Kafka Data Processing Pipeline, Focusing on Contention, Uptime and Latency
Wednesday Jun 14 / 10:35AM EDT
Shifting workloads from synchronous to asynchronous can simplify the operational cost of high-throughput HTTP services. But understanding the evolution of performance metrics in the world of complex, high-concurrency, asynchronous distributed systems can be quite challenging.
Lily Mara
Engineering Manager @OneSignal
Unconference: Designing Modern Reliable Architectures
Wednesday Jun 14 / 11:50AM EDT
What is an unconference? An unconference is a participant-driven meeting. Attendees come together, bringing their challenges and relying on the experience and know-how of their peers for solutions.
Building an Architecture to Predict Customer Behavior in a Revenue-Critical System
Wednesday Jun 14 / 01:40PM EDT
At Neon, a digital bank in Brazil, we strive to make revenue-impacting predictions based on customer behavior. Building a low-latency, high-availability distributed system that meets this requirement is especially challenging.
Yves Junqueira
Distinguished Software Engineer @Neon
Reliable Architectures Through Observability
Wednesday Jun 14 / 02:55PM EDT
We want our systems to be reliable, but testing alone isn't enough. In a complex, multi-service system, it's impossible to test your way to correctness. That's why we need observability. Observability is the ability to see what our code is doing, in production and in development.
Kent Quirk
Staff Engineer @Honeycomb.io
Architecting a Production Development Environment for Reliability
Wednesday Jun 14 / 04:10PM EDT
At Meta, developers use a combination of development servers, including virtual machines and physical hosts, as well as on-demand containers to perform their daily software engineering work.
Henrique Andrade
Production Engineer @Meta
Survival Strategies for the Noisy Neighbor Apocalypse
Wednesday Jun 14 / 05:25PM EDT
Noisy neighbor issues are a common challenge for multi-tenant platforms, leading to resource contention, performance degradation, and costly downtime for other tenants sharing the same resources.
Meenakshi Jindal
Staff Software Engineer @Netflix