At Meta, developers use a combination of development servers, including virtual machines and physical hosts, as well as on-demand containers to perform their daily software engineering work.

In this talk, we will present these environments and discuss a few of their architectural underpinnings put in place specifically to ensure their availability and reliability in the presence of maintenance workflows and disasters.

In discussing these environments, their architecture, and their reliability characteristics, we will address questions such as:

Where does the data used by developers live and why does that make the design reliable in the face of disasters?
What are the backup and migration strategies in place and why do they allow us to continue working in the face of outages?
What are the types of disasters we prepare for and how do we communicate with our users in the face of these outages?
How do we conduct OS and software updates/upgrades without causing disruptions to the developer community?

From the same track

Session Architecture

Reliable Architectures Through Observability

Wednesday Jun 14 / 02:55PM EDT

We want our systems to be reliable, but testing alone isn't enough. In a complex, multi-service system, it's impossible to test your way to correctness. That's why we need observability. Observability is the ability to see what our code is doing, in production and in development.

Kent Quirk

Staff Engineer @Honeycomb.io

Session Kafka

How to Build a Reliable Kafka Data Processing Pipeline, Focusing on Contention, Uptime and Latency

Wednesday Jun 14 / 10:35AM EDT

Shifting workloads from synchronous to asynchronous can simplify the operational cost of high-throughput HTTP services. But understanding the evolution of performance metrics in the world of complex, high-concurrency, asynchronous distributed systems can be quite challenging.

Lily Mara

Engineering Manager @OneSignal

Session Architecture

Building an Architecture to Predict Customer Behavior in a Revenue-Critical System

Wednesday Jun 14 / 01:40PM EDT

At Neon digital bank in Brazil, we strive to make revenue-impacting predictions based on customer behavior. Building a low latency and high availability distributed system that meets this requirement becomes especially challenging.

Yves Junqueira

Distinguished Software Engineer @Neon

Session Cloud Architecture

Survival Strategies for the Noisy Neighbor Apocalypse

Wednesday Jun 14 / 05:25PM EDT

Noisy neighbor issues are a common challenge for multi-tenant platforms, leading to resource contention, performance degradation, and costly downtime for other tenants sharing the same resources.

Meenakshi Jindal

Staff Software Engineer @Netflix

Session

Unconference: Designing Modern Reliable Architectures

Wednesday Jun 14 / 11:50AM EDT

What is an unconference? An unconference is a participant-driven meeting. Attendees come together, bringing their challenges and relying on the experience and know-how of their peers for solutions.

Architecting a Production Development Environment for Reliability

Speaker

Henrique Andrade
