At Meta, developers use a combination of development servers, including virtual machines and physical hosts, as well as on-demand containers to perform their daily software engineering work.
In this talk, we will present these environments and discuss a few of their architectural underpinnings put in place specifically to ensure their availability and reliability in the presence of maintenance workflows and disasters.
In discussing these environments, their architecture, and their reliability characteristics, we will address questions such as:
- Where does the data used by developers live and why does that make the design reliable in the face of disasters?
- What backup and migration strategies are in place, and why do they allow us to continue working in the face of outages?
- What are the types of disasters we prepare for and how do we communicate with our users in the face of these outages?
- How do we conduct OS and software updates/upgrades without causing disruptions to the developer community?
Production Engineer @Meta
Henrique is a Software Engineer, currently disguised as a Production Engineer, who leads the Developer Environments production engineering team, focusing on the reliability and stability of the development platform used daily by most of the software engineering workforce at Meta. He believes that he can make anything better and more reliable, which led him to fix his dryer and washer multiple times (something he did with a somewhat limited degree of success). He holds a PhD in Computer Science from the University of Maryland, College Park and is one of the co-authors of Fundamentals of Stream Processing: Application Design, Systems and Analytics published by Cambridge University Press.