You are viewing content from a past/completed conference.
  
    
  
  
        
    
  
    
      
  
Architecting a Production Development Environment for Reliability
    
  
    
      
	
	
	
	
	
		
		
	
	
		
			
				
					
					                    Abstract
					
						At Meta, developers use a combination of development servers, including virtual machines and physical hosts, as well as on-demand containers to perform their daily software engineering work.
In this talk, we will present these environments and discuss a few of their architectural underpinnings put in place specifically to ensure their availability and reliability in the presence of maintenance workflows and disasters.
In discussing these environments, their architecture, and their reliability characteristics, we will focus on questions such as:
- Where does the data used by developers live and why does that make the design reliable in the face of disasters?
- What are the backup and migration strategies in place and why do they allow us to continue working in the face of outages?
- What are the types of disasters we prepare for and how do we communicate with our users in the face of these outages?
- How do we conduct OS and software updates/upgrades without causing disruptions to the developer community?
 
					
						
					
					
					Speaker
     
    
    
            Henrique Andrade
      Production Engineer @Meta
          
    Henrique is a Software Engineer, currently disguised as a Production Engineer, who leads the Developer Environments production engineering team, focusing on the reliability and stability of the development platform used daily by most of the software engineering workforce at Meta. He believes that he can make anything better and more reliable, which led him to fix his dryer and washer multiple times (something he did with a somewhat limited degree of success). He holds a PhD in Computer Science from the University of Maryland, College Park and is one of the co-authors of Fundamentals of Stream Processing: Application Design, Systems and Analytics published by Cambridge University Press.
    
    
       
 
 
				
			 
		 
	
			
			
				From the same track
				
					
    
        Session
        Architecture
        Reliable Architectures Through Observability
        Wednesday Jun 14 / 02:55PM EDT
        
            
            We want our systems to be reliable, but testing alone isn't enough. In a complex, multi-service system, it's impossible to test your way to correctness. That's why we need observability. Observability is the ability to see what our code is doing, in production and in development.
      
        
        	
		 
		
			Kent Quirk
			Staff Engineer @Honeycomb.io
		 
	 
 
     
 
    
        Session
        Kafka
        How to Build a Reliable Kafka Data Processing Pipeline, Focusing on Contention, Uptime and Latency
        Wednesday Jun 14 / 10:35AM EDT
        
            
            Shifting workloads from synchronous to asynchronous can simplify the operational cost of high-throughput HTTP services. But understanding the evolution of performance metrics in the world of complex, high-concurrency, asynchronous distributed systems can be quite challenging.
      
        
        	
		 
		
			Lily Mara
			Engineering Manager @OneSignal
		 
	 
 
     
 
    
        Session
        Architecture
        Building an Architecture to Predict Customer Behavior in a Revenue-Critical System
        Wednesday Jun 14 / 01:40PM EDT
        
            
            At Neon digital bank in Brazil, we strive to make revenue-impacting predictions based on customer behavior. Building a low-latency, high-availability distributed system that meets this requirement becomes especially challenging.
      
        
        	
		 
		
			Yves Junqueira
			Distinguished Software Engineer @Neon
		 
	 
 
     
 
    
        Session
        Cloud Architecture
        Survival Strategies for the Noisy Neighbor Apocalypse
        Wednesday Jun 14 / 05:25PM EDT
        
            
            Noisy neighbor issues are a common challenge for multi-tenant platforms, leading to resource contention, performance degradation, and costly downtime for other tenants sharing the same resources.
      
        
        	
		 
		
			Meenakshi Jindal
			Staff Software Engineer @Netflix
		 
	 
 
     
 
    
        Session
        
        Unconference: Designing Modern Reliable Architectures
        Wednesday Jun 14 / 11:50AM EDT
        
            
            What is an unconference?
An unconference is a participant-driven meeting. Attendees come together, bringing their challenges and relying on the experience and know-how of their peers for solutions.
      
        
        