Control Planes: Designing Infrastructure for Rapid Iteration | Software Development Conference QCon New York

What You’ll Learn

Learn some of the tradeoffs of different orchestration systems.
Hear Clever’s story of deploying to ECS and some of the lessons learned.
Understand that you can drive decision making based on what engineers want to use.

Abstract

As a small engineering team of 40 at Clever, we aim to focus all of our efforts on building feature depth and improving resiliency. As a company focussed on K-12 education, we want to maximize time working with our customers and not on building orchestration infrastructure. However, we also know that well-designed infrastructure and developer tooling allow us to move faster safely.

Our infrastructure team mirrors our product teams’ extreme focus on user experience, and we constantly evaluate our options. Over time we have moved our container orchestration system from an internally built prototype in 2014 to Mesos Marathon and finally Amazon Elastic Container Service. We build infrastructure when required, but move to an off-the-shelf solution when it satisfies our requirements to minimize ongoing maintenance. This has allowed our small team to build reliable products that support education in over 60% of K-12 schools in the US.

In this session I want to share our learnings on how to build developer control planes that allow your infrastructure team to make changes without disrupting engineers. Specifically, I will talk about:

Lessons learnt about building control planes, using snapshots of our own service deployment orchestration tooling over the last four years. A lot of our building blocks are available as public repositories on GitHub.
Designing infrastructure tooling for rapid evolution and change using examples from the rollout of our batch processing system over the last year.
Evaluation and decision making frameworks for choosing between using cloud-managed, open source and build-your-own options through our own move from self-hosting containers to using a containers-as-a-service platform.

Question:

What is the focus of your work today?

Answer:

I've been at Clever since the start (five and a half years, while the company itself started around six years ago). I joined as a software engineer, initially focussed on infrastructure and security, and also started the infrastructure team. Due to my tenure, I do end up dealing with things such as old database instances and the alerts and metrics for them.

However, as a technical product manager for infrastructure and security, most of my time is spent planning for the coming year. For now, that means our biggest focus is on resiliency. We are expecting some of our core usage to grow 4 to 5x within a month.

Clever is an education company, so we get all of our users in one month between August and September. We drop down to very low usage during the summer as everybody's on vacation and then when people come back, all of the work that we've done over the last year sees use. And even that happens in chunks when the East Coast wakes up and kids go to school for example.

Question:

What do your systems do? What is it actually managing?

Answer:

We are an enterprise company, but for the education space. We connect school districts, primarily public school districts, to the education apps that they use in the classroom. We provide everything from account management and security (including thinking about data) to single sign-on and a portal.

If you use Chromebooks, which most schools in the country are using right now, you log in to the Chromebook using Clever. Kids can log into the Chromebook without entering a password by using Clever Badges (which use QR codes).

Basically, when a class starts, we get thousands of authorization requests of different kinds from different schools.

Question:

So, you use Amazon Elastic Container Service to spin up different instances for the applications that they need to serve?

Answer:

Yes. Using microservices was not an explicit decision that we ever made. We just found ourselves there. So we have about 400 different applications in our cluster, and we have 40 engineers.

Question:

So, is it all Go (Golang)? Is that what you said before?

Answer:

Yes, but we started on Node.js and MongoDB. MongoDB is still our primary data store, but we have a polyglot data store setup right now, and most of our backend services are in Go.

Question:

As we were talking before, you mentioned that you've tried all these orchestration tools, you've done your own scripts, you've moved to Marathon, and, ultimately, made it to ECS. How are you going to tell the story?

Answer:

We are an education-focused company that works with public schools. Most employees at Clever joined the company to make an impact in the classroom. Engineers at Clever really care about product delivery. They do care about solving complex problems, but they mostly care about the customer.

This is the primary driving factor for our technical infrastructure. How can we as an infrastructure team drive ourselves out of business every six months? We are a small team and only want to be solving problems that directly affect our customers. When we saw Docker, we realized that would allow engineers to focus very clearly on their application and completely isolate themselves from ‘infrastructure needs’.

Early on, use of Docker suddenly took over, just because everybody wanted to use it, and we were waiting for it. We rewrote scripts so that Docker containers would run on standard EC2 instances. It was just one or two Docker containers running on an instance, with no orchestration, but it pretended to be an orchestration system.

Coming to your question, the story that I think is exciting is how, like most things, we drove our decision making based on what engineers wanted to use: building the user interface and the tooling, and then using smoke and mirrors in the background to make that happen.

That allowed us to look at what our needs were at a specific time. For example, the first issue that we faced was that we were developing a lot of new asynchronous jobs and had to deploy them quickly. EC2 instances were becoming slow and too expensive, so we had to make a change. Mesos was the system we used then: we had to do asynchronous work, and we couldn't yet figure out a good way to get load balancing, or services, working right.

We moved to Mesos, and then had a couple of senior engineers look into how to get load balancing working. While we were doing that, Kubernetes became big, so we started looking into it. We used Kubernetes for services for a little while, and then ECS came out, which allowed us to use our existing infrastructure and move much faster than we could with Kubernetes at that time.
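The approach described in this answer, keeping the developer-facing interface stable while the orchestration backend changes behind it, might be sketched roughly as follows. This is only an illustration under stated assumptions: the `Orchestrator` interface, the `ControlPlane` type, and both backends are hypothetical names invented here, the backends are stubs that make no real EC2 or ECS calls, and none of it is taken from Clever's actual tooling.

```go
package main

import (
	"errors"
	"fmt"
)

// Orchestrator is a hypothetical backend contract. Each orchestration
// system (scripts on EC2, Mesos, ECS, ...) would get its own implementation.
type Orchestrator interface {
	Name() string
	Deploy(app, image string) error
}

// ec2Scripts and ecsBackend are stubs standing in for real backends.
type ec2Scripts struct{}

func (ec2Scripts) Name() string                   { return "ec2-scripts" }
func (ec2Scripts) Deploy(app, image string) error { return nil }

type ecsBackend struct{}

func (ecsBackend) Name() string                   { return "ecs" }
func (ecsBackend) Deploy(app, image string) error { return nil }

// ControlPlane is the only thing engineers interact with. Its Deploy
// signature stays the same no matter which backend is plugged in.
type ControlPlane struct {
	backend Orchestrator
	history []string
}

func NewControlPlane(b Orchestrator) *ControlPlane {
	return &ControlPlane{backend: b}
}

// SwapBackend lets the infrastructure team migrate orchestration systems
// without changing the interface engineers use.
func (c *ControlPlane) SwapBackend(b Orchestrator) { c.backend = b }

// Deploy validates the request, delegates to the current backend, and
// records which backend handled it.
func (c *ControlPlane) Deploy(app, image string) error {
	if app == "" || image == "" {
		return errors.New("app and image are required")
	}
	if err := c.backend.Deploy(app, image); err != nil {
		return fmt.Errorf("deploy %s via %s: %w", app, c.backend.Name(), err)
	}
	c.history = append(c.history,
		fmt.Sprintf("%s -> %s via %s", app, image, c.backend.Name()))
	return nil
}

// History returns a copy of the deploy log.
func (c *ControlPlane) History() []string {
	return append([]string(nil), c.history...)
}

func main() {
	cp := NewControlPlane(ec2Scripts{})
	_ = cp.Deploy("badges", "badges:v1")

	// The backend changes; the call engineers make does not.
	cp.SwapBackend(ecsBackend{})
	_ = cp.Deploy("badges", "badges:v2")

	for _, line := range cp.History() {
		fmt.Println(line)
	}
}
```

The design choice being illustrated is that engineers depend only on `ControlPlane.Deploy`, so a migration such as the one from scripts to Mesos to ECS described above would, in this sketch, be a change to which `Orchestrator` is plugged in.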

Question:

Who is your main audience for this talk?

Answer:

The main audience is architects, engineers, and infrastructure engineers: anybody who cares about the velocity or resiliency of an engineering team. I think the real focus is organizational and team-level engineering productivity.

I’d like engineers, through our stories, to be able to create space for technical experimentation while building complex systems, and to evolve rapidly without thrashing the engineering team’s velocity. We focus a lot on allowing ourselves time to think about decisions carefully while still delivering tools to engineers.

Even with infrastructure, you can start from the user interface. You don't have to solve all the technical problems first. You can look further into the future to your ideal architecture, knowing that others will fix many of those problems because many others are facing the same issues. Some features are more important for your team than for everybody else. Those are the solutions you need to be focussed on.

Speaker: Mohit Gupta

Product Manager, Infrastructure @clever

Mohit works at Clever to ensure that engineers at Clever have the tools and services that allow them to develop and release products to our users reliably, continuously, with flair and fun. Mohit has a background in technology policy, ethnography and science studies and has worked on building Clever for over five years. In the past he's been part of the Electronic Frontier Foundation, Microsoft Research and UC Berkeley's School of Information.

Schedule

Track: Container and Orchestration Platforms in Action

Location: Broadway Ballroom South Center, 6th fl.

Duration: 11:50am - 12:40pm

Day of week: Wednesday

Level: Intermediate

Persona: Architect, Developer