Presentation: The Architecture That Helps Stripe Move Faster

Duration: 5:25pm - 6:15pm

Key Takeaways

  • Learn mechanisms to lower the impact of API changes.
  • Understand the importance of creating logical and physical boundaries for your system.
  • Hear the challenges Stripe faced when migrating their infrastructure.

Abstract

Stripe aims to be the easiest way to do business online. To that end, moving quickly is one of Stripe’s core values. We take a lot of pride in work that lets either us or our users move faster. In this talk, I'll show how we've designed our systems to speed up the development process. First, I'll discuss how the software infrastructure in our API enables the next generation of tech companies to build faster and with less pain. Next, I'll examine how Stripe solves PCI and compliance concerns in a way that allows our engineering teams to develop new features more quickly. Finally, I'll walk through our recent datacenter migration, which we were able to complete quickly and without service interruption through careful planning.

Interview

Question: 
What is your role today?
Answer: 
I am an engineer on our infrastructure team, and I have historically worked on most of our infrastructure systems. Right now, I am primarily focused on our database systems and pretty much everything about them: durability, monitoring, automation, tooling, and introspection.
Question: 
What is the basic infrastructure like at Stripe?
Answer: 
Stripe is primarily a Ruby shop, with the amount of Go code slowly going up. We use some Scala for our batch ETL and Hadoop-type infrastructure, but most of our production code is written in Ruby. We use a mix of MySQL, Postgres, and MongoDB for our database storage. The work I do today is primarily on the MongoDB side.
Question: 
You said it tends to be mostly Ruby but more and more Go. Why? Is there a reason for that?
Answer: 
Performance has been a big piece for us. Ruby is not always inherently slow, but it is very easy to write slow Ruby code, so performance has been a big deal for some of the applications that we moved over. Another piece has been managing complexity. Ruby as a language can be very bad about allowing different components of the software to interact with each other in unfortunate ways.
I think the most obvious example is monkey patching. Ruby loves monkey patching, which amounts to adding functionality to other people’s code, and so it can sometimes be hard to track where behavior comes from in Ruby. That is less and less feasible for us as the complexity of our code grows. The big advantage that we see with Go from that perspective is that it is very much optimized for avoiding global complexity. I think that’s one of the interesting things about Go: it deliberately chooses localized complexity. Things like very manual error handling or the lack of generics make the local code more complicated, but there are no spooky-action-at-a-distance issues.
Question: 
What’s the motivation for your talk?
Answer: 
This is a storytelling talk in some ways. There are three different stories from Stripe that I want to tell, and they deal with different relationships: how engineering groups inside Stripe relate to each other, and how we relate to our users.
The first story is about how we change and, more generally, evolve the Stripe API. One of the things that we consider important about the Stripe API is that once you have integrated with it, you don’t have to change your code. It should keep working in perpetuity. We care about evolving our API, but most people don’t: if you have already written code and it works, you don’t care about changing it, and we don’t want to require you to change your code just so that we can do things that we think are interesting, different, and innovative. So I am going to talk about some of the technical strategies that we use to implement that, how we support it on our backend, and how we make it possible for us to change the API without affecting existing users.
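The interview does not say which strategies the talk covers. One common way to let an API evolve without breaking existing integrations is to pin each user to the API version they integrated against, build every response in the newest shape, and then apply a chain of downgrade steps for older versions. The sketch below assumes that approach; the field names, version dates, and the render function are all hypothetical.

```go
package main

import "fmt"

// Response is a generic API response body.
type Response map[string]any

// versionChange describes one backwards-incompatible change.
// Downgrade rewrites a response from the shape introduced in
// Version back to the shape that older clients expect.
type versionChange struct {
	Version   string // ISO date, so string order is chronological order
	Downgrade func(Response)
}

// changes is listed newest first. All entries are invented examples.
var changes = []versionChange{
	{
		Version: "2016-02-01",
		Downgrade: func(r Response) {
			// Newer versions return "status"; older ones a "paid" flag.
			r["paid"] = r["status"] == "succeeded"
			delete(r, "status")
		},
	},
	{
		Version: "2015-06-01",
		Downgrade: func(r Response) {
			// Newer versions return "amount_cents"; older ones "amount".
			r["amount"] = r["amount_cents"]
			delete(r, "amount_cents")
		},
	},
}

// render copies the latest-shape response and applies every change
// that is newer than the version the caller is pinned to.
func render(latest Response, pinned string) Response {
	out := make(Response, len(latest))
	for k, v := range latest {
		out[k] = v
	}
	for _, c := range changes {
		if c.Version > pinned {
			c.Downgrade(out)
		}
	}
	return out
}

func main() {
	latest := Response{"id": "ch_1", "amount_cents": 1250, "status": "succeeded"}
	for _, v := range []string{"2015-01-01", "2015-06-01", "2016-02-01"} {
		fmt.Println(v, render(latest, v))
	}
}
```

With this layout, application code only ever produces the newest response shape, and each breaking change is isolated in one small downgrade function.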
The second story has to do with our PCI infrastructure. PCI compliance is hugely important to us, and we've done a good job of isolating the corresponding obligations so that they don't cause a ton of pain. But about two years ago we found that our PCI infrastructure was lagging behind the rest of our infrastructure in performance and scalability. We decided to rewrite our PCI-sensitive applications from Ruby to Go, and we carefully found ways to roll out the new code incrementally. This let us detect problems early and roll out with more confidence, which ultimately led to a better-architected, more maintainable, more reliable, and more scalable system.
The last story has to do with an infrastructure migration that we did last November. We migrated our entire infrastructure from EC2 Classic, AWS's legacy networking environment, into VPC, and at the same time from one region to another, so in effect we did an entire data center migration. We did it over the course of about 4 or 6 hours with no user-visible downtime or any real latency impact. The important part of that story is that we were able to do this effectively, and without impacting our users, because we looked for points of super high leverage. We tried to find as many shared pieces of infrastructure as we could and solved problems at those layers, to minimize both the impact on the rest of the organization and the number of blockers that we had to work through to move the migration forward.
Question: 
QCon targets advanced architects and senior development leads. What actionable takeaway do you feel that type of persona will walk away from your talk with?
Answer: 
I think the thing that I want people to get out of it is how to think about making big changes without them having to be all or nothing. All of these stories have a component where making changes incrementally is really key. So I think it’s how to think about making large-scale changes to your application or your infrastructure, or whatever it is that you are working on, in a way that is minimally disruptive and, because of that, most likely to succeed.

Speaker: Evan Broder

Principal Engineer @Stripe

Evan Broder has worked on systems and infrastructure at Stripe for four years, helping them stay online through several orders of magnitude of growth. Previously, he worked on virtualization management and the Linux desktop at MokaFive and helped build XVM at MIT, one of the earliest cloud computing environments.
