Track:

Location:

Salon A/B

Duration:

4:10pm - 5:00pm

Day of week:

Tuesday

Level:

Intermediate

Persona:

Architect
CTO/CIO/Leadership
Developer

Key Takeaways

Hear practical advice from a very early innovator with containers.
Understand how Gilt evolved the use of containers into their current architecture.
Learn lessons about the struggles/triumphs experienced by Gilt along the way to container adoption.

Abstract

It's been almost three years since Gilt embarked on adopting containers, first using LXC in our physical data-centre in Japan, and then adopting Docker on a mix of physical hardware and virtual machines in Amazon. We've used Docker for continuous, repeatable, immutable deployments of our applications and services; we've used Docker for repeatable build systems; and we've used Docker as a foundational part of the distributed job system 'SunDial' that powers our personalisation and recommendation systems. Now, as we enter Gilt's next stage in evolution as part of Hudson's Bay Company (HBC), we're realising the value of having standardised containers that can be deployed easily across private cloud, public cloud, and traditional data-centre infrastructure.

The dust has settled, and container technology is now moving from early adopters to the mainstream. In this talk, I'll provide detailed examples of how we've used Docker, where container technology has given us the most bang for the buck and, pragmatically, which aspects of the technology haven't panned out as we thought they would.

Interview

Question:

Your talk is about containers in production. Can you provide the attendees with a bit of background for the talk?

Answer:

I have been talking a lot about microservices at Gilt during the last year. Over the last 3 or 4 years, we have gone from a monolithic application running in a standard datacenter to a cloud-based deployment on Amazon with about 300 microservices. We shattered our monolith into pieces.

I think the interesting part is the scale of what we are doing. About 3 years ago, we understood we had lots of deployment problems: we would deploy a given service with a whole bunch of other services on the same box, and rely on a mix of prayer and gut instinct that the right level of CPU and other resources was available.

We got to a point where we were building our own cloud/container infrastructure to get the required level of isolation. We had a really big project lined up to do it; we started that and all of a sudden we realized that there was this thing called Docker, and it gave us everything that we were looking for in terms of immutability and isolation.

Then, in tandem with that, soon after the Docker realisation, we had this idea of immutable deployments. Straight after that came the understanding that we needed to move to the cloud. So we had this convergence of forces: containers, the need for isolation and immutability, and a desire to move everything to the cloud.

Some of our architecture makes use of Docker, and some of it doesn’t. Some of our deployment is still RPM-based, so we are not using Docker everywhere. That’s down to lagging adoption, not a technology choice. If you have 300 microservices, and you want to move all of them to Docker, then you have your hands full. And you have got to ask yourself, is that the most valuable thing that you could be doing with your time? We decided it was not. So some of these services are fine as RPMs, but for all new work we are using Docker. Docker has become the target deployment platform for us.

Question:

What does your deployment pipeline look like at Gilt today?

Answer:

Our problem is that we have many deployment pipelines. Because we were such early adopters, there wasn’t one specific solution for deploying a Docker container to a machine in the cloud: at that time, Amazon ECS didn’t exist, and Docker was only getting off the ground. It was early days, and we had a decentralized approach to tooling. As a result, 7 different teams built 7 different tools. We have come to the realization that writing deployment tooling ourselves does not add any value to Gilt. It’s great fun, everyone loves building their own framework, but it is not adding any value: it doesn’t help us sell any more dresses. And now, Amazon is producing tooling that lets us do this really easily.

That’s where we are now. We are using CodeDeploy, CodePipeline, and CloudFormation as part of our deployment.

We have moved away from the continuous deployment dream to a more developer-initiated deployment to production, but there is a lovely sophistication there. We first deploy to a dark canary node, where we are the only people who can send traffic to it. Then we upgrade it to a canary release, so one of the nodes is running the new version and serving real traffic. Then we release fully to all nodes.
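
The three stages described above can be modelled in a few lines. This is only an illustrative sketch, not Gilt's tooling: the function name `plan_rollout`, the node names, the `"current"` version label, and the choice to reuse the first node as the canary are all assumptions made for the example.

```python
from enum import Enum


class Stage(Enum):
    DARK_CANARY = "dark canary"  # new version on one node, internal traffic only
    CANARY = "canary"            # the same node now also serves public traffic
    FULL = "full release"        # every node runs the new version


def plan_rollout(nodes, new_version):
    """Return a list of (stage, state) pairs for a staged rollout.

    `state` maps node name -> (version, receives_public_traffic).
    Assumption: the first node in `nodes` acts as the canary.
    """
    if not nodes:
        raise ValueError("need at least one node")
    canary = nodes[0]
    steps = []
    for stage in Stage:
        state = {}
        for node in nodes:
            if stage is Stage.FULL or node == canary:
                # Only the dark canary stage hides the new version from the public.
                public = stage is not Stage.DARK_CANARY
                state[node] = (new_version, public)
            else:
                state[node] = ("current", True)
        steps.append((stage, state))
    return steps


if __name__ == "__main__":
    for stage, state in plan_rollout(["node-a", "node-b", "node-c"], "v2"):
        print(stage.value, state)
```

Modelling each stage as an explicit state makes it easy to see what a developer would check before promoting the release to the next stage.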

Question:

Your story is a bit different than the usual container story I hear. Because of how early you were, you had to build your own tooling, and now you are adopting or evolving it. Is that accurate?

Answer:

Yes. The ecosystem did not exist at the start. Then we ended up getting together and forming a single team to figure out deployment, but while that team was on a 9-month or 12-month plan to build the perfect tool for deploying Docker, all the other teams were saying “What’s the quickest way I can get something to production? Let me write a quick script here.” They were all working on their own solutions, and we ended up in a situation where each team had settled on a substandard solution.

By the time we had implemented a neat, open source solution, we realized that Amazon was doing it better than we could. It was going to cost us to maintain all that tooling, and that isn’t the game we should be in. That was a real realization for us. We don’t regret the last 3 years, but we’ve landed on what we think is a very pragmatic solution.

Question:

How does the story of containers at Gilt come through in your talk?

Answer:

I want to share our story, as a proof of existence, showing that this stuff works. Sharing what we learned along the way, the mistakes we made. This is valuable for people who are early majority, who may want to become early adopters. They can learn from our story.

Some of the learnings are counterintuitive. One of them is that we are deploying one container per Amazon instance, even though part of the Docker dream is that you can have multiple containers on a machine. Our workflow requires that each service has full control over the CPU. The nature of our traffic at noon every day on Gilt is such that if we don’t have full processor isolation, one rogue service can take down everything. We’ve seen it first hand.

As a result, we deploy each service into a Docker container on its own virtual machine in Amazon, and that is the way we go. That wasn’t obvious from the start.

Question:

With this long history of containers in production, do you have suggestions or lessons on things like debugging with containers?

Answer:

We route all of our logs to CloudWatch. Most of our engineers don’t debug running production instances. Typically, when we are developing, we run the service locally and tunnel its other dependencies to production. That way we can debug against a local instance. This has not been a problem for us. We have never said “We can’t debug our thing because we have deployed it under Docker!”

Question:

Have you been able to trace the same path through a container that served a request when random things happen?

Answer:

We use New Relic, and that has been helpful in instrumenting all of our services. It is our primary tool for seeing what the issue is when anything happens. And using Docker hasn’t stopped us from being able to use New Relic.

One of the things that is interesting is that at some stage everyone loves to just log into the machine. When you run a Docker image and you connect to it with the bash shell, that works fine. But in general, we find we don’t need to get into the Docker instance at all.

Question:

What is your view on what you have seen happening in the space over the last few years?

Answer:

We wanted immutability. We wanted to be able to deploy things that couldn’t change. What we discovered then was the right balance: the Docker container should be immutable, but the AMI, the actual instance, should be mutable. That was a profound result.

We began by making the AMI instances immutable: this was a bad idea as shutting down and provisioning new Amazon instances on every deploy is a slow process. It makes more sense to leave Amazon instances running: leave the instances going, but make the deployment container, the Docker piece, the immutable bit.
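
One way to picture that balance is a small model in which the container image is a frozen value and the instance is a long-lived object whose running image gets swapped on each deploy. This is a sketch under assumptions, not a description of Gilt's systems: the class names `ContainerImage` and `Instance`, the `git_sha` field, and the instance id are all hypothetical.

```python
from dataclasses import dataclass, field, FrozenInstanceError
from typing import List, Optional


@dataclass(frozen=True)
class ContainerImage:
    """Immutable: once built, an image's contents never change."""
    service: str
    git_sha: str  # assumption: images are identified by the commit they were built from


@dataclass
class Instance:
    """Mutable: the instance keeps running while deploys swap its container."""
    instance_id: str
    running: Optional[ContainerImage] = None
    history: List[ContainerImage] = field(default_factory=list)

    def deploy(self, image: ContainerImage) -> None:
        # Replace the running container; the instance itself is not re-provisioned.
        if self.running is not None:
            self.history.append(self.running)
        self.running = image


if __name__ == "__main__":
    host = Instance("i-hypothetical")
    host.deploy(ContainerImage("checkout", "abc123"))
    host.deploy(ContainerImage("checkout", "def456"))
    print(host.running, len(host.history))
    try:
        host.running.git_sha = "tampered"  # images cannot be modified in place
    except FrozenInstanceError:
        print("image is immutable")
```

A deploy here is a cheap swap on a running host, which is the speed advantage described above over provisioning a fresh instance each time.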

We learned we do not need some of the Docker tools. Docker Compose is of no use for us. We wouldn’t dream of using it. And the reason is the web of dependencies between our services, which is so complex that Docker Compose would be just useless. It wouldn’t make any sense.

We also don’t need Docker Swarm. We are just using Docker. We used to have Docker registries as part of our deploy path. But using a third-party Docker registry led to an outage on our site. That was a critical failure. When we looked at it, we saw that the dream of creating Docker images and putting them up on a Docker registry is a waste of time. Git is our change management tool; that’s what we use for versioning. We are now using CodeDeploy, and we are storing the images in S3 buckets, then deploying from S3. We really don’t need a Docker registry.
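
The interview does not say how the images are packaged for S3, so the following is only a guess at what a registry-free flow could look like: export the image to a tar archive with `docker save`, copy it with `aws s3 cp`, and restore it on the target host with `docker load`. The helper names, the archive naming scheme, the Git-SHA tag, and the bucket name are all assumptions. The functions only build the command lines; running them (for example via `subprocess.run`) and the CodeDeploy orchestration are left out.

```python
def image_archive_name(service, git_sha):
    """Hypothetical naming scheme: one tar archive per service and commit."""
    return f"{service}-{git_sha}.tar"


def publish_commands(service, git_sha, bucket):
    """Commands a build step could run to export an image and upload it to S3."""
    archive = image_archive_name(service, git_sha)
    return [
        ["docker", "save", "-o", archive, f"{service}:{git_sha}"],
        ["aws", "s3", "cp", archive, f"s3://{bucket}/{archive}"],
    ]


def install_commands(service, git_sha, bucket):
    """Commands a deploy step could run on the target instance."""
    archive = image_archive_name(service, git_sha)
    return [
        ["aws", "s3", "cp", f"s3://{bucket}/{archive}", archive],
        ["docker", "load", "-i", archive],
        ["docker", "run", "-d", f"{service}:{git_sha}"],
    ]


if __name__ == "__main__":
    # "example-deploy-bucket" is a placeholder, not a real bucket.
    for cmd in publish_commands("checkout", "abc123", "example-deploy-bucket"):
        print(" ".join(cmd))
    for cmd in install_commands("checkout", "abc123", "example-deploy-bucket"):
        print(" ".join(cmd))
```

Tagging the archive with the commit keeps Git as the single source of versioning, in line with the point made above.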

That is a real lesson. It’s unfortunate as well. If I am in the audience, and I am working for Docker, and I am hearing that all the tools that we are building don’t necessarily have a use for us, that is a tough message. They may have a use, but it’s probably in a niche area.

You can adopt Docker and just Docker. You don’t have to think about the wider set of tooling that every salesperson is trying to sell you, because you probably don’t need it.

Speaker: Adrian Trenaman

SVP Engineering, HBC Digital / Gilt & Committer, Apache Karaf

As SVP Engineering, HBC Digital, Ade leads the engineering and infrastructure teams for Gilt in New York and Dublin. He is an experienced, outspoken software engineer, communicator and leader with over 20 years of experience working with teams throughout Europe, the US and Asia in diverse industries such as financial services, telecoms, retail, and manufacturing. In the past, he has held the positions of CTO of Gilt Japan, Tech Lead at Gilt Groupe Ireland, Distinguished Consultant at FuseSource, Progress Software and IONA Technologies, and Lecturer at the National University of Ireland in Maynooth. He became a committer for the Apache Software Foundation in 2010, and has acted as an expert reviewer to the European Commission. Adrian holds a Ph.D. in Computer Science from the National University of Ireland, Maynooth, a Diploma in Business Development from the Irish Management Institute, and a BA (Mod. Hons) in Computer Science from Trinity College, Dublin.

Find Adrian Trenaman at

Speaker page

@adrian_trenaman

SVP Engineering at Gilt.com

Similar Talks

Learnings from a Culture First Startup

CTO @Buffer

Sunil Sadasivan

Docker Container Lifecycles – Problem or Opportunity?

Developer Advocate @JFrog

Baruch Sadogursky

Multi-Host, Multi-Network Persistent Containers

VP of Product @Aerospike

Alvin Richards

Securing Your Containers

Global Solutions Architect @Venafi

Carl Bourne

A Practitioner's Tale: Uniting Dev, Sec, and Ops Tribes

Sr. Principal Solution Architect @Sonatype

Curtis Yanko

CI/CD Pipeline-as-code with Jenkins and Docker

Principal Solution Architect @CloudBees

Kishore Bhatia

Becoming an Outlier

Software Architect @VinSolutions, Author @pluralsight

Cory House

ESPN Next Generation APIs Powering Web, Mobile, TV

Senior Director of Distribution Platforms @ESPN

Manny Pelarinos

The Human Side of Microservices

Tech Lead @Yelp

John Billings

Tracks

Monday, 13 June

Architectures You've Always Wondered About

Case studies from: Google, LinkedIn, Alibaba, Twitter, and more...

Stream Processing @ Scale

Technologies and techniques to handle ever increasing data streams

Culture As Differentiator

Stories of companies and teams for whom engineering culture is a differentiator - in delivering faster, in attracting better talent, and in making their businesses more successful.

Practical DevOps for Cloud Architectures

Real-world lessons and practices that enable the devops nirvana of operating what you build

Incredible Power of an Open-Sourced .NET

.NET is more than you may think. From Rx to C# 7 designed in the open, learn more about the power of open source .NET

Sponsored Solutions Track 1

Tuesday, 14 June

Better than Resilient: Antifragile

Failure is a constant in production systems, learn how to wield it to your advantage to build more robust systems.

Innovations in Java and the Java Ecosystem

Cutting Edge Java Innovations for the Real World

Modern CS in the Real World

Real-world Industry adoption of modern CS ideas

Containers: From Dev to Prod

Beyond the buzz and into the how and why of running containers in production

Security War Stories

Expert-level security track led by well known and respected leaders in the field

Sponsored Solutions Track 2

Wednesday, 15 June

Microservices and Monoliths

Practical lessons on services. Asks the question: when should you go with microservices, and when should you NOT?

Modern API Architecture - Tools, Methods, Tactics

API-based application development, and the tooling and techniques to support effectively working with APIs in the small or at scale, using internal and external APIs

Commoditized Machine Learning

Barriers to entry for applied ML are lower than ever before, jumpstart your journey

Full Stack JavaScript

Browser, server, devices - JavaScript is everywhere

Optimizing Yourself

Keeping life in balance is always a challenge. Learning lifehacks

Sponsored Solutions Track 3

Conference for Professional Software Developers

Presentation: How Containers Have Panned Out
