Presentation: Think Before You Tool: An Opinionated Journey



1:40pm - 2:30pm


Key Takeaways

  • Move beyond the hype and hear about the things we need to get right when building services.
  • Consider a hypothetical application and understand the needs and practices throughout its lifecycle, including concerns around building, maintaining, security, reliability, logging, and troubleshooting.
  • Jumpstart your DevOps pipeline by hearing about concerns and solutions common to teams starting out or relatively new to the space.


Continuous Integration, Continuous Deployment, Docker, Kubernetes. We've all heard about these, but there is so much more than just tooling that we need to consider when we set out to create an ecosystem of services. From the application architecture to the service architecture to the network architecture, and all the tooling that allows us to build, test, deploy and maintain our applications and networks, this presentation will shed light not only on some of the tooling but also on new and old ways of thinking about the creation of a distributed ecosystem.

Key takeaways: An overview of what you need to consider when choosing DevOps tools for each stage in your application's lifecycle, as well as a small example of the tools that will help at each stage. It's not all about Docker!


What is your role today?
I am a software engineer at DigitalOcean and a tech lead on the billing team. This is my third week. Previously, I was at SoundCloud for a couple of years. I joined SoundCloud New York when NY engineering was just the two of us with a bunch of monitors. SoundCloud New York is now quite a large office and a growing concern. It’s actually pretty awesome to see its progress.
People use DigitalOcean to build out their architecture. What does the architecture look like for a company like DigitalOcean that other people use to build their architecture?
In a broad overview, DigitalOcean looks like every other start-up in this modern day and age. It started off as a company very intent on creating a viable business. They were very focused and they used tools that worked for them really quickly.
The core of DigitalOcean is a Rails app. As time goes on, a Rails application becomes a pain in the butt to grow, enlarge, maintain, and keep sanity within its codebase. It winds up being everything to everybody. As a result, as with every other start-up and every other company of this particular size, we are now going down the path of splitting things up into services. Not necessarily microservices, but services: big chunky bits that tend to do their own independent things and have independent life cycles.
What is the main focus of your talk?
Docker is the shiny new thing and everybody is pointing at Docker as their next saviour. I am a grumpy old build monkey and to me Docker is the concept of "it works on my machine" writ large. I no longer have to take your laptop and plug it into a datacenter. I can now just take your Docker image and replicate it a million times. That’s fine. It actually works really well like that.
On the other hand, when you want to think about writing services and writing systems, there is a lot more to think about before you start. What do you actually need to consider? Do you do any planning and simulation? When you start building it, how do you actually build it? What tools are available for you to build, deploy, release and maintain? One of the things that we as software developers, especially application developers, really suck at is feedback: giving feedback, taking feedback, and also putting up systems that give us automated feedback. Using users as alpha and beta testers or as canaries, and using Twitter as your ticketing system, is not cool. We build applications for a reason. We should be able to track that reason. We should be able to say this works or it doesn’t, or that people are doing really cool stuff with it and we should probably do a bit more of that.
How is your talk structured?
I am going to go through a hypothetical application and ask a bunch of questions. What do we need to think about when we start the application? What do we need to think about from a security standpoint? What is the tooling available on that space? How would we apply that tooling? Let’s talk about reliability and reliability patterns. What is the tooling that we are going to apply there?
Then we go all the way down to deployment, orchestrating deployment with Docker, and monitoring with Prometheus. What would that look like in your infrastructure? What do you actually need to do to make that happen?
What would you rate this talk? Beginner, intermediate or advanced?
Intermediate. People who are already advanced at this may not get anything new out of it. It may give them opinions, and they may yell at me and tell me that I am wrong.
Is this for people who already have a DevOps initiative going, or for those who are interested in starting one?
It’s a little bit of both. If you haven’t started, this will give you an idea of the things you need to do. If you are already on the path, this may give you an idea of the things you need to talk about and think about, and maybe some things that you haven’t yet done.
Can you give me an example of something that you’ve discovered over time?
The biggest surprise from a tooling perspective was Prometheus. It is a tool that was created at SoundCloud. One of the things it really stood out at was the ability to do proper statistics-based alerts: compute the percentage of nodes that are down and alert on that, without needing to go through Nagios and mess with its configuration and custom checks. All the Prometheus alerts are configured in your rules file.
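As a minimal sketch of the kind of statistics-based alert described here, a Prometheus rules file can express "alert when more than some percentage of nodes are down" as a single expression over the built-in `up` metric. The job name, threshold, and metric labels below are assumptions for illustration, not taken from the talk:

```yaml
groups:
  - name: node-availability
    rules:
      - alert: TooManyNodesDown
        # Fraction of scraped targets in this job that are currently down.
        # Fires when more than 10% of nodes have been down for 5 minutes.
        expr: (count(up{job="node"} == 0) / count(up{job="node"})) * 100 > 10
        for: 5m
        labels:
          severity: page
        annotations:
          summary: "More than 10% of nodes are down"
```

The whole alert is one arithmetic expression over existing metrics, which is the contrast being drawn with per-host custom checks in Nagios.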
For example, we all talk about SLOs and SLAs. How do you actually measure them? How do you convert that to a number you can alert on? You can do request latencies, but we all know that percentile latencies are a crock because they always remove the outliers, and anyway, what do they really mean for end users? On the other hand, if you have a monitoring system that uses bucketing, which Prometheus does, you can apply your SLOs: have a 100 millisecond bucket and see whether 95% of requests are returned within 100 milliseconds. Your alert is a very simple mathematical formula. That was an eye opener for me in terms of monitoring, feedback, and having visibility into what was happening inside my systems. It also becomes a powerful tool not only when talking to developers but also when talking to stakeholders, product owners, owners of companies and even vendors.
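The "very simple mathematical formula" can be sketched as a ratio of two histogram series: requests that landed in the 100 ms bucket divided by all requests. This assumes a Prometheus histogram metric; the metric name `http_request_duration_seconds` and the time windows are illustrative assumptions:

```yaml
groups:
  - name: latency-slo
    rules:
      - alert: LatencySLOBreach
        # Fraction of requests answered within 100ms over the last 5 minutes.
        # The "le" label selects the histogram bucket for <= 0.1 seconds.
        # Fires when fewer than 95% of requests meet the 100ms SLO.
        expr: |
          sum(rate(http_request_duration_seconds_bucket{le="0.1"}[5m]))
            / sum(rate(http_request_duration_seconds_count[5m])) < 0.95
        for: 10m
        annotations:
          summary: "Fewer than 95% of requests served within 100ms"
```

Because the SLO is expressed directly as a bucket ratio rather than a percentile estimate, the number the alert fires on is the same number you would report to a stakeholder or vendor.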
At SoundCloud we had a particular vendor who changed something inside their routing: instead of talking to a European data center from SoundCloud’s data center in Amsterdam, we were now going all the way over the Atlantic to the US, which added 100 milliseconds. As a result, my 300 millisecond SLO was completely blown; I was serving 60% of my requests within 300 milliseconds rather than the 95% I was supposed to. You can say this to the vendor and explain the consequences in terms that make business sense.
What are some of the things that I can do better now that I came to your talk?
The thing I am thinking about is how to link it all together. When I started doing this (when anybody starts doing DevOps), you start out with the usual discussions of branch versus trunk-based development, continuous delivery, continuous deployment, what tools you use, building your data pipeline, and so on. Where does Docker fit in all of that? How do I deploy Docker to production? You can spend years and countless hours, and build careers and personal brands, just on that, but in the end, what does that do for the end delivery of your system?
What does that do for user behavior? What does that do for service stability and reliability? Where do all the other bits and pieces come in? Everybody focuses on the mechanistic building bits. They are cool. I know they are cool. I’ve spent years doing them, but what about everything else that you need to do? What about keeping stable, secure, reliable services up and running, and talking to one another, and making sure that they are stable, secure and reliable? How do you prove that? How do we prove that the system that we have built actually does the things that we built it for?
These are the things that many tech leads come to later. I did. Maybe I am slow. It took me a long time to get over just building things. I love building stuff, and it took me a lot of time to realize that there is a whole lot more than just that.

Speaker: Tom Czarniecki

Tech Lead @DigitalOcean

I had a great hobby once; it turned into an awesome job. I get a kick out of solving problems and making stuff (that sometimes includes software), and continue to do so whether I’m paid for it or not. For the last fifteen years I’ve been helping to build beautiful, simple and highly available systems, as well as contributing to the open-source community by creating projects that scratch an itch and hopefully make someone else’s life or work a little better.
