Presentation: Nonconformist Resilience: DB-Backed Job Queues
What You’ll Learn
- Discover the hidden complexity in common message-bus-based approaches to background work.
- Reset expectations of what your platform can bring to correctness and resilience at high velocity and team scale.
- Understand the qualities that might make a database-backed job queue right for your next app.
Abstract
Resilience in the face of chaos is a tall order. As a vertically integrated financial institution where rapidly delivered features with complete data consistency and scrupulous correctness are all non-negotiable, Betterment had its work cut out for it. So we moved the goalposts - inward. By eliminating complexity that many teams consider table stakes, we’ve built a distributed software ecosystem that empowers engineers to do their best work with a minimum of high-wire distributed systems thinking.
One of the complexity-obliterating weapons in our arsenal is our approach to background work. I’ll present how we use, deploy, and even love Delayed::Job (yes, a database-backed job queue) at Betterment for its transactional enqueue semantics, safe retry with exponential backoff, and its storage model, which lends itself to simple but powerful SLA-based monitoring and alerting. DJ enables engineers to pour their creativity into their features and get resilience by default.
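As a rough illustration of the retry and monitoring ideas above, here is a minimal Python sketch using an in-memory SQLite table. The table layout, the function names, and the SLA check are assumptions made for this example, not Betterment's schema or Delayed::Job's code. The backoff formula (5 seconds plus the attempt count to the fourth power) and the default of 25 attempts follow what the Delayed::Job README documents as its defaults.

```python
import sqlite3


def backoff_seconds(attempts: int) -> int:
    """Delay before the next retry: 5 seconds + attempts ** 4."""
    return attempts ** 4 + 5


def make_db() -> sqlite3.Connection:
    """Create a simplified, hypothetical jobs table (not the real DJ schema)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        """CREATE TABLE jobs (
               id INTEGER PRIMARY KEY,
               handler TEXT NOT NULL,
               attempts INTEGER NOT NULL DEFAULT 0,
               run_at REAL NOT NULL,   -- earliest time the job may run
               failed_at REAL          -- set once retries are exhausted
           )"""
    )
    return db


def record_failure(db, job_id, now, max_attempts=25):
    """After a failed run, push run_at into the future or mark the job failed."""
    (attempts,) = db.execute(
        "SELECT attempts FROM jobs WHERE id = ?", (job_id,)
    ).fetchone()
    attempts += 1
    with db:  # commit the update atomically
        if attempts >= max_attempts:
            db.execute(
                "UPDATE jobs SET attempts = ?, failed_at = ? WHERE id = ?",
                (attempts, now, job_id),
            )
        else:
            db.execute(
                "UPDATE jobs SET attempts = ?, run_at = ? WHERE id = ?",
                (attempts, now + backoff_seconds(attempts), job_id),
            )


def jobs_breaching_sla(db, now, sla_seconds):
    """Ids of unfailed jobs that were due more than sla_seconds ago."""
    rows = db.execute(
        "SELECT id FROM jobs WHERE failed_at IS NULL AND run_at < ? ORDER BY id",
        (now - sla_seconds,),
    ).fetchall()
    return [row[0] for row in rows]
```

Because the queue is just a table, the alerting check is an ordinary query over `run_at`, which is the sense in which the storage model lends itself to SLA-based monitoring.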
Interview
John: I lead software architecture at Betterment, which means I work with people throughout Betterment’s engineering team, staying apprised of new developments and challenges throughout the org, sharing and cross-pollinating best practices and a shared vision for our platform, and regularly diving deep into the code alongside domain owners.
John: A lot of companies end up selecting patterns based on industry norms, but sometimes the accepted patterns have rough edges that may permanently leak into your app layer, causing pain. There’s a strong sense in the industry currently that you should never use a database as a job queue, and should instead delegate to a product that’s called a queue. There are valid reasons to prefer a dedicated queue, but there are also reasons not to, which often get short shrift. At Betterment, we build a suite of products whose correctness and consistency people rightly care a great deal about. Folks don’t generally realize that a queue is a second datastore, and that once you coordinate across two datastores it is a hard problem to perform a transaction that also enqueues background work and then ensure that the background work gets worked if-and-only-if the transaction commits. Many folks end up addressing the edge cases in their business logic on a per-feature basis rather than simply eliminating the problem by unfashionably using the database as a work queue.
There will definitely be pushback from some folks on the basis of scalability and throughput - and those are real concerns for some applications, but certainly not all, and in many cases, there are other levers you should be thinking about pulling to alleviate those concerns rather than switching jobs to a dedicated queue. I’ll be presenting an honest warts-and-all accounting of the tradeoffs so that ideally folks in the room can apply them to their distinctive problem spaces and come away with better outcomes, fully aware of the pros and cons of the choices they make.
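The if-and-only-if guarantee described in this answer can be sketched in a few lines. The following is a minimal Python illustration using SQLite, with hypothetical table and function names (`transfers`, `jobs`, `create_transfer`) invented for the example. It is not Betterment's code or Delayed::Job's implementation. It only shows that when the job row is written in the same database transaction as the business data, a rollback removes both.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create hypothetical business and job tables in one database."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE transfers (id INTEGER PRIMARY KEY, amount_cents INTEGER NOT NULL)"
    )
    db.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, handler TEXT NOT NULL)")
    return db


def create_transfer(db, amount_cents, fail_after_enqueue=False):
    """Write the business row and enqueue its follow-up job in one transaction."""
    try:
        with db:  # commits on success, rolls back if an exception escapes
            cur = db.execute(
                "INSERT INTO transfers (amount_cents) VALUES (?)", (amount_cents,)
            )
            db.execute(
                "INSERT INTO jobs (handler) VALUES (?)",
                (f"notify_transfer:{cur.lastrowid}",),
            )
            if fail_after_enqueue:
                # Stand-in for any error that occurs before the commit.
                raise RuntimeError("simulated failure before commit")
        return True
    except RuntimeError:
        return False


def counts(db):
    """Return (number of transfers, number of enqueued jobs)."""
    transfers = db.execute("SELECT COUNT(*) FROM transfers").fetchone()[0]
    jobs = db.execute("SELECT COUNT(*) FROM jobs").fetchone()[0]
    return transfers, jobs
```

With a separate queue product, the second insert would be a network call to another datastore, and the failure case would need explicit handling: either a job with no committed transfer, or a committed transfer with no job.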
John: Engineers building new platforms or evaluating technology for future revs of their platforms would be the sweet spot. Background work is something that most grown-up apps need to perform, and there doesn’t seem to be much info out there about the pros and cons of different approaches.