Shifting workloads from synchronous to asynchronous can reduce the operational cost of high-throughput HTTP services. But understanding how performance metrics evolve in complex, high-concurrency, asynchronous distributed systems can be quite challenging.
In this talk, I'll tell you how OneSignal improved the performance and maintainability of its highest-throughput HTTP endpoints (backed by a Kafka consumer in Rust) by making them asynchronous. We will cover:
- How metrics changed when the system went from sync to async
- Unique sharding strategies to maximize concurrency and performance, while maintaining consistency for Kafka consumers
- System-level constraints from Postgres infrastructure that determined the Kafka scaling strategy
Engineering Manager @OneSignal
Lily Mara is an Engineering Manager at OneSignal in San Mateo, CA. She manages the Infrastructure Services team, which is responsible for in-house services used by other OneSignal engineering teams. Previously she was a software engineer at OneSignal, leading the efforts to create OneSignal's integration with Mixpanel, develop the outcomes system, and improve performance and code simplicity through refactoring. Lily also worked as a software developer at Kroger, where she worked on Kroger’s online grocery ordering system as well as internal development tools to aid other teams in deployments, monitoring, and local development environments.
Lily is the author of Refactoring to Rust, an early-access book by Manning Publications about improving the performance of existing software systems through the gradual addition of Rust code.