Presentation: Machine-Learned Indexes - Research from Google
Abstract
Modern data processing systems are designed to be general purpose: they handle a wide variety of schemas, data types, and data distributions, and aim to provide efficient access and computation over this data. This “one-size-fits-all” nature results in systems that do not take advantage of the unique characteristics of each application, user's data, or workload. The design of these older systems overlooks something, however: machine learning excels at understanding and adapting to particular datasets. We present a vision (with evidence) for the future of data processing systems: by learning models of the application, data, and workload, we can redesign and customize nearly every component of a data processing system. We will take a deep dive into how traditional index structures can be reframed as machine learning problems. By doing so, and through careful model design and code synthesis, we are able to outperform cache-optimized B-Trees by up to 70% in speed while saving an order of magnitude in memory on several real-world data sets. Building on the same modeling techniques, we find that we can achieve improvements in sorting, multi-dimensional indexing, and query optimization, all areas that have historically been the domain of traditional discrete algorithms and complex systems engineering.
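To make the "index as a machine learning problem" framing concrete, here is a minimal Python sketch. It is not the model design from the talk: it swaps in a single least-squares line that predicts where a key sits in a sorted array, records the worst-case prediction error at build time, and then searches only inside that error window. The class name `LinearLearnedIndex` and its methods are hypothetical and chosen for illustration.

```python
import bisect


class LinearLearnedIndex:
    """Toy learned index over a sorted list of numeric keys (illustrative only)."""

    def __init__(self, keys):
        self.keys = sorted(keys)
        n = len(self.keys)
        if n == 0:
            raise ValueError("need at least one key")

        # Fit position ~= slope * key + intercept by ordinary least squares.
        mean_key = sum(self.keys) / n
        mean_pos = (n - 1) / 2
        var_key = sum((k - mean_key) ** 2 for k in self.keys)
        cov = sum((k - mean_key) * (i - mean_pos)
                  for i, k in enumerate(self.keys))
        self.slope = cov / var_key if var_key else 0.0
        self.intercept = mean_pos - self.slope * mean_key

        # Record the smallest and largest (true position - predicted position)
        # seen at build time; every stored key falls inside this window.
        errors = [i - self._predict(k) for i, k in enumerate(self.keys)]
        self.min_err = min(errors)
        self.max_err = max(errors)

    def _predict(self, key):
        # Model output, rounded and clamped to a valid array position.
        pos = int(round(self.slope * key + self.intercept))
        return min(max(pos, 0), len(self.keys) - 1)

    def lookup(self, key):
        """Return the position of key, or None if it is not stored."""
        guess = self._predict(key)
        lo = max(guess + self.min_err, 0)
        hi = min(guess + self.max_err + 1, len(self.keys))
        # Binary search restricted to the model's error window.
        i = bisect.bisect_left(self.keys, key, lo, hi)
        if i < len(self.keys) and self.keys[i] == key:
            return i
        return None


index = LinearLearnedIndex([2, 3, 5, 8, 13, 21, 34, 55, 89])
position = index.lookup(13)    # position of the stored key 13
missing = index.lookup(4)      # None, because 4 is not stored
```

The closer the model tracks the key distribution, the narrower the error window and the less searching each lookup needs. A B-Tree reaches the same position by walking its nodes instead; this sketch makes no claim about the speed or memory figures quoted in the abstract.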
What is the focus of your work today?
My research focuses on machine learning and data mining applications, in particular ML for data systems, machine learning fairness, and recommender systems.
What’s the motivation for this talk?
This talk focuses on recent research on using machine learning algorithms to improve traditional data processing systems. In particular, we’ve recently studied how machine-learned models can improve traditional data structures, which has exciting implications for databases and other core components of computer science.
How would you describe the persona and level of the target audience?
This talk is geared toward both researchers and engineers exploring how machine learning and data mining can interact with traditional computer engineering and infrastructure challenges.
What do you want this persona to walk away from your talk with?
My hope is that folks will leave the talk with a new perspective on traditional computer science problems and a better understanding of when it may be beneficial to frame some tasks as machine learning tasks.
Tracks
- Java - The Interesting Bits
  Learn the new features in the recent and near-future releases of Java and the JVM and what they offer.
- Ethical Considerations in Consciously Designed Software
  Design considerations for various contexts, locations, security and privacy requirements.
- Operating Microservices
  Learn from practitioners operating and evolving systems in performance-demanding environments.
- Shift-Left Cybersecurity: Developer Accountability for Security
  Learn how to make security an inherent part of the software development process.
- Native Compilation Is Back (A Look at Non-VM Compilation Targets)
  Issues with native compilation in browser-based and server-side environments.
- Troubleshooting in Production
  Learn debugging strategies for complex and high-stakes environments where standard debuggers and profilers fail.
- Predictive Architectures and ML
  Learn about cutting-edge ML applications and their underlying architectures.
- Mission Critical Data Engineering
  Explore a variety of data engineering use cases and applications.
- Production Readiness
  Observability, emergency response, capacity planning, release processes, and SLOs for availability and latency.
- Humane Leadership
  A look at leadership with an emphasis on empathy, taking chances, and building other leaders within organizations and teams.
- Developer Experience: The Art and Science of Reducing Friction
  Explore how to reduce developer friction between teams and stakeholders.
- Blameless Culture
  Absorb the lessons learned from failures and outages in a human-centric process.
- Modern Computer Science in the Real World
  Learn how companies are applying recent CS research to tackle concurrency, distributed data, and coordination.
- Architectures You’ve Always Wondered About
  Join companies like Google, Netflix, Bloomberg, BBC, and more as they share an inside glimpse into their next-gen architectures and the challenges of delivering at massive scale.
- Bare Knuckle Performance
  Learn from practitioners on the challenges and benefits of architecting for performance and much more.