Improve Feature Freshness in Large Scale ML Data Processing

In many ML use cases, model performance depends heavily on the quality of the features used for training and inference. One important dimension of feature quality is the freshness of the data. It is therefore critical to keep features up to date with respect to the problem being solved.

The presentation will cover the impact of feature freshness on model performance, based on experiments with both training data and inference data. It will also discuss strategies and techniques for improving feature freshness in both streaming and batch feature processing, along with the challenges and tradeoffs of implementing these strategies in large-scale machine learning systems, such as computational cost and scalability.

By keeping the features fresh and relevant, organizations can achieve better results and stay ahead of the competition in today's rapidly evolving data-driven landscape.

What's the focus of your work these days?

My current area of focus revolves around developing techniques to prepare data for machine learning inference on a large scale. At the same time, I aim to enhance reliability, improve efficiency, and minimize latency in the process.

What's the motivation for your talk at QCon New York 2023?

I would like to share with the industry what we learned while working on these projects.

How would you describe your main persona and target audience for this session?

The target audience would be experienced technologists in the industry who work on large scale data processing for machine learning. 

Is there anything specific that you'd like people to walk away with after watching your session?

There are a few key takeaways:

  • Improving data freshness is becoming more and more important in ML tasks
  • However, not all of your data needs to be extremely fresh. Optimize for ROI rather than freshness alone
  • Design your system end to end, instead of focusing on localized optimization

Speaker

Zhongliang Liang

Engineering Manager @Facebook AI Infra

Zhongliang has over a decade of experience working in the domain of big data and large-scale distributed systems. His most recent focus is on developing advanced data infrastructure for ML data processing at Meta, which powers state-of-the-art recommendation systems in the industry.

Previously, Zhongliang worked at LinkedIn, Microsoft BingAds and Vertica Systems, where he built distributed online and offline systems as well as high-speed analytical databases. Zhongliang also serves as a member of the Steering Committee for the Machine Learning Platform Meetup, where he facilitates the sharing of the latest technology advancements in the ML platform community.

Date

Wednesday Jun 14 / 11:50AM EDT (50 minutes)

Location

Salon D (North Tower)

Topics

Machine Learning, ML Platform, Data Platform

From the same track

Session MLOps

Platform and Features MLEs, a Scalable and Product-Centric Approach for High Performing Data Products

Wednesday Jun 14 / 04:10PM EDT

In this talk, we will go through the lessons learnt in the last couple of years around organising a Data Science Team and the Machine Learning Engineering efforts at Bumble Inc.

Massimo Belloni

Data Science Manager @Bumble

Session AI/ML

Building Production AI-Powered Applications with the OpenAI API and Plugins

Wednesday Jun 14 / 02:55PM EDT

We recently introduced Chat Completions into the OpenAI API – which currently powers the GPT-4 and ChatGPT APIs.

Sherwin Wu

Technical Staff @OpenAI

Atty Eleti

Software Engineer @OpenAI

Session ML Infrastructure

Introducing the Hendrix ML Platform: An Evolution of Spotify’s ML Infrastructure

Wednesday Jun 14 / 10:35AM EDT

The rapid advancement of artificial intelligence and machine learning technology has led to exponential growth in the open-source ML ecosystem.

Divita Vohra

Senior Product Manager @Spotify

Mike Seid

Tech Lead for the ML Platform @Spotify

Session

Panel: MLOps - Production & Delivery for ML Platforms

Wednesday Jun 14 / 05:25PM EDT

Details coming soon.

Session

Unconference: MLOps

Wednesday Jun 14 / 01:40PM EDT

What is an unconference? An unconference is a participant-driven meeting. Attendees come together, bringing their challenges and relying on the experience and know-how of their peers for solutions.