Improve Feature Freshness in Large Scale ML Data Processing

In many ML use cases, model performance is highly dependent on the quality of the features the models are trained and run inference on. One of the most important dimensions of feature quality is the freshness of the data. It is therefore critical to keep features up to date and relevant to the problem being solved.

The presentation will cover the impact of feature freshness on model performance, based on experiments with both training and inference data. We will also discuss strategies and techniques for improving feature freshness in both streaming and batch feature processing, as well as the challenges and tradeoffs of implementing these strategies in large-scale machine learning systems, such as computational cost and scalability.

By keeping the features fresh and relevant, organizations can achieve better results and stay ahead of the competition in today's rapidly evolving data-driven landscape.

What's the focus of your work these days?

My current focus is on developing techniques to prepare data for machine learning inference at large scale, while also improving reliability and efficiency and minimizing latency in the process.

What's the motivation for your talk at QCon New York 2023?

I would like to share with the industry what we have learned while working on these projects.

How would you describe your main persona and target audience for this session?

The target audience would be experienced technologists in the industry who work on large scale data processing for machine learning. 

Is there anything specific that you'd like people to walk away with after watching your session?

There are a few key takeaways:

  • Improving data freshness is becoming more and more important in ML tasks
  • However, not all of your data needs to be super fresh; optimize for ROI instead of freshness alone
  • Design your system end to end, instead of focusing on localized optimizations
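As a hypothetical illustration of the second takeaway, one way to frame "ROI instead of freshness alone" is to refresh only features whose staleness exceeds a per-feature tolerance, prioritized by how overdue they are per unit of compute cost. This is a minimal sketch; the feature names, tolerances, and costs below are invented for illustration and are not from the talk:

```python
from dataclasses import dataclass

@dataclass
class FeatureSpec:
    name: str
    staleness_sec: float   # how old the currently served value is
    tolerance_sec: float   # how stale this feature may get before model quality degrades
    refresh_cost: float    # relative compute cost to recompute it

def features_to_refresh(specs, budget):
    """Greedy sketch: refresh the most overdue features first, within a compute budget."""
    overdue = [s for s in specs if s.staleness_sec > s.tolerance_sec]
    # Prioritize by how far past tolerance a feature is, per unit of refresh cost.
    overdue.sort(key=lambda s: (s.staleness_sec - s.tolerance_sec) / s.refresh_cost,
                 reverse=True)
    chosen, spent = [], 0.0
    for s in overdue:
        if spent + s.refresh_cost <= budget:
            chosen.append(s.name)
            spent += s.refresh_cost
    return chosen

specs = [
    FeatureSpec("user_clicks_1h", staleness_sec=7200,  tolerance_sec=3600,   refresh_cost=1.0),
    FeatureSpec("country",        staleness_sec=86400, tolerance_sec=604800, refresh_cost=0.1),
    FeatureSpec("item_ctr",      staleness_sec=5400,  tolerance_sec=1800,   refresh_cost=2.0),
]
print(features_to_refresh(specs, budget=2.5))  # → ['user_clicks_1h']
```

Note how the slowly-changing `country` feature is never refreshed even though it is a day old: its tolerance is a week, so recomputing it buys no model quality. That per-feature distinction is the essence of optimizing for ROI rather than uniform freshness.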

Speaker

Zhongliang Liang

Engineering Manager @Facebook AI Infra

Zhongliang has over a decade of experience in the domain of big data and large-scale distributed systems. His most recent focus is on developing advanced data infrastructure for ML data processing at Meta, which powers state-of-the-art recommendation systems in the industry.

Previously, Zhongliang worked at LinkedIn, Microsoft BingAds, and Vertica Systems, where he built distributed online and offline systems as well as high-speed analytical databases. Zhongliang also serves on the Steering Committee for the Machine Learning Platform Meetup, where he facilitates the sharing of the latest technology advancements in the ML platform community.


Date

Wednesday Jun 14 / 11:50 AM EDT (50 minutes)

Location

Salon D

Topics

Machine Learning, ML Platform, Data Platform


From the same track

Session MLOps

Platform and Features MLEs, a Scalable and Product-Centric Approach for High Performing Data Products

Wednesday Jun 14 / 04:10PM EDT

In this talk, we will go through the lessons learnt over the last couple of years around organising a Data Science team and the Machine Learning Engineering efforts at Bumble Inc.

Massimo Belloni

Data Science Manager @Bumble

Session AI/ML

A Bicycle for the (AI) Mind: GPT-4 + Tools

Wednesday Jun 14 / 02:55PM EDT

OpenAI recently introduced GPT-3.5 Turbo and GPT-4, the latest in its series of language models that also power ChatGPT.

Sherwin Wu

Technical Staff @OpenAI

Atty Eleti

Software Engineer @OpenAI

Session ML Infrastructure

Introducing the Hendrix ML Platform: An Evolution of Spotify’s ML Infrastructure

Wednesday Jun 14 / 10:35AM EDT

The rapid advancement of artificial intelligence and machine learning technology has led to exponential growth in the open-source ML ecosystem.

Divita Vohra

Senior Product Manager @Spotify

Mike Seid

Tech Lead for the ML Platform @Spotify

Session

Panel: Navigating the Future: LLM in Production

Wednesday Jun 14 / 05:25PM EDT

Our panel is a conversation that aims to explore the practical and operational challenges of implementing LLMs in production. Each of our panelists will share their experiences and insights from their respective organizations.

Sherwin Wu

Technical Staff @OpenAI

Hien Luu

Sr. Engineering Manager @DoorDash

Rishab Ramanathan

Co-founder & CTO @Openlayer

Session

Unconference: MLOps

Wednesday Jun 14 / 01:40PM EDT

What is an unconference? An unconference is a participant-driven meeting. Attendees come together, bringing their challenges and relying on the experience and know-how of their peers for solutions.