Going Beyond the Case of Black Box AutoML

Most AutoML tools are black boxes. They offer no-code/low-code interfaces (UIs or simple APIs) so that practitioners can get started quickly. While this helps beginners, experienced data scientists and ML practitioners often need more control. Building a predictive model is an iterative process, so this restricted behavior limits how much AutoML tools get used. With the real-life data scientist in mind, we created a programming model called “Gradual AutoML”. It borrows some concepts from functional programming and covers the entire spectrum of controlled automation. Gradual AutoML puts the data scientist in the driver’s seat and uses AutoML for assisted driving.

This talk will cover the basics of AutoML and then present Lale (https://github.com/IBM/lale), an open-source, scikit-learn-compatible AutoML library that implements Gradual AutoML. It will include usage examples and code showing how ML practitioners can control certain choices and employ AutoML to do the rest. I will also briefly share how to use Lale for AutoML with imbalance correction, computation of fairness metrics, and bias mitigation. The talk assumes some familiarity with the Python ML ecosystem, but many of the concepts apply to AutoML frameworks in general.

What's the focus of your work these days?

I work on AI research. The session I'm going to conduct is on AutoML or AutoAI. Right now, I'm working on AutoAI with foundation models in mind, which are the large language models, the latest in AI.

What's the motivation for your talk at QCon New York 2023?

Most of the commercial or open-source AutoML tools today are black-box tools. Data scientists or ML practitioners who want to use the optimization techniques that AutoML provides get a very black-box interface. They can supply their data, the task, and maybe some hyperparameters, but that's it. What we want to achieve is to give more control to the data scientists, so they can inject their domain knowledge and intuition into the AutoAI process. Instead of working with a black box, they can take control and provide algorithm choices, hyperparameters, or even the search space. This way, they can work with AutoAI iteratively.
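
The idea of fixing some choices and leaving the rest to a search can be sketched without any library. The example below is a minimal illustration of that idea only, not Lale's API: the search space, the `candidates` and `search` helpers, and the `toy_score` function are all hypothetical names made up for this sketch, and the score is a stand-in for a real cross-validated metric.

```python
from itertools import product

# Hypothetical search space: each key is one pipeline decision,
# each value lists the options an optimizer may pick from.
SEARCH_SPACE = {
    "scaler": ["none", "standard"],
    "model": ["logistic_regression", "decision_tree"],
    "max_depth": [2, 4, 8],
}


def candidates(space, pinned=None):
    """Yield every configuration, holding the user's pinned choices fixed."""
    pinned = pinned or {}
    unknown = set(pinned) - set(space)
    if unknown:
        raise KeyError(f"pinned keys not in search space: {sorted(unknown)}")
    # Only the decisions the user did not pin remain free for the search.
    free = {key: options for key, options in space.items() if key not in pinned}
    for values in product(*free.values()):
        yield {**pinned, **dict(zip(free.keys(), values))}


def search(space, score, pinned=None):
    """Exhaustive search: return the best-scoring configuration."""
    return max(candidates(space, pinned), key=score)


def toy_score(config):
    """Stand-in for a validation metric; a real scorer would train a model."""
    value = 0.0
    value += 0.1 if config["scaler"] == "standard" else 0.0
    value += 0.2 if config["model"] == "decision_tree" else 0.0
    value -= 0.01 * abs(config["max_depth"] - 4)
    return value


# Black-box end of the spectrum: the search decides everything.
fully_automated = search(SEARCH_SPACE, toy_score)

# Gradual: the practitioner fixes the model, the search fills in the rest.
gradual = search(SEARCH_SPACE, toy_score, pinned={"model": "logistic_regression"})

print(fully_automated)
print(gradual)
```

Pinning every key reduces the search to a single hand-written configuration, and pinning none gives the fully automated case, so the same call covers both ends of the spectrum and everything in between. A real tool would replace the exhaustive loop with a smarter optimizer and the toy score with actual model training.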

How would you describe your main persona and target audience for this session?

Ideally, I would expect them to have some knowledge of using ML, and it would be even better if they know the Python ecosystem for machine learning, which includes open-source libraries like pandas and scikit-learn. If they have used AutoAI, that's great, but if they haven't, I will cover the basics of what it means to take assistance from AutoAI.

Is there anything specific that you'd like people to walk away with after watching your session?

Yes, I would like them to walk away with the understanding that AutoAI is neither daunting nor a black box. They can control a lot of things and even perform complex tasks with it. They can drive how it searches and how it uses optimization. If they want to tackle complex use cases like imbalance correction or bias mitigation, that's all possible. They should use the right tool to take advantage of that.


Speaker

Kiran Kate

Senior Technical Staff Member @IBM Research

Kiran is a Senior Technical Staff Member working in the AI Programming Models department at IBM Research. She has been working in ML/AI for more than 13 years and has built several solutions and frameworks using machine learning. She has published in top AI conferences and has filed patents in this area. Kiran has a master’s in computer science from the Indian Institute of Technology, Madras.

Date

Thursday Jun 15 / 01:40PM EDT (50 minutes)

Location

Dumbo / Navy Yard

Topics

AI/ML, AutoML, Open Source, Fairness

From the same track

Session Search

Needle in a 930M Member Haystack: People Search AI @LinkedIn

Thursday Jun 15 / 11:50AM EDT

LinkedIn's search functionality is one of its oldest capabilities, allowing members to search for people they know, or to discover new connections.

Mathew Teoh

Machine Learning @ LinkedIn

Session ML in Practice

Back to Basics: Scalable, Portable ML in Pure SQL

Thursday Jun 15 / 02:55PM EDT

Redshift has SageMaker. BigQuery begat BigML. Spark birthed Databricks. Every data warehouse is tightly coupled to a particular ML stack.

Evan Miller

Principal Statistics Engineer @Eppo (Creator of Evan's Awesome A/B Tools)

Session AI/ML

PostgresML: Leveraging Postgres as a Vector Database for AI

Thursday Jun 15 / 10:35AM EDT

With the growing importance of AI and machine learning in modern applications, data scientists and developers are constantly exploring new and efficient ways to store and analyze large amounts of data.

Montana Low

Machine Learning w/ PostgresML

Session

LLMs in the Real World: Structuring Text with Declarative NLP

Thursday Jun 15 / 04:10PM EDT

Building machine learning pipelines to extract structured data from unstructured text is a popular problem within an unpopular development lifecycle.

Adam Azzam

AI Product Lead @Prefect