LLMs in the Real World: Structuring Text with Declarative NLP

Building machine learning pipelines to extract structured data from unstructured text is a popular problem within an unpopular development lifecycle. We’ll talk through how you can use LLMs so that your schema can `interrogate` your unstructured text and pull structured data out of it in a declarative and type-safe way.

The challenge of converting unstructured text data into structured, usable data is a well-known adversary to engineers, analysts, and data scientists alike. In the traditional paradigm, this task has been the exclusive domain of specialists, often requiring the creation of bespoke models for each data feature. Missed a feature? Let’s circle back next quarter.

In this talk we’ll see that Large Language Models are surprisingly effective not only at rote extraction of structured data from documents, but also at extracting derived information, and at doing so in a type-safe way that adheres to your data model. We’ll show how Marvin’s AI Models, grounded in Pydantic, let you interrogate your data with your data model by combining the potent reasoning capabilities of AI with the type boundaries set by Pydantic. Because developers can build NLP pipelines solely from their data model’s schema, engineers and analysts get a declarative development experience with NLP.
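To make the schema-first idea concrete, here is a minimal sketch of what "interrogating text with a data model" can look like. It is not Marvin's API and not the speaker's code: it uses standard-library dataclasses as a stand-in for Pydantic so that it runs without dependencies, and a canned `fake_llm` function in place of a real model call. The `Patient` schema, its fields, and the helper names `build_prompt` and `extract` are all hypothetical.

```python
import json
from dataclasses import dataclass, fields
from typing import Callable


@dataclass
class Patient:
    # Hypothetical schema, invented for illustration only.
    name: str
    age: int
    smoker: bool  # derived information: inferred, not quoted from the text


def build_prompt(schema: type, text: str) -> str:
    """Describe the wanted fields to the model using only the schema."""
    wanted = ", ".join(f"{f.name} ({f.type.__name__})" for f in fields(schema))
    return f"Return a JSON object with fields {wanted} extracted from: {text}"


def extract(schema: type, text: str, llm: Callable[[str], str]):
    """Ask the model for JSON, then enforce the schema's types."""
    raw = json.loads(llm(build_prompt(schema, text)))
    values = {}
    for f in fields(schema):
        value = raw[f.name]  # raises KeyError if the model omitted a field
        # Exact type match, so e.g. True is not accepted where an int is expected.
        if type(value) is not f.type:
            raise TypeError(f"{f.name}: expected {f.type.__name__}, got {value!r}")
        values[f.name] = value
    return schema(**values)


def fake_llm(prompt: str) -> str:
    # Assumption: stands in for a real LLM call and returns a canned answer.
    return '{"name": "Ada", "age": 36, "smoker": false}'


patient = extract(Patient, "Ada, 36, says she has never smoked.", fake_llm)
print(patient)
```

The point of the sketch is the division of labor: the schema alone drives both the prompt and the validation, so adding a field to `Patient` is the only change needed to extract a new feature. A library such as Pydantic would replace the hand-written type check with its own validation and coercion.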

We’ll go through real-life applications of LLMs in production: structuring electronic health records data, developing custom entity extraction pipelines, generating synthetic data for test-driven development, and automating schema normalization for data warehousing.


Thursday Jun 15 / 04:10PM EDT (50 minutes)


Dumbo / Navy Yard


From the same track

Session Search

Needle in a 930M Member Haystack: People Search AI @LinkedIn

Thursday Jun 15 / 11:50AM EDT

LinkedIn's search functionality is one of its oldest capabilities, allowing members to search for people they know, or to discover new connections.

Mathew Teoh

Machine Learning @ LinkedIn

Session ML in Practice

Back to Basics: Scalable, Portable ML in Pure SQL

Thursday Jun 15 / 02:55PM EDT

Redshift has SageMaker. BigQuery begat BigML. Spark birthed Databricks. Every data warehouse is tightly coupled to a particular ML stack.

Evan Miller

Principal Statistics Engineer @Eppo (Creator of Evan's Awesome A/B Tools)

Session AI/ML

PostgresML: Leveraging Postgres as a Vector Database for AI

Thursday Jun 15 / 10:35AM EDT

With the growing importance of AI and machine learning in modern applications, data scientists and developers are constantly exploring new and efficient ways to store and analyze large amounts of data.

Montana Low

Machine Learning w/ PostgresML

Session AI/ML

Going Beyond the Case of Black Box AutoML

Thursday Jun 15 / 01:40PM EDT

Most AutoML tools are black-box tools. They offer no code/low code tools (UI/simple APIs) for practitioners to get started quickly. While this helps beginners, most experienced data scientists/ML practitioners often need more control.

Kiran Kate

Senior Technical Staff Member @IBM Research