What You’ll Learn

Understand the importance of developing a simulation-based framework for reasoning about machine learning models.
Hear a three-step approach to evaluating models against metrics that matter to the business.
Learn how Opendoor uses machine learning to drive pricing models.

Abstract

American homes represent a $25 trillion asset class, with very little liquidity. Selling a home on the market takes months of hassle and uncertainty. Opendoor offers to buy houses from sellers, charging a fee for this service. Opendoor bears the risk in reselling the house and needs to understand the effectiveness of different hazard-based liquidity models.
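As a rough illustration of what "hazard-based" can mean here, the sketch below assumes the simplest possible case: a constant daily hazard of resale, which gives an exponential time-to-sale. The function names and the example hazard value are hypothetical and are not taken from Opendoor's models, which are not described in this abstract.

```python
import math

def prob_sold_within(daily_hazard, days):
    """P(home resells within `days`) under an assumed constant daily hazard.

    With constant hazard h, the survival function is S(t) = exp(-h * t),
    so the probability of a sale by time t is 1 - S(t).
    """
    return 1.0 - math.exp(-daily_hazard * days)

def expected_days_on_market(daily_hazard):
    """Mean time to sale for an exponential time-to-sale: 1 / h."""
    return 1.0 / daily_hazard

# Hypothetical example: a home with a 2% chance per day of going under contract.
hazard = 0.02
for horizon in (30, 90, 180):
    print(f"P(sold within {horizon} days) = {prob_sold_within(hazard, horizon):.3f}")
print(f"Expected days on market: {expected_days_on_market(hazard):.0f}")
```

A less liquid home corresponds to a lower hazard and therefore a longer expected holding period, which is the risk the fee has to cover.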

This talk focuses on how to estimate the business impact of launching various machine learning models, in particular, those we use for modeling the liquidity of houses. For instance, if AUC increases by a certain amount, what is the likely impact on various business metrics such as volume and margin?

With the rise of machine learning, there has been a spate of work in integrating such techniques into other fields. One such application area is in econometrics and causal inference (cf. Varian, Athey, Pearl, Imbens), where the goal is to leverage advances in machine learning to better estimate causal effects. Given that a typical A/B test of a real estate liquidity model can run many months in order to fully realize resale outcomes, we use a simulation-based framework to estimate the causal impact on the business (e.g. on volume and margin) of launching a new model.

Interview

Question:

QCon: What does Opendoor do?

Answer:

Nelson: We make it as easy as possible to buy and sell houses. The way it works is you sell your house to us, and then we later sell it to a market buyer. This is pretty risky, so one of my teams focuses on modeling that risk.

For example, we’re able to resell some houses quickly. Perhaps they are in more central areas of town, they're at lower price points, with larger buyer pools to draw from, or it's a favorable time of the year (like spring). It just depends on the market. In these cases, the fee that we charge the seller is quite low because we're incurring very little risk in holding the house and then later reselling it. So we have a series of machine learning models that we are using in our pricing that are focused on modeling this kind of liquidity.

Question:

QCon: What’s the motivation for your talk?

Answer:

Nelson: I think that backtesting machine learning systems is very well understood, and running A/B tests to see how a new model launch is affecting business metrics is also quite common. However, in many application areas it’s important to be able to backtest the business impact of a new machine learning model. There aren’t many resources on how to do that, and I’d love to spread awareness.

Question:

QCon: From a high level, how will you go about discussing testing and evaluating models in this talk?

Answer:

Nelson: You need a simulation of the business. In most cases, that's a user model. So, for example, in real estate we want to know the probability that someone will sell to us given the price we're offering. This is a demand curve, which is generally downward sloping. You can add as many features as you want to make it a more accurate reflection of the business. Other domains will have other user models.
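A user model of this kind can be sketched in a few lines. The example below assumes a logistic demand curve expressed in terms of the fee charged to the seller; the functional form, the `midpoint` and `steepness` parameters, and their values are illustrative assumptions, not anything stated in the interview.

```python
import math

def acceptance_probability(fee_rate, midpoint=0.08, steepness=60.0):
    """Hypothetical downward-sloping demand curve.

    Returns the probability that a seller accepts an offer, given the fee
    (as a fraction of the home price). `midpoint` is the fee at which half
    of sellers would accept; `steepness` controls how fast demand falls off.
    """
    return 1.0 / (1.0 + math.exp(steepness * (fee_rate - midpoint)))

# Higher fees (a worse price for the seller) mean fewer accepted offers.
for fee in (0.04, 0.06, 0.08, 0.10, 0.12):
    print(f"fee {fee:.0%}: P(accept) = {acceptance_probability(fee):.3f}")
```

In practice the curve would be fitted to observed offer outcomes and could take additional features as inputs; the fixed parameters here only stand in for that.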

So that's the main business simulation. Then you put in your predictions from a new machine learning model and the old one. The difference is the impact on the business metrics. I am going to discuss this approach and our use case, and go over a three-step process to generalize to new problems.
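The comparison described above might look like the following minimal sketch. Everything in it is an assumption made for illustration: the synthetic homes, the way a "model" is reduced to a noise level on its cost prediction, the logistic demand curve, and the fee rule. It is not Opendoor's simulation, only the general shape of one.

```python
import math
import random

def acceptance_probability(fee_rate, midpoint=0.08, steepness=60.0):
    """Hypothetical demand curve: P(seller accepts) falls as the fee rises."""
    return 1.0 / (1.0 + math.exp(steepness * (fee_rate - midpoint)))

def simulate(prediction_noise, n_homes=10_000, base_fee=0.05, seed=0):
    """Run the business simulation for one model.

    A model is represented only by `prediction_noise`, the standard deviation
    of its error when predicting each home's true holding and resale cost
    (all quantities are fractions of the home price).
    """
    rng = random.Random(seed)  # same seed gives both models the same homes
    accepted = 0
    total_margin = 0.0
    for _ in range(n_homes):
        true_cost = rng.uniform(0.0, 0.06)
        predicted_cost = max(0.0, true_cost + rng.gauss(0.0, prediction_noise))
        fee = base_fee + predicted_cost  # price the offer off the prediction
        if rng.random() < acceptance_probability(fee):
            accepted += 1
            total_margin += fee - true_cost  # realized margin on this home
    margin_per_home = total_margin / accepted if accepted else 0.0
    return {"volume": accepted, "margin_per_home": margin_per_home}

old = simulate(prediction_noise=0.03)  # stand-in for the current model
new = simulate(prediction_noise=0.01)  # stand-in for the candidate model
print("old model:", old)
print("new model:", new)
print("volume change:", new["volume"] - old["volume"])
print("margin change per home:", new["margin_per_home"] - old["margin_per_home"])
```

Reusing the same seed for both runs means the two models are scored on the same simulated homes, so the differences in volume and margin reflect the models rather than sampling noise.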

Question:

QCon: What do you want someone to walk away from your talk with?

Answer:

Nelson: You might be experimenting with some extreme change or, in our case, it might just take a very long time to get results because of what I call metric measurement lag.

If either is the case, consider this simulation-based approach and the steps I'll discuss as a way to reason about selecting your models.

Speaker: Nelson Ray

Data Scientist @Opendoor

Nelson manages the Risk Science group at Opendoor in San Francisco. His team is responsible for per-home liquidity estimation and developing responsive risk models. Prior to joining Opendoor, Nelson was a data scientist at Google and a software engineer at Metamarkets. He holds a BS in mathematics and an MS and PhD in statistics from Stanford University.




Track: Machine Learning 2.0

Location: Liberty, 8th fl.

Duration: 11:50am - 12:40pm

Day of week: Wednesday

Level: Advanced

Persona: Data Scientist


Presentation: Evaluating Machine Learning Models: A Case Study
