QCon

Data Science of Love

Location: Grand Ballroom - Salon A/B
Abstract: 

eHarmony was founded to give people a better chance of finding someone for a long-lasting relationship. We were one of the first companies to apply advanced technology, of the kind now known as data science, to the age-old problem of matchmaking.
Over the years eHarmony has accumulated a vast amount of data on a variety of romantic interactions. This data is a treasure trove of entertaining tidbits and nuggets of insight into human nature. I will share some of them in the hope that people find them useful, but more importantly I will demonstrate how we actually use this data to make recommendations and give single people an upper hand in finding "The One".

In particular, I will show how we use Hadoop (YARN) to process billions of pairs of user profiles to find n-grams and other features that are predictive of romantic attraction, and how we use the discovered features for large-scale machine learning with Vowpal Wabbit's allreduce parallel learning.
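The abstract does not show eHarmony's code. As a rough illustration only, here is a minimal Python sketch of how one pair of profile texts might be turned into word n-gram features in Vowpal Wabbit's plain-text input format (`label |namespace features`). The function names, the namespaces `a` and `b`, and the choice of unigrams plus bigrams are assumptions; in a Hadoop job, a function like this could run inside the mapper over each candidate pair.

```python
import re
from typing import List


def ngrams(text: str, n_max: int = 2) -> List[str]:
    """Return word n-grams (lengths 1..n_max) from free text, joined with '_'."""
    # Lowercase and keep alphanumeric tokens only, so VW-reserved
    # characters (space, '|', ':') cannot leak into feature names.
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    grams = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append("_".join(tokens[i:i + n]))
    return grams


def vw_line(label: int, viewer_text: str, candidate_text: str, n_max: int = 2) -> str:
    """Format one (viewer, candidate) pair as a Vowpal Wabbit text example.

    Hypothetical convention: label is 1 if the viewer showed interest and -1
    otherwise (the labels VW expects with logistic loss). Namespace 'a' holds
    the viewer's n-grams and 'b' the candidate's.
    """
    if label not in (1, -1):
        raise ValueError("label must be 1 or -1")
    a = " ".join(ngrams(viewer_text, n_max))
    b = " ".join(ngrams(candidate_text, n_max))
    return f"{label} |a {a} |b {b}"


# Example: one positive pair.
print(vw_line(1, "I love hiking.", "Hiking and wine!"))
```

Lines produced this way could be passed to `vw` with `--loss_function logistic` and `-q ab`, which asks VW to learn viewer-by-candidate n-gram interactions. Spreading that training over many nodes is what VW's allreduce mode is for.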

Finally, I will describe an optimization technique that decides which matches to deliver to whom and when, and which is more broadly applicable to other domains such as advertising or constrained recommendation.
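The abstract does not say which optimization technique is used. Purely as a stand-in, the sketch below shows the simplest version of the problem: a greedy heuristic that delivers the highest-scoring (recipient, candidate) pairs first while capping how many matches each person receives and how many people each candidate is shown to. The function name, the two caps, and the score triples are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def greedy_distribute(
    scores: List[Tuple[str, str, float]],
    max_received: int,
    max_shown: int,
) -> Dict[str, List[str]]:
    """Greedily pick (recipient, candidate) deliveries in descending score order.

    scores: (recipient, candidate, predicted_value) triples.
    max_received: cap on matches delivered to any one recipient.
    max_shown: cap on how many recipients any one candidate is shown to,
        so a popular profile is not delivered to everyone.
    """
    received: Dict[str, int] = defaultdict(int)
    shown: Dict[str, int] = defaultdict(int)
    plan: Dict[str, List[str]] = defaultdict(list)
    # Highest-value pairs first; ties broken by names for determinism.
    for recipient, candidate, _ in sorted(scores, key=lambda t: (-t[2], t[0], t[1])):
        if received[recipient] < max_received and shown[candidate] < max_shown:
            plan[recipient].append(candidate)
            received[recipient] += 1
            shown[candidate] += 1
    return dict(plan)


# Example: bob is the top choice for both ann and cat, but may be shown only once.
pairs = [("ann", "bob", 0.9), ("cat", "bob", 0.8), ("cat", "dan", 0.5), ("ann", "dan", 0.4)]
print(greedy_distribute(pairs, max_received=1, max_shown=1))  # {'ann': ['bob'], 'cat': ['dan']}
```

Greedy selection is fast but not optimal in general; the same capacity-constrained assignment is a bipartite b-matching, which can be solved exactly as a linear program or min-cost flow. The sketch also ignores the "when" part of the problem, which would require modelling deliveries over time.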

At eHarmony, Hadoop is hard at work looking for love. It gets a hand from Hive, Impala, and Vowpal Wabbit while solving three fundamental problems:

1) Compatibility Matching - who is right for whom for the long term?

2) Affinity Matching - who is attracted to whom?

3) Match Distribution - who should be introduced to whom, and when?

You will see how parallel processing and large-scale machine learning allow us to sift through the free text of user profiles and find n-grams that are predictive of romantic attraction.
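To make "n-grams that are predictive of romantic attraction" concrete, here is a minimal in-memory sketch that ranks n-grams by smoothed lift: how much more often pairs containing an n-gram end in mutual interest than pairs overall. The lift score, the smoothing prior, and the 0/1 label meaning are assumptions for illustration, not the method described in the talk. On a Hadoop cluster, the per-n-gram counts would come from a map/reduce pass rather than a Python loop.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def ngram_lift(
    examples: Iterable[Tuple[List[str], int]],
    min_count: int = 1,
    prior: float = 1.0,
) -> List[Tuple[str, float, int]]:
    """Rank n-grams by smoothed lift in positive rate.

    examples: (ngrams_in_pair, label) with label 1 = mutual interest, 0 = not
        (a hypothetical labelling).
    Returns (ngram, lift, count) sorted by descending lift, where
    lift = smoothed positive rate among pairs containing the n-gram
           / overall positive rate.
    """
    pos: Counter = Counter()
    tot: Counter = Counter()
    n_pos = n_all = 0
    for grams, label in examples:
        n_all += 1
        n_pos += label
        for g in set(grams):  # count each n-gram at most once per pair
            tot[g] += 1
            pos[g] += label
    if n_all == 0 or n_pos == 0:
        return []
    base = n_pos / n_all
    out = []
    for g, c in tot.items():
        if c >= min_count:
            # Additive smoothing toward the base rate keeps rare n-grams
            # from getting extreme lift values from a handful of pairs.
            rate = (pos[g] + prior * base) / (c + prior)
            out.append((g, rate / base, c))
    out.sort(key=lambda t: (-t[1], t[0]))
    return out


# Example: "hiking" appears only in positive pairs, "tv" only in a negative one.
data = [(["hiking", "wine"], 1), (["hiking"], 1), (["wine"], 0), (["tv"], 0)]
for gram, lift, count in ngram_lift(data):
    print(gram, round(lift, 3), count)
```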

Did you know that Hadoop is now responsible for about 5% of US marriages?

Vaclav Petricek is a Principal Data Scientist at Santa Monica-based eHarmony, where he is responsible for optimization and machine learning in eHarmony's core matchmaking algorithms. He also runs a series of invited ML talks at eHarmony, part of the Los Angeles Machine Learning Meetup. Before eHarmony, Vaclav was a Visiting Researcher at University College London, where his research spanned recommender systems, social networks, web structure and online auctions. Before that, he worked at several Czech internet startups. Vaclav earned his PhD in Computer Science and a Master's in Distributed Systems from Charles University in Prague, Czech Republic.