Data Science of Love
eHarmony was founded to give people a better chance of finding a long-lasting relationship. We were one of the first companies to apply advanced technology, now known as data science, to the age-old problem of matchmaking.
Over the years eHarmony has accumulated a vast amount of data on a variety of romantic interactions. This data is a treasure trove of entertaining tidbits and nuggets of insight into human nature. I will share some of those in the hope that people find them useful, but more importantly I will demonstrate how we actually use this data to make recommendations and give single people the upper hand in finding "The One".
In particular, I will show how we use Hadoop (YARN) to process billions of pairs of user profiles to find ngrams and other features that are predictive of romantic attraction, and how we use the discovered features for large-scale machine learning with Vowpal Wabbit's AllReduce parallel learning.
Finally, I will describe an optimization technique that decides which matches to deliver to whom and when, and which is more broadly applicable to other domains such as advertising or constrained recommendations.
At eHarmony, Hadoop is hard at work looking for love. It gets a hand from Hive, Impala, and Vowpal Wabbit while solving three fundamental problems:
1) Compatibility Matching - who is right for who for the long term?
2) Affinity Matching - who is attracted to who?
3) Match Distribution - who should be introduced to who and when?
You will see how parallel processing and large-scale machine learning allow us to sift through the free text of user profiles and find ngrams that are predictive of romantic attraction.
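To make the ngram idea concrete, here is a minimal sketch of how one pair of profiles might be turned into a training example in Vowpal Wabbit's text input format. It is an illustration only, not eHarmony's actual pipeline: the function names, the namespace names (`s`, `r`, `both`), the use of word bigrams, and the label convention (1 for mutual interest, -1 otherwise) are all assumptions made for this example.

```python
import re


def ngrams(text, n=2):
    """Return the set of word n-grams in a free-text profile, joined with '_'."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return {"_".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def to_vw_line(label, sender_text, receiver_text, n=2):
    """Format one profile pair as a Vowpal Wabbit example line.

    Hypothetical namespaces: 's' holds the sender's n-grams, 'r' the
    receiver's, and 'both' the n-grams the two profiles have in common.
    """
    sender = ngrams(sender_text, n)
    receiver = ngrams(receiver_text, n)
    namespaces = [
        ("s", sorted(sender)),
        ("r", sorted(receiver)),
        ("both", sorted(sender & receiver)),
    ]
    # Skip empty namespaces so the line stays compact.
    parts = [f"|{name} {' '.join(feats)}" for name, feats in namespaces if feats]
    return f"{label} " + " ".join(parts)


if __name__ == "__main__":
    print(to_vw_line(1, "I love hiking", "Love hiking too"))
```

In a real job, a function like this would run inside a Hadoop mapper over billions of profile pairs, and the resulting lines would be fed to Vowpal Wabbit. Interactions between the sender and receiver namespaces could then be requested from the learner (for example with its `-q` quadratic option) rather than materialized by hand.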
Did you know that Hadoop is now responsible for about 5% of US marriages?