Presentation: Data Science Meets Star Wars: The F#orce Awakens

Duration: 1:40pm - 2:30pm

Key Takeaways

  • Learn universal functional programming techniques for data science problems.
  • See how F# features such as type providers and interoperability with R can improve developer productivity.
  • Understand how concepts such as the clustering coefficient, together with the integration of R and F#, can be used to ask and answer interesting questions about Star Wars.

Abstract

Let's dive together into the world of Star Wars! We'll use the force of F# and R to process publicly available datasets relating to the Star Wars movies to find out who's the most important character in the stories and why the prequels were so unsuccessful. On the way, you'll see why F# is a great language for data science - from preprocessing the data to visualizing it - and you'll also learn how you can use similar data processing pipelines to get interesting insights from your own data.

Interview

Question: 
Can you explain your presentation’s title a bit to me?
Answer: 
The title is “Star Wars, The F#orce Awakens”, which obviously plays on the title of the movie, and the talk is about using F# to analyze the Star Wars universe. I used F# to extract social networks from the scripts of all the movies that are available online, and then I analyzed those networks and found some differences between the movies and who the central characters are. I will be talking about the whole process and how I did it, and along the way I will show some of the concepts in F# that make it easy to access and analyze data.
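To make the idea of extracting a social network from scripts more concrete, here is a minimal Python sketch (not the F# code from the talk). It assumes, as a simplification, that two characters are linked whenever they speak in the same scene, with the edge weight counting shared scenes, and it ranks characters by degree (the number of distinct characters they share a scene with), which is only one simple notion of centrality. The function names and the toy scenes are hypothetical.

```python
from collections import Counter
from itertools import combinations

def build_network(scenes):
    """Build an undirected, weighted co-occurrence network.

    scenes: iterable of lists of character names, one list per scene.
    Returns a Counter mapping frozenset({a, b}) to the number of scenes
    in which characters a and b both appear.
    """
    edges = Counter()
    for scene in scenes:
        chars = sorted(set(scene))  # drop repeats within a scene
        for a, b in combinations(chars, 2):
            edges[frozenset((a, b))] += 1
    return edges

def degree(edges):
    """Count distinct neighbors per character (unweighted degree)."""
    deg = Counter()
    for pair in edges:
        for name in pair:
            deg[name] += 1
    return deg

# Made-up toy scenes, purely for illustration.
scenes = [
    ["LUKE", "BEN", "C-3PO"],
    ["LUKE", "HAN", "LEIA"],
    ["HAN", "CHEWBACCA"],
    ["LUKE", "LEIA", "HAN"],
]
edges = build_network(scenes)
print(edges[frozenset(("HAN", "LEIA"))])  # 2 shared scenes
print(degree(edges).most_common(1))       # [('LUKE', 4)]
```

Degree ignores edge weights; a weighted degree (summing shared scenes) or a path-based measure could rank characters differently.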
Question: 
What is your talk about?
Answer: 
It is focused on network science. I am comparing the different movies using several measures of complexity in networks, such as the clustering coefficient. Despite the name, this is not about clustering. If I have two friends, are they also friends with each other or not? That gives me an idea of how interconnected the network is. Mathematically, you compute it by looking at all pairs of a node's neighbors and taking the ratio of pairs that are actually connected to the number of pairs that would be connected in a complete graph. This tells you how people communicate in the network, and you can apply the same algorithm to other data sets. For example, I have some applications in computational biology, where different biological networks have different characteristics.
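The ratio described above is the local clustering coefficient, C(v) = 2T(v) / (k(k - 1)), where k is the number of neighbors of node v and T(v) is the number of links among those neighbors. The following is a minimal Python sketch of it, averaged over all nodes; the helper names are hypothetical, nodes with fewer than two neighbors are assigned 0 by convention, and the global "transitivity" variant of the coefficient is computed differently.

```python
from itertools import combinations

def from_edges(edges):
    """Build an undirected adjacency map {node: set of neighbors}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def local_clustering(adj, node):
    """Share of pairs of `node`'s neighbors that are linked to each other."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0  # convention: undefined for degree < 2, reported as 0
    links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
    return 2 * links / (k * (k - 1))  # k(k-1)/2 pairs in a complete graph

def average_clustering(adj):
    """Mean of the local coefficients over all nodes."""
    return sum(local_clustering(adj, n) for n in adj) / len(adj)

# Triangle A-B-C with an extra node D attached to A.
adj = from_edges([("A", "B"), ("B", "C"), ("A", "C"), ("A", "D")])
print(local_clustering(adj, "A"))  # 1 of 3 neighbor pairs linked: 0.333...
print(average_clustering(adj))     # (1/3 + 1 + 1 + 0) / 4 = 0.5833...
```

A network where characters mostly talk within tight groups scores high; one where a few hub characters connect otherwise separate groups scores lower.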
Question: 
Are you going to use F# to perform this analysis?
Answer: 
F# and R, and I am showing a bit of how F# interoperates with R: the data pre-processing can be done in F#, and then you can call R directly from F# and get the results back into F# again.
Question: 
So, you are focusing on ML concepts, such as the clustering coefficient, and then exposing them with F#. Is that accurate?
Answer: 
Yes. I am also talking about a few concepts for working with data in a functional way, using type providers in F#. Some of the patterns used in F# to work with data are very well suited to parallelization because they are purely functional: the absence of side effects means you can parallelize them very easily. I am also showing how I actually parsed the scripts available online, which demonstrates the functional way of parsing data by doing pattern matching on regular expressions.
Question: 
What are the key takeaways for this talk?
Answer: 
I want to make people aware that F# is a nice language for working with data. If you are doing any data science or machine learning, 95% of your time is spent getting data into a usable shape, and I think F# is a great language for that thanks to type providers and other features that I will be showing. I also want to inspire people to play with data science a bit, because playing with data is the way to learn these concepts. If you have a nice, interpretable data set like the Star Wars networks, it’s easy to apply some algorithms to it and see how they work. Then people can apply them to their own data, because right now they have a lot of data on customers or on their Twitter networks. Once you know how the methods work, it is easier to apply them to something else.
Question: 
Are there topics that a non-.NET developer would pick up in your talk and be able to take across to the JVM or some other platform?
Answer: 
All the data science is pretty much universal, so it’s about the algorithms, not the implementation. People in the JVM world can take away some of the functional concepts of working with data that I will be talking about.

Speaker: Evelina Gabasova

Bioinformatics Machine Learning Researcher @Cambridge_Uni

Evelina is a machine learning researcher working in bioinformatics, trying to reverse-engineer cancer at University of Cambridge. Outside of academia, she speaks at developer conferences and user groups about F# and data science. She writes a blog at http://www.evelinag.com.
