Track:

Location:

Salon C

Duration:

5:25pm - 6:15pm

Day of week:

Wednesday

Level:

Intermediate

Persona:

Data Scientist

Key Takeaways

Understand how to overcome technological bottlenecks with machine learning to scale as a startup.
Hear the case for innovation vs. invention with ML systems and when to choose one over the other.
Understand why and how to adapt your machine learning solution to your organization.

Abstract

The impact of machine learning solutions hinges on three entities working in cadence: data, software systems and humans-in-the-loop. At Betaworks, there are different companies and projects in different markets and at different stages of their growth cycle. The data team must work with natural language and news data, audio signals, gifs, images and videos, gaming data, very large social graphs and weather data, driving and supporting vastly disparate and continuously evolving requirements. Naturally, the rhythm of all three entities requires continuous calibration to achieve synergy. ML efforts oscillate between fast and slow phases of analysis, modeling, planning, building, deployment, evaluation and tuning. This talk discusses some of our internal data tools and platform, product-specific solutions, and best practices we learned when machine learning has to drive the startup road.

Interview

Question:

What is your role today?

Answer:

I am the lead data scientist at betaworks, a New York City-based technology company that operates as a studio. We build companies in-house and also invest in early-stage startups. My role involves building data features for our products (like Digg and Instapaper), helping early-stage companies scale using data and machine learning, and helping the investment team evaluate the data potential of prospective seed investments.

Question:

Can you explain your talk title to me?

Answer:

At betaworks, we are in a unique position to work with many kinds of data because of the diversity of our portfolio companies: weather, audio, video, natural language text, gifs, images, gaming data and so on. Our machine learning solutions have to adapt to time, human and product technology constraints at startup pace. Many times, these solutions become the integral factor in scaling a company. Therefore, ML efforts must oscillate between fast and slow phases of analysis, modeling, planning, building, deployment, evaluation and tuning, so we can streamline ML with other parts of the organization and build synergy.

Question:

How would you rate this talk: Beginner, Intermediate, or Advanced?

Answer:

Intermediate, partly because I am not going to delve into excessive details about how ML models can misinterpret data and overfit or how there can be bias in data which makes predictions go off. Instead, I will focus more on how to apply data science in an organization (especially startups) with minimal friction yet produce serious impact. This talk will help someone who has done some data science and realizes two things: (1) what they teach you in academia about machine learning is very different from actually implementing a solution in the industry. (2) even with years of experience in “doing data science”, product and data platform strategies can advance or throttle the impact of machine learning solutions.

Question:

What’s the motivation for your talk?

Answer:

It's a complex challenge to adjust machine learning to product, customer, team and technological constraints in the real world. These are things they don’t really teach you when you learn the math behind machine learning. Folks from different backgrounds (data scientists, machine learning engineers, statisticians, backend engineers) work together to build an ML solution, and many times the synergy is actually hindered by the different sorts of constraints and priorities these people have. Startups evolve very fast. In the process of moving fast, you don’t want to just deploy a model without sufficient forethought, because your initial product features can be deal breakers.

So you have this fast phase with ML where you have to implement, deploy and test before the runway runs out. You also have the slow phases, where things that need to be built and modeled should be done with minimal technical debt. You have to take care of both phases and oscillate between them as needed for the plan to be successful.

People who come into machine learning from different backgrounds have different ideas about how to achieve this in the real world. A backend engineer, a data scientist who comes from computer science, and a statistician who comes from economics each have very different ideas about how this solution process should work. At betaworks, data scientists work closely at the product level with designers, developers, engineers and hackers, which is a familiar scenario at many startups. Sometimes, a bigger challenge than the accuracy of your model is coordinating everyone tasked with integrating the model into the product.

Question:

Is your talk about culture then? Are you talking about adapting to an organization's DNA?

Answer:

I would say it’s partly about culture and partly about how to handle the natural evolution of ML solutions architecture. The abstract of my talk mentions that ML solutions have quick impact when the priorities of all three entities are optimized, i.e. the mathematical model works, the humans building it synergize, and it aligns with the organization’s strategy. You have to have these three come together to actually make a solution work. It's much harder than it sounds, and it's hardly talked about. There are also some policy problems to handle, like when better data outperforms a more elegant yet complex model. Data purists might care overwhelmingly about elegance in modeling. Yet, given the time constraints in startups, the most elegant model might not be the most apt model to choose. So there are a bunch of factors like these I want to discuss, along with experiences we've had with startup products.

Question:

How important is machine learning to early startup companies these days?

Answer:

I think if you are making a simple app, then sometimes it isn’t very important initially. But a lot of times, simplicity tends to hide the actual complexity behind a product. I will give you an example. We have a company called Poncho. Poncho is this cat who sends you text messages every morning and evening about the weather, but in a really funny and friendly way. Editors write the messages for the consumers. But behind Poncho’s friendly message interface, there is massive clustering of the weather patterns across all US zip codes. So that requires a lot of machine learning. And when Poncho started to scale from the New York area to other parts of the country (it's national now), it was challenging to actually process that much information in real time. We had to calculate clusters of geo locations that have similar patterns using weather time series data and then ask editors to write one message per cluster. So that is an example of when machine learning is absolutely critical in scaling a startup. Depending on the company, sometimes the core technology is strongly machine learning heavy. And yet, it might be behind a veil of simplicity.
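
The clustering step described in this answer can be sketched in a few lines. This is a minimal illustration and not Poncho's actual pipeline: it assumes each zip code has a short, equal-length series of forecast temperatures, it uses plain k-means with squared Euclidean distance as a stand-in for whatever method the team used, and the zip codes, readings and function names (`kmeans`, `zips_per_cluster`) are all hypothetical.

```python
import random


def distance_sq(a, b):
    """Squared Euclidean distance between two equal-length series."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_series(series_list):
    """Point-wise mean of several equal-length series."""
    n = len(series_list)
    return [sum(vals) / n for vals in zip(*series_list)]


def kmeans(series_by_zip, k, iterations=20, seed=0):
    """Assign each zip code to one of k clusters of similar weather series."""
    rng = random.Random(seed)
    zips = sorted(series_by_zip)
    # Start from k randomly chosen series as the initial centroids.
    centroids = [series_by_zip[z] for z in rng.sample(zips, k)]
    assignment = {}
    for _ in range(iterations):
        # Assignment step: each zip code joins its nearest centroid.
        assignment = {
            z: min(range(k), key=lambda c: distance_sq(series_by_zip[z], centroids[c]))
            for z in zips
        }
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [series_by_zip[z] for z in zips if assignment[z] == c]
            if members:
                centroids[c] = mean_series(members)
    return assignment


def zips_per_cluster(assignment):
    """Group zip codes by cluster so an editor can write one message per group."""
    groups = {}
    for zip_code, cluster_id in assignment.items():
        groups.setdefault(cluster_id, []).append(zip_code)
    return groups


# Hypothetical forecast temperatures (degrees F) at four times of day.
forecasts = {
    "10001": [30, 32, 35, 33],
    "10002": [31, 33, 36, 34],
    "11201": [29, 31, 34, 32],
    "33101": [75, 78, 82, 80],
    "33109": [76, 79, 83, 81],
}

clusters = kmeans(forecasts, k=2)
for cluster_id, members in sorted(zips_per_cluster(clusters).items()):
    print(cluster_id, members)
```

The point of the sketch is the shape of the workflow: once zip codes are grouped by similar weather, editorial effort scales with the number of clusters rather than the number of zip codes.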

Question:

The last sentence in your abstract says that your talk discusses some of Betaworks' internal tools. What do you mean when you say internal tools? Are these custom proprietary tools you are talking about or what does that mean?

Answer:

Betaworks has kind of a unique scenario: it deals with different kinds of data, like weather data, natural language and audio. So we built a sort of centralized system that helps us process this data and generate features which we can reuse based on media types and semantics. The goal of this nexus is to do the machine learning at our end and then send the solved result back to the teams or to the companies via pipelines and APIs.

The motivations for a centralized ML architecture and feature reuse, or cross-pollination between products, are rooted in the fact that machine learning grows more powerful with transfer learning and that it needs to be abstracted from the product. Technical debt in ML engineering can be harder to resolve than in product engineering. A good ML architecture will allow you to quickly build, deploy and test machine learning models with flexible coupling to the product. Reducing the friction in getting from IPython notebooks to a production system could be priceless.
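
One way to picture feature reuse keyed by media type is a small registry that products call instead of writing their own extraction code. This is a hedged sketch under stated assumptions, not the betaworks platform: the `FeatureRegistry` class, the media type labels and the example features are all hypothetical, and a real system would sit behind the pipelines and APIs mentioned above rather than in-process function calls.

```python
from collections import defaultdict


class FeatureRegistry:
    """Hypothetical central registry of feature extractors, keyed by media type."""

    def __init__(self):
        # media type -> {feature name -> extractor function}
        self._extractors = defaultdict(dict)

    def register(self, media_type, name):
        """Decorator that registers an extractor under a media type and name."""
        def decorator(fn):
            self._extractors[media_type][name] = fn
            return fn
        return decorator

    def extract(self, media_type, payload):
        """Run every extractor registered for this media type on the payload."""
        extractors = self._extractors.get(media_type, {})
        return {name: fn(payload) for name, fn in extractors.items()}


registry = FeatureRegistry()


@registry.register("text", "word_count")
def word_count(text):
    return len(text.split())


@registry.register("text", "char_count")
def char_count(text):
    return len(text)


@registry.register("timeseries", "mean")
def series_mean(values):
    return sum(values) / len(values)


@registry.register("timeseries", "range")
def series_range(values):
    return max(values) - min(values)


# Two different products can reuse the same extractors without sharing code.
print(registry.extract("text", "Rain later today!"))
print(registry.extract("timeseries", [30.0, 32.0, 34.0]))
```

Keeping the extractors behind one interface is what gives the loose coupling to the product: a product only names a media type and sends a payload, so features can be added or improved centrally.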

Question:

QCon targets advanced architects and senior development leads. What do you feel will be the actionable benefits that type of persona will walk away with from your talk?

Answer:

If you are thinking of deploying ML systems, or already have them running in your company, what are the key facts you should know when building and interacting with such systems or with the people who run them? What are the capabilities, limits and evolution patterns of such systems? When should you move fast vs. move cautiously around an ML solution?

Question:

What do you feel is the most disruptive tech in IT right now?

Answer:

Well, one of the most disruptive things in tech a few years ago might have been cloud-based machine learning systems. And that was like a year or so ago. I think Lambda on AWS is a disruptive tech because it can help you do event-based computing, which is huge for ML systems feeding off streaming data. But since this is an ML track, the one I feel most strongly about right now is deep learning.

The reason deep learning is so interesting is that there are a bunch of problems which had taken forever for computers to solve. Deep learning solves many of these with incredible accuracy. The only issue with deep learning systems is that you have to design them well and they are somewhat compute-heavy. I think solutions based on deep learning will be percolating into a bunch of consumer tech pretty soon.

Speaker: Suman Deb Roy

Lead Data Scientist @betaworks

Suman Deb Roy is a computer scientist and the author of 'Social Multimedia Signals: A Signal Processing Approach to Social Network Phenomena'. He currently works as the Lead Data Scientist at the NY-based startup studio betaworks, and was previously with Microsoft Research and a Fellow at the Missouri School of Journalism. He is the recipient of the IEEE Communications Society MMTC Best Journal Paper Award in 2015 and the Missouri Honor Medal for Outstanding PhD Research in 2013. Suman also serves as the Editor of the IEEE Special Technical Community on Social Networking.

Find Suman Deb Roy at

Speaker page

@_RoySD

Lead Data Scientist at betaworks

Similar Talks

Learnings from a Culture First Startup

CTO @Buffer

Sunil Sadasivan

Day in the Life with Speech Recognition, Machine Learning, and IOT

Distinguished Engineer, Emerging Technology @IBM

Mark Vanderwiele

Building Cognitive Applications

Developer Advocate for IBM Emerging Technology

Jonathan Kaufman

Becoming an Outlier

Software Architect @VinSolutions, Author @pluralsight

Cory House

ESPN Next Generation APIs Powering Web, Mobile, TV

Senior Director of Distribution Platforms @ESPN

Manny Pelarinos

The Human Side of Microservices

Tech Lead @Yelp

John Billings

The Seven (More) Deadly Sins of Microservices

Chief Scientist @OpenCredo

Daniel Bryant

Lessons Learned on Uber's Journey into Microservices

Software Engineer @Uber

Emily Reinhold

What They Don’t Tell You About Microservices…

CTO @Yodle

Daniel Rolnick

Tracks

Monday, 13 June

Architectures You've Always Wondered About

Case studies from: Google, LinkedIn, Alibaba, Twitter, and more...

Stream Processing @ Scale

Technologies and techniques to handle ever-increasing data streams

Culture As Differentiator

Stories of companies and teams for whom engineering culture is a differentiator - in delivering faster, in attracting better talent, and in making their businesses more successful.

Practical DevOps for Cloud Architectures

Real-world lessons and practices that enable the DevOps nirvana of operating what you build

Incredible Power of an Open-Sourced .NET

.NET is more than you may think. From Rx to C# 7 designed in the open, learn more about the power of open source .NET

Sponsored Solutions Track 1

Tuesday, 14 June

Better than Resilient: Antifragile

Failure is a constant in production systems; learn how to wield it to your advantage to build more robust systems.

Innovations in Java and the Java Ecosystem

Cutting Edge Java Innovations for the Real World

Modern CS in the Real World

Real-world Industry adoption of modern CS ideas

Containers: From Dev to Prod

Beyond the buzz and into the how and why of running containers in production

Security War Stories

Expert-level security track led by well known and respected leaders in the field

Sponsored Solutions Track 2

Wednesday, 15 June

Microservices and Monoliths

Practical lessons on services. Asks the question: when should you go with microservices, and when should you not?

Modern API Architecture - Tools, Methods, Tactics

API-based application development, and the tooling and techniques to support effectively working with APIs in the small or at scale. Using internal and external APIs

Commoditized Machine Learning

Barriers to entry for applied ML are lower than ever before; jumpstart your journey

Full Stack JavaScript

Browser, server, devices - JavaScript is everywhere

Optimizing Yourself

Keeping life in balance is always a challenge. Learning lifehacks

Sponsored Solutions Track 3

Conference for Professional Software Developers

Presentation: Machine Learning Fast and Slow
