Presentation: Machine Learning Fast and Slow



5:25pm - 6:15pm




Key Takeaways

  • Understand how to overcome technological bottlenecks with machine learning to scale as a startup.
  • Hear the case for Innovation vs. Invention in ML systems, and when to choose one over the other.
  • Understand the importance of adapting your machine learning solution to your organization, and how to do it.


The impact of machine learning solutions hinges on three entities working in cadence: data, software systems, and humans-in-the-loop. At Betaworks, different companies and projects operate in different markets and at different stages of their growth cycle. The data team must work with natural language and news data, audio signals, gifs, images and videos, gaming data, very large social graphs, and weather data - driving and supporting vastly disparate and continuously evolving requirements. Naturally, the rhythm of all three entities requires continuous calibration to achieve synergy. ML efforts oscillate between fast and slow phases of analysis, modeling, planning, building, deployment, evaluation, and tuning. This talk discusses some of our internal data tools and platform, product-specific solutions, and best practices we have learned when machine learning has to drive a startup forward.


What is your role today?
I am the lead data scientist at betaworks (a New York City based technology company that operates as a studio). We build companies in-house and also invest in early-stage startups. My role involves building data features for our products (like digg and instapaper), helping early-stage companies scale using data and machine learning, and assisting the investment team in evaluating the data potential of prospective seed investments.
Can you explain your talk title to me?
At betaworks, we are in a unique position: we get to work with many different kinds of data thanks to the diversity of our portfolio companies - weather, audio, video, natural language text, gifs, images, gaming data, etc. Our machine learning solutions have to adapt to time, human, and product technology constraints at startup pace. Often, these solutions become the integral factor in scaling a company. Therefore, ML efforts must oscillate between fast and slow phases of analysis, modeling, planning, building, deployment, evaluation, and tuning - so we can streamline ML with other parts of the organization and build synergy.
How would you rate this talk: Beginner, Intermediate, or Advanced
Intermediate, partly because I am not going to delve into excessive detail about how ML models can misinterpret data and overfit, or how bias in the data can throw predictions off. Instead, I will focus more on how to apply data science in an organization (especially a startup) with minimal friction while still producing serious impact. This talk will help someone who has done some data science and realizes two things: (1) what they teach you in academia about machine learning is very different from actually implementing a solution in industry, and (2) even with years of experience in “doing data science”, product and data platform strategies can advance or throttle the impact of machine learning solutions.
What’s the motivation for your talk?
It's a complex challenge to adjust machine learning to product, customer, team, and technological constraints in the real world. These are things they don’t really teach you when you learn the math behind machine learning. Folks from different backgrounds - data scientists, machine learning engineers, statisticians, backend engineers - work together to build an ML solution, and many times the synergy is actually hindered by the different constraints and priorities these people have. Startups evolve very fast. In the process of moving fast, you don’t want to just deploy a model without sufficient forethought - because your initial product features can be deal breakers.
So you have this fast phase with ML where you have to implement, deploy, and test before the runway expires. You also have the slow phases, where the things that need to be built and modeled should be done with minimal technical debt. You have to take care of both phases and necessarily oscillate between them for the plan to be successful.
People who come into machine learning from different backgrounds have different ideas about how to achieve this in the real world. The backend engineer, the data scientist who comes from computer science, the statistician who comes from economics - each has very different ideas about how this solution process should work. At betaworks, data scientists work closely at the product level - with designers, developers, engineers, and hackers - which isn’t an unfamiliar scenario at many startups. Sometimes a bigger challenge than the accuracy of your model is synergizing all the help that is tasked with consolidating a model into the product.
Is your talk about culture then? Are you talking about adapting to an organization's DNA?
I would say it’s partly about culture and partly about how to handle the natural evolution of an ML solution's architecture. The abstract of my talk mentions that ML solutions have quick impact when the priorities of all three entities are optimized, i.e. the mathematical model works, the humans building it synergize, and it aligns with the organization’s strategy. You have to have these three come together to actually make a solution work. It's much harder than it sounds, and it's hardly talked about. There are also some policy problems to handle, like when better data outperforms a more elegant yet complex model. Data purists might care overwhelmingly about elegance in modeling. Yet, given the time constraints in startups, the most elegant model might not be the most apt model to choose. So there are a bunch of factors like these I want to discuss, along with experiences we've had with startup products.
How important is machine learning to early startup companies these days?
I think if you are making a simple app, then sometimes it isn’t very important initially. But a lot of the time, simplicity tends to hide the actual complexity behind a product. I will give you an example. We have a company called Poncho. Poncho is this cat who sends you text messages every morning and evening about the weather, but in a really funny and friendly way. Editors write the messages for the consumers. But behind Poncho’s friendly message interface, there is massive clustering of weather patterns across all US zip codes, and that requires a lot of machine learning. When Poncho started to scale from the New York area to other parts of the country (it’s national now), it was challenging to process that much information in real time. We had to compute clusters of geo-locations with similar patterns using weather time series data and then ask editors to write one message per cluster. So that is an example where machine learning is absolutely critical in scaling a startup. Depending on the company, sometimes the core technology is heavily machine learning dependent - and yet it might sit behind a veil of simplicity.
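The zip-code clustering described above can be pictured as a plain k-means over per-location weather time series: group locations whose series look alike, then write one editorial message per group. The following is a hypothetical sketch with synthetic data - the patterns, sizes, and the `kmeans` helper are invented for illustration, not Poncho's actual pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the weather data: one temperature time
# series (24 hourly readings) per "zip code", drawn from 3 regimes.
hours = np.linspace(0, 2 * np.pi, 24)
true_patterns = np.array([
    10 + 5 * np.sin(hours),   # mild day
    0 + 2 * np.sin(hours),    # cold, stable day
    25 + 8 * np.sin(hours),   # hot day with big swings
])
zip_series = np.vstack([
    true_patterns[i % 3] + rng.normal(0, 0.5, 24) for i in range(300)
])

def kmeans(X, k, iters=50, seed=0):
    """Plain k-means: cluster series by Euclidean distance."""
    init = np.random.default_rng(seed)
    centers = X[init.choice(len(X), k, replace=False)]
    for _ in range(iters):
        # Assign each series to its nearest center.
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = np.argmin(dists, axis=1)
        # Recompute each center as the mean of its cluster.
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers

labels, centers = kmeans(zip_series, k=3)
# Editors would then write one message per cluster rather than
# one per zip code - a handful of messages instead of hundreds.
print(len(set(labels.tolist())))
```

In practice the distance metric, the number of clusters, and the cadence of re-clustering as the weather shifts are all product decisions, which is part of the fast/slow tension the talk is about.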
The last sentence in your abstract says that your talk discusses some of Betaworks' internal tools. What do you mean by internal tools? Are these custom proprietary tools?
Betaworks has kind of a unique scenario: it deals with many different kinds of data - weather data, natural language, audio. So we built a sort of centralized system that helps us process this data and generate features which we can reuse based on media types and semantics. The goal of this nexus is to do the machine learning at our end and then send the solved result back to the teams or companies via pipelines and APIs.
The motivation for a centralized ML architecture and feature reuse, or cross-pollination between products, is rooted in the fact that machine learning grows more powerful with transfer learning and needs to be abstracted from the product. Technical debt in ML engineering can be harder to resolve than in product engineering. A good ML architecture will allow you to quickly build, deploy, and test machine learning models with flexible coupling to the product. Reducing the friction in getting from IPython notebooks to a production system can be priceless.
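One way to picture such a centralized, media-type-aware feature system is a small extractor registry: products hand raw items to the hub, and the hub runs whatever extractors are registered for that media type. This is a hypothetical sketch - the `extractor`/`featurize` API and the example extractors are invented for illustration, not betaworks' actual platform:

```python
from typing import Callable, Dict, List

# Hypothetical registry mapping a media type ("text", "audio", ...)
# to the feature extractors that apply to it.
_extractors: Dict[str, List[Callable]] = {}

def extractor(media_type: str):
    """Decorator that registers a feature extractor for a media type."""
    def register(fn: Callable) -> Callable:
        _extractors.setdefault(media_type, []).append(fn)
        return fn
    return register

@extractor("text")
def word_count(doc: str) -> dict:
    return {"word_count": len(doc.split())}

@extractor("text")
def char_count(doc: str) -> dict:
    return {"char_count": len(doc)}

def featurize(media_type: str, item) -> dict:
    """Run every registered extractor for the media type, merge results."""
    features: dict = {}
    for fn in _extractors.get(media_type, []):
        features.update(fn(item))
    return features

print(featurize("text", "machine learning fast and slow"))
# -> {'word_count': 5, 'char_count': 30}
```

The point of the pattern is the decoupling: a product only names its media type, while the extractors (and the models behind them) can evolve centrally and be reused across products.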
QCon targets advanced architects and sr development leads, what do you feel will be the actionable benefits that type of persona will walk away from your talk with?
If you are thinking of deploying ML systems, or already have them running in your company, what are the key facts you should know when building and interacting with such systems, or with the people who run them? What are the capabilities, limits, and evolution patterns of such systems? When should you move fast vs. move cautiously around an ML solution?
What do you feel is the most disruptive tech in IT right now?
Well, one of the most disruptive things in tech might have been cloud-based machine learning systems - and that was a year or so ago. I think AWS Lambda is a disruptive tech because it enables event-based computing, which is huge for ML systems feeding off streaming data. But since this is an ML track, the one I feel most strongly about right now is deep learning.
The reason deep learning is so interesting is that there are a bunch of problems which had taken computers forever to solve, and deep learning solves many of them with incredible accuracy. The only issue with deep learning systems is that you have to design them well, and they are somewhat compute heavy. I think solutions based on deep learning will be percolating into a lot of consumer tech pretty soon.

Speaker: Suman Deb Roy

Lead Data Scientist @betaworks

Suman Deb Roy is a computer scientist and the author of 'Social Multimedia Signals: A Signal Processing Approach to Social Network Phenomena'. He currently works as the Lead Data Scientist at the NY-based startup studio betaworks, and was previously with Microsoft Research and a Fellow at the Missouri School of Journalism. He received the IEEE Communications Society MMTC Best Journal Paper Award in 2015 and the Missouri Honor Medal for Outstanding PhD Research in 2013. Suman also serves as the Editor of the IEEE Special Technical Community on Social Networking.
