QCon New York 2020 has been canceled.
You are viewing content from a past/completed QCon

Presentation: Hands-On Feature Engineering for Natural Language Processing

Track: Machine Learning for Developers

Location: Soho Complex, 7th fl.

Duration: 4:10pm - 5:00pm

Day of week: Monday

Slides: Download Slides


This presentation is now available to view on InfoQ.com



Think of Grammarly, Autotext and Alexa: many applications in software engineering are full of natural language, and the opportunities are endless. The latest advances in NLP, such as Word2vec, GloVe, ELMo and BERT, are easily accessible through open-source Python libraries. There has never been a better time for software engineers to develop NLP applications.

Feature engineering is the secret sauce for creating robust NLP models: features are the inputs to NLP algorithms, and those algorithms generate their output based on the input features.

The aim of this talk is to share a range of NLP feature engineering techniques, from Bag-of-Words to TF-IDF to word embeddings, covering feature engineering both for traditional ML models and for emerging deep learning approaches.
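To make the difference between Bag-of-Words and TF-IDF concrete, here is a minimal pure-Python sketch. It is not the code from the talk: the regex tokenizer and the helper names (`tokenize`, `build_vocabulary`, `bag_of_words`, `tf_idf`) are hypothetical, and it assumes the plain weighting idf = log(N / df), whereas libraries often apply smoothed variants.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Deliberately simple tokenizer (an assumption): lowercase, keep runs of letters, digits, apostrophes
    return re.findall(r"[a-z0-9']+", text.lower())


def build_vocabulary(docs):
    # Sorted list of every distinct token in the corpus; one feature column per entry
    return sorted({tok for doc in docs for tok in tokenize(doc)})


def bag_of_words(doc, vocab):
    # Raw term counts for a single document, aligned to the vocabulary
    counts = Counter(tokenize(doc))
    return [counts[term] for term in vocab]


def tf_idf(docs):
    vocab = build_vocabulary(docs)
    n_docs = len(docs)
    # Document frequency: the number of documents each term appears in
    df = Counter(term for doc in docs for term in set(tokenize(doc)))
    idf = {term: math.log(n_docs / df[term]) for term in vocab}
    matrix = []
    for doc in docs:
        tokens = tokenize(doc)
        counts = Counter(tokens)
        length = len(tokens) or 1  # avoid dividing by zero for empty documents
        # Term frequency normalised by document length, then weighted by IDF
        matrix.append([counts[t] / length * idf[t] for t in vocab])
    return vocab, matrix


docs = ["the cat sat", "the dog sat", "the cat ran"]
vocab = build_vocabulary(docs)
print(vocab)                              # ['cat', 'dog', 'ran', 'sat', 'the']
print(bag_of_words("the cat sat", vocab))  # [1, 0, 0, 1, 1]
```

A term such as "the" that appears in every document gets an IDF of log(3 / 3) = 0, so TF-IDF zeroes it out while Bag-of-Words keeps counting it.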

The talk will cover the end-to-end details, including contextual and linguistic feature extraction, vectorization, n-grams, topic modeling and named entity resolution, which draw on concepts from mathematics, information retrieval and natural language processing. We will also dive into more advanced feature engineering strategies, such as word2vec, GloVe and fastText, that leverage deep learning models.
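N-grams keep some of the local word order that single-word features lose; for example, the bigram "not good" carries a meaning that "not" and "good" do not carry separately. The sketch below counts unigrams and bigrams for a text. It is an illustration only: the whitespace tokenizer and the helper names `ngrams` and `ngram_features` are hypothetical, not taken from the talk or from any library.

```python
from collections import Counter


def ngrams(tokens, n):
    # Slide a window of width n across the token list; yields nothing if the text is shorter than n
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_features(text, n_values=(1, 2)):
    # Whitespace tokenization (an assumption); count each n-gram as a space-joined feature name
    tokens = text.lower().split()
    features = Counter()
    for n in n_values:
        features.update(" ".join(gram) for gram in ngrams(tokens, n))
    return features


print(ngrams(["not", "very", "good"], 2))  # [('not', 'very'), ('very', 'good')]
print(ngram_features("not very good not good")["not good"])  # 1
```

Raising `n_values` to include 3 adds trigrams, at the cost of a much larger and sparser feature space.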

In addition, attendees will learn how to combine NLP features with numeric and categorical features and analyze the feature importance from the resulting models.
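One common way to combine text features with other columns is to concatenate them into a single feature vector, while keeping a parallel list of column names so that a model's feature importances can later be mapped back to readable labels. The sketch below assumes min-max scaling for numeric columns and one-hot encoding for a categorical column; these choices and all helper names (`one_hot`, `min_max_scale`, `combine_features`, `combined_feature_names`) are hypothetical and may differ from the approach shown in the talk.

```python
def one_hot(value, categories):
    # One indicator column per known category; an unseen value becomes all zeros
    return [1.0 if value == c else 0.0 for c in categories]


def min_max_scale(value, lo, hi):
    # Rescale a numeric value to [0, 1] using bounds assumed to come from the training data
    return 0.0 if hi == lo else (value - lo) / (hi - lo)


def combine_features(text_vector, numeric, numeric_bounds, category, categories):
    # Concatenate: text features, then scaled numeric features, then one-hot categorical features
    scaled = [min_max_scale(v, lo, hi) for v, (lo, hi) in zip(numeric, numeric_bounds)]
    return list(text_vector) + scaled + one_hot(category, categories)


def combined_feature_names(vocab, numeric_names, categories, category_name):
    # Column names in the same order as combine_features, for reading feature importances
    return list(vocab) + list(numeric_names) + [f"{category_name}={c}" for c in categories]


row = combine_features([1, 0, 2], [50.0], [(0.0, 100.0)], "books", ["books", "music", "toys"])
names = combined_feature_names(["cat", "dog", "ran"], ["price"], ["books", "music", "toys"], "category")
print(list(zip(names, row)))
```

Because `names` and `row` stay aligned, an importance score for column i of a trained model can be reported as `names[i]`.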

The following Python libraries will be used to demonstrate these feature engineering techniques: spaCy, Gensim, fastText and Keras.

Speaker: Susan Li

Sr. Data Scientist at Kognitiv Corporation

I am Susan Li, the Sr. Data Scientist at Kognitiv, where I specialize in machine learning and NLP. I’m passionate about helping organizations realize the potential of big data and advanced analytics, and helping individuals enhance skills in data literacy. I frequently write and speak about predictive analytics, machine learning and NLP for technical and general audiences. In my free time, I can be found training for the next half marathon.
