Presentation: Fast Log Analysis by Automatically Parsing Heterogeneous Log
What You’ll Learn
-
Hear why parsing logs is extremely challenging, and how approaches originating in machine learning can be used to automate the parsing of heterogeneous logs.
-
Learn interesting approaches to log parsing, backed by a reference implementation used in a commercial product.
-
Understand the challenges of parsing logs automatically.
Abstract
Most log analysis tools provide platforms for indexing, monitoring, and visualizing logs. Although these tools let users perform ad-hoc queries and define alerting rules relatively easily, they do not provide automated log parsing support. In particular, most of these systems use regular expressions (RegEx) to parse log messages. They assume that users know how to work with RegEx and make them manually parse logs or define the fields of interest. By definition, these tools support only supervised parsing, since human input is essential. However, human involvement is clearly non-scalable for heterogeneous and continuously evolving log message formats in systems such as IoT devices and custom applications -- it is impossible to manually review the sheer number of logs generated in an hour, let alone over days or weeks. On top of that, writing RegEx-based parsing rules is a long, frustrating, and error-prone process, as RegEx rules may conflict with each other. In this talk, we present a solution inspired by unsupervised machine learning techniques that automatically generates RegEx rules from a set of logs with no (or minimal) human involvement; human input is limited to providing a set of training logs. In addition, we present a demo illustrating how to integrate our solution with the popular Elasticsearch-Logstash-Kibana (ELK) stack to analyze logs collected from real-world applications.
Who is the main audience the talk is targeting?
The talk mainly targets people who design or architect log analytics solutions and who want to make troubleshooting operational problems faster by analyzing logs. When a computer operates, it generates logs to communicate with humans -- logs act like tweets that report system status. If something fails, somebody has to understand the logs and take the necessary steps to correct it. This talk is about how to parse those logs into a form that is one level up in analytics.
What's the motivation for the talk?
When we initially started building the log analysis product for commercial purposes, we hit bottlenecks pretty quickly. You have a log, but unless you parse it, you cannot build any useful tools or analytics with it -- that is quite limiting. And since every log is really different (there is no consistent form of logging), it becomes very hard to automate.
To solve this problem, we say: “OK, automated log parsing with no (or minimal) human input about the logs doesn't need to be 100% perfect; at the very least it will help people get started with log analysis. Over time, as more input is provided, the automated process will act more like a human expert.” So you throw any logs at the system, and it comes up with regular-expression-based patterns. Logs are usually unstructured and contain a lot of text, but once you run our method, it generates patterns that parse logs into structured forms, and you can use those to make sense of the logs.
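As a rough illustration of this idea (a minimal sketch, not the actual algorithm presented in the talk), a single sample log line can be generalized into a RegEx pattern by keeping punctuation-heavy tokens literal and replacing recognizable variable tokens (numbers, IP addresses, words) with character classes. All token classes and field names below are illustrative assumptions:

```python
import re

# Hypothetical token generalizers: (field name, probe for a sample token,
# RegEx class emitted into the generated pattern). Order matters: more
# specific classes come first.
GENERALIZERS = [
    ("ip",     re.compile(r"^\d{1,3}(?:\.\d{1,3}){3}$"), r"\d{1,3}(?:\.\d{1,3}){3}"),
    ("number", re.compile(r"^\d+$"),                     r"\d+"),
    ("word",   re.compile(r"^\w+$"),                     r"\w+"),
]

def line_to_pattern(line):
    """Turn one sample log line into a RegEx matching lines of the same shape."""
    parts = []
    field = 0
    for token in re.split(r"(\s+)", line):  # keep whitespace separators
        if not token:
            continue
        if token.isspace():
            parts.append(r"\s+")
            continue
        for name, probe, cls in GENERALIZERS:
            if probe.match(token):
                parts.append(f"(?P<{name}{field}>{cls})")
                field += 1
                break
        else:
            # Token with punctuation etc.: treat it as a literal constant.
            parts.append(re.escape(token))
    return re.compile("".join(parts))

pattern = line_to_pattern("ERROR 404 from 10.0.0.1")
match = pattern.match("ERROR 500 from 192.168.1.9")  # same shape, new values
```

The generated pattern parses any line of the same format into named fields -- the structured form that downstream analytics need. A real system would additionally look across many samples to tell frequent constant tokens (like `ERROR`) apart from true variables; in this sketch even constants become fields.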
In our talk, we will discuss our approaches to solving this problem. For example, we will cover one particular log that is very scary (almost one page long). Using our tool and the approach we took, we will show that, given any log, you have a way to parse it.
How does it do that? Does it apply machine learning techniques to be able to identify the components of the log? What does it actually do?
Yes, it's a good question. What we found is that if you apply pure machine learning, the issue is run time. Machine learning is a time-consuming process, and it is still limiting. What we did is blend machine learning concepts with an understanding of how system logs are produced. What I mean is: although the logs are all in different formats, they are generated by a computer. A computer is dumb -- it's just some programs writing information in the form of logs. Under that assumption, a log is not a storybook where every line is different. Since logs are generated by programs, there are only a few logging points, which usually generate logs with fixed formats (maybe 10 to 100 formats or so). If you exploit this kind of deep system knowledge, the problem can be solved systematically.
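The "few fixed logging points" observation can be sketched in a few lines (a simplified assumption-laden illustration, not the tool's real logic): map each line to a coarse type signature and group lines by signature, so that many raw lines collapse into the handful of formats their logging statements produce:

```python
import re
from collections import defaultdict

def token_type(tok):
    # Replace numeric-looking tokens (counts, versions, dotted IPs) with a
    # placeholder; keep constant words as-is. A real tool is much subtler.
    if re.fullmatch(r"\d+(\.\d+)*", tok):
        return "<NUM>"
    return tok

def signature(line):
    """Coarse format signature: the sequence of token types in the line."""
    return tuple(token_type(t) for t in line.split())

def cluster(lines):
    """Group log lines by signature; each group ~ one logging point."""
    groups = defaultdict(list)
    for line in lines:
        groups[signature(line)].append(line)
    return groups

logs = [
    "connection from 10.0.0.1 port 22",
    "connection from 10.0.0.7 port 443",
    "disk usage at 91 percent",
    "disk usage at 97 percent",
]
groups = cluster(logs)  # four lines collapse into two format groups
```

Each resulting group can then be generalized into one parsing pattern, which is far cheaper than treating every line independently with a heavyweight learning model.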
You have a tool that automates discovery of parsing logs now. Is this talk about a specific tool or about techniques you used to build that tool?
The talk will be very generic. In building this tool, we developed a methodology for addressing the problem. We'll mostly focus on the methodology and use the tool to demonstrate the reference implementation and various design trade-offs. By the way, we assume no prior knowledge is needed to attend this talk.