Log messages are typically unstructured: a combination of constant free text written by developers and variable values. A lot of information is buried in there. A log parser can split a log message into its elements and identify templates for easier analysis, reducing the dimensionality from tens of millions of logs to a few hundred patterns. Log template recognition, or log parsing, is a widely researched topic in industry as well as in academia.
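To make the idea concrete, here is a minimal sketch of template extraction. The masking rules and placeholder names (`<IP>`, `<HEX>`, `<NUM>`) are illustrative assumptions, not a reference to any specific parser: variable-looking tokens are replaced by wildcards so that messages differing only in their parameters collapse into the same template.

```python
import re

# Hypothetical masking rules: each variable-looking token is replaced
# by a placeholder so that messages differing only in parameter values
# collapse into one template.
MASKS = [
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "<IP>"),  # IPv4 addresses
    (re.compile(r"\b0x[0-9a-fA-F]+\b"), "<HEX>"),          # hex values
    (re.compile(r"\b\d+\b"), "<NUM>"),                     # plain integers
]

def to_template(message: str) -> str:
    """Replace variable values in a log message with placeholders."""
    for pattern, placeholder in MASKS:
        message = pattern.sub(placeholder, message)
    return message

logs = [
    "Connection from 10.0.0.1 port 22",
    "Connection from 192.168.1.7 port 2222",
    "Worker 12 finished job 981 in 35 ms",
    "Worker 3 finished job 977 in 210 ms",
]

# Four raw messages reduce to two templates:
#   "Connection from <IP> port <NUM>"
#   "Worker <NUM> finished job <NUM> in <NUM> ms"
templates = {to_template(m) for m in logs}
```

Real parsers go well beyond fixed regular expressions, but the dimensionality reduction they achieve is exactly this: many raw messages, few templates.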
Log parsing is always the very first step of any log analytics workflow, and it is crucial for the correct extraction of useful information: it underpins the most common downstream applications.
Traditionally, log templates and key parameters are extracted through handcrafted regular expressions, but this approach is time-consuming and error-prone. Several algorithms can automate this task, such as SLCT, IPLoM, LKE, LogSig, Spell, and Drain. They can be divided into two categories: batch processing and online log parsing. The main difference is that batch methods need the entire dataset available up front and can therefore only work "offline", on historical data, while online parsers process logs sequentially, one by one, which is more practical for real-time services.
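The online approach can be illustrated with a toy sketch. This is not Drain or any of the algorithms above, just an assumed simplification of the same idea: each incoming message is compared against known templates of the same length, and positions where tokens disagree are generalized to a wildcard; the similarity threshold of 0.5 is an arbitrary choice for the example.

```python
class OnlineParser:
    """Toy online log parser: processes messages one at a time,
    grouping them by token count and generalizing mismatching
    tokens to a wildcard. A simplified illustration, not Drain."""

    def __init__(self):
        # Maps token count -> list of templates (as token lists).
        self.clusters = {}

    def add(self, message: str) -> str:
        tokens = message.split()
        candidates = self.clusters.setdefault(len(tokens), [])
        for template in candidates:
            # Similarity: fraction of positions where tokens agree.
            same = sum(a == b for a, b in zip(template, tokens))
            if same / len(tokens) >= 0.5:
                # Merge: positions that differ become wildcards.
                for i, (a, b) in enumerate(zip(template, tokens)):
                    if a != b:
                        template[i] = "<*>"
                return " ".join(template)
        # No similar template found: start a new cluster.
        candidates.append(tokens[:])
        return " ".join(tokens)

parser = OnlineParser()
for line in ["user alice logged in", "user bob logged in", "disk sda1 is full"]:
    parser.add(line)
# The two "user ... logged in" lines merge into "user <*> logged in",
# while "disk sda1 is full" starts its own cluster.
```

Because the parser never needs the full dataset, it can run on a live log stream; batch methods instead make a global pass (or several) over all historical logs before emitting templates.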
When choosing a log parser, there are several things to consider.
For more details about state-of-the-art parsers, I suggest reading this nice scientific article by J. Zhu et al.