A Beginner's Guide to ELK Logstash Grok

2022-12-03

Keywords: ELK

Source: A Beginner's Guide to Logstash Grok (https://logz.io/blog/logstash-grok/)


The ability to efficiently analyze and query the data being shipped into the ELK Stack depends on the information being readable. This means that as unstructured data is being ingested into the system, it must be translated into structured message lines.


This ungrateful but critical task is usually left to Logstash (though there are other log shippers available; see our comparison of Fluentd vs. Logstash as one example). Regardless of the data source that you define, pulling the logs and performing some magic to beautify them is necessary to ensure that they are parsed correctly before being output to Elasticsearch.


Data manipulation in Logstash is performed using filter plugins. This article focuses on one of the most popular and useful filter plugins: the Logstash grok filter, which is used to parse unstructured data into structured data.

 

What is grok?


The term itself is actually pretty new: coined by Robert A. Heinlein in his 1961 book Stranger in a Strange Land, it refers to understanding something to the level that one has truly immersed oneself in it. It's an appropriate name for the grok language and the Logstash grok plugin, which take information in one format and immerse it in another (JSON, specifically). There are already a couple hundred grok patterns for logs available.



 

How does it work?


Put simply, grok is a way to match a line against a regular expression, map specific parts of the line into dedicated fields, and perform actions based on this mapping.


There are over 200 Logstash grok patterns built in for filtering items such as words, numbers, and dates from sources including AWS, Bacula, Bro, and Linux-Syslog. If you cannot find the pattern you need, you can write your own custom pattern. There are also options for supplying multiple match patterns, which simplify the writing of expressions to capture log data.


Here is the basic syntax format for a Logstash grok filter:

%{SYNTAX:SEMANTIC}

The SYNTAX will designate the pattern in the text of each log. The SEMANTIC will be the identifying mark that you actually give that syntax in your parsed logs. In other words:


%{PATTERN:FieldName}

This will match the predefined pattern and map it to a specific identifying field.

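For instance, here is a minimal sketch using the classic example from the Logstash documentation (the log line and field names are illustrative, not from the original article): given the line 55.3.244.1 GET /index.html, the pattern maps each token into a named field.

filter {
  grok {
    # "55.3.244.1 GET /index.html" -> client, method, request fields
    match => { "message" => "%{IP:client} %{WORD:method} %{URIPATHPARAM:request}" }
  }
}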

For example, a string like 127.0.0.1 will match the grok IP pattern, usually its IPv4 variant.


Grok has separate IPv4 and IPv6 patterns, but they can be filtered together with the syntax IP.


This standard pattern is as follows:

IPV4 (?<![0-9])(?:(?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.](?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.](?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.](?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2}))(?![0-9])


Pretending there was no unifying IP syntax, you would simply grok both with the same semantic field name:

%{IPV4:client_ip} %{IPV6:client_ip}

Again, just use the IP syntax, unless for any reason you want to separate these respective addresses into separate fields.

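In practice (a minimal sketch; the client_ip field name is an illustrative assumption), a single IP pattern covers both families, since the built-in IP pattern is defined as (?:%{IPV6}|%{IPV4}):

filter {
  grok {
    # IP is defined as (?:%{IPV6}|%{IPV4}), so one pattern matches either family
    match => { "message" => "%{IP:client_ip} %{GREEDYDATA:rest}" }
  }
}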

Since grok is essentially based upon a combination of regular expressions, you can also create your own custom regex-based grok filter with this pattern:


(?<custom_field>custom pattern)

For example:

(?<field_name>\d\d-\d\d-\d\d)

This grok pattern will match text of the form 22-22-22 (or any other digits in that shape) and map it to the field_name field.

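Inside a filter, that might look like this minimal sketch (the date_code field name and the trailing GREEDYDATA are illustrative assumptions):

filter {
  grok {
    # capture e.g. "22-22-22" into date_code via an inline custom regex
    match => { "message" => "(?<date_code>\d\d-\d\d-\d\d) %{GREEDYDATA:rest}" }
  }
}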

 

Logstash Grok Pattern Examples



To demonstrate how to get started with grokking, I’m going to use the following application log:

2016-07-11T23:56:42.000+00:00 INFO [MySecretApp.com.Transaction.Manager]:Starting transaction for session -464410bf-37bf-475a-afc0-498e0199f008

The goal I want to accomplish with a grok filter is to break down the logline into the following fields: timestamp, log level, class, and then the rest of the message.

The following grok pattern will do the job:

grok {
  match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:log-level} \[%{DATA:class}\]:%{GREEDYDATA:message}" }
}


Note: GREEDYDATA is the way Logstash grok expresses the regex .*
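For context, a grok filter like this sits in the filter stage of a Logstash pipeline, between an input and the Elasticsearch output. Here is a minimal end-to-end sketch; the Beats port (5044) and the Elasticsearch address (localhost:9200) are illustrative assumptions, not values from the original article:

input {
  beats {
    port => 5044    # assumed: logs arrive from a Beats shipper
  }
}

filter {
  grok {
    # the same pattern as above, applied before events reach the output
    match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:log-level} \[%{DATA:class}\]:%{GREEDYDATA:message}" }
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]    # assumed local Elasticsearch
  }
}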

Grok Data Type Conversion

By default, all SEMANTIC entries are saved as strings, but you can flip the data type with an easy formula. The following Logstash grok example converts any value matched by the NUMBER syntax and identified by the semantic num into a float:


%{NUMBER:num:float}

It's a pretty useful tool, even though it is currently only available for conversions to float or int.

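As a minimal sketch (the log format and the duration field name are illustrative assumptions), this is how a response time could be indexed as a number rather than a string:

filter {
  grok {
    # "took 0.532" -> duration stored as a float instead of a string
    match => { "message" => "took %{NUMBER:duration:float}" }
  }
}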

_grokparsefailure

This will try to match the incoming log to the given grok pattern. In case of a match, the log will be broken down into the specified fields, according to the defined grok patterns in the filter. In case of a mismatch, Logstash will add a tag called _grokparsefailure.

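One common way to act on that tag is a conditional in the output stage. This is only a sketch (the failure log path and the Elasticsearch address are illustrative assumptions); routing failed events to a file keeps them inspectable instead of silently indexing unparsed lines:

output {
  if "_grokparsefailure" in [tags] {
    # keep unparsed events somewhere you can inspect them later
    file { path => "/var/log/logstash/grok_failures.log" }
  } else {
    elasticsearch { hosts => ["localhost:9200"] }
  }
}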

However, in our case, the filter will match and result in the following output:

     
"message" => "Starting transaction for session -464410bf-37bf-475a-afc0-498e0199f008",
"timestamp" => "2016-07-11T23:56:42.000+00:00",
"log-level" => "INFO",
"class" => "MySecretApp.com.Transaction.Manager"

 

The grok debugger


A great way to get started with building your grok filters is this grok debug tool: https://grokdebug.herokuapp.com/

This tool allows you to paste your log message and gradually build the grok pattern while continuously testing the compilation. As a rule, I recommend starting with the %{GREEDYDATA:message} pattern and slowly adding more and more patterns as you proceed.

In the case of the example above, I would start with:

%{GREEDYDATA:message}

Then, to verify that the first part is working, proceed with:

%{TIMESTAMP_ISO8601:timestamp} %{GREEDYDATA:message}

 

Common Logstash grok examples


Here are some examples that will help you to familiarize yourself with how to construct a grok filter:

Syslog

Parsing syslog messages with grok is one of the more common demands of new users. There are also several different kinds of log formats for syslog, so keep in mind that you may need to write your own custom grok patterns. Here is one example of a common syslog parse:

grok {
  match => { "message" => "%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:syslog_hostname} %{DATA:syslog_program}(?:\[%{POSINT:syslog_pid}\])?: %{GREEDYDATA:syslog_message}" }
}

If you are using rsyslog, you can configure the latter to send logs to Logstash.
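On the Logstash side, those logs could be received with a syslog input. A minimal sketch (the port is an illustrative assumption; in rsyslog you would forward with a rule such as *.* @@logstash-host:5514):

input {
  syslog {
    port => 5514    # assumed port for logs forwarded from rsyslog
  }
}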

Apache Access logs

grok {
  match => { "message" => "%{COMBINEDAPACHELOG}" }
}
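To illustrate what this one-liner yields (a sketch; the sample line below is illustrative, not from the original article), COMBINEDAPACHELOG decomposes a standard combined-format access line into fields such as clientip, verb, request, response, bytes, referrer, and agent:

filter {
  grok {
    # e.g. 127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326 "-" "Mozilla/4.08"
    # -> clientip, ident, auth, timestamp, verb, request, httpversion, response, bytes, referrer, agent
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
}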

Elasticsearch

grok {
  match => ["message", "\[%{TIMESTAMP_ISO8601:timestamp}\]\[%{DATA:loglevel}%{SPACE}\]\[%{DATA:source}%{SPACE}\]%{SPACE}\[%{DATA:node}\]%{SPACE}\[%{DATA:index}\] %{NOTSPACE} \[%{DATA:updated-type}\]",
            "message", "\[%{TIMESTAMP_ISO8601:timestamp}\]\[%{DATA:loglevel}%{SPACE}\]\[%{DATA:source}%{SPACE}\]%{SPACE}\[%{DATA:node}\] (\[%{NOTSPACE:Index}\]\[%{NUMBER:shards}\])?%{GREEDYDATA}"
  ]
}


Redis

grok {
  match => ["redistimestamp", "\[%{MONTHDAY} %{MONTH} %{TIME}\]",
            "redislog", "\[%{POSINT:pid}\] %{REDISTIMESTAMP:timestamp}",
            "redismonlog", "\[%{NUMBER:timestamp} \[%{INT:database} %{IP:client}:%{NUMBER:port}\] \"%{WORD:command}\"\s?%{GREEDYDATA:params}"
  ]
}

MongoDB

MONGO_LOG %{SYSLOGTIMESTAMP:timestamp} \[%{WORD:component}\] %{GREEDYDATA:message}
MONGO_QUERY \{ (?<={ ).*(?= } ntoreturn:) \}
MONGO_SLOWQUERY %{WORD} %{MONGO_WORDDASH:database}\.%{MONGO_WORDDASH:collection} %{WORD}: %{MONGO_QUERY:query} %{WORD}:%{NONNEGINT:ntoreturn} %{WORD}:%{NONNEGINT:ntoskip} %{WORD}:%{NONNEGINT:nscanned}.*nreturned:%{NONNEGINT:nreturned}..+ (?<duration>[0-9]+)ms
MONGO_WORDDASH \b[\w-]+\b
MONGO3_SEVERITY \w
MONGO3_COMPONENT %{WORD}|-
MONGO3_LOG %{TIMESTAMP_ISO8601:timestamp} %{MONGO3_SEVERITY:severity} %{MONGO3_COMPONENT:component}%{SPACE}(?:\[%{DATA:context}\])? %{GREEDYDATA:message}

AWS

ELB_ACCESS_LOG %{TIMESTAMP_ISO8601:timestamp} %{NOTSPACE:elb} %{IP:clientip}:%{INT:clientport:int} (?:(%{IP:backendip}:?:%{INT:backendport:int})|-) %{NUMBER:request_processing_time:float} %{NUMBER:backend_processing_time:float} %{NUMBER:response_processing_time:float} %{INT:response:int} %{INT:backend_response:int} %{INT:received_bytes:int} %{INT:bytes:int} "%{ELB_REQUEST_LINE}"
CLOUDFRONT_ACCESS_LOG (?<timestamp>%{YEAR}-%{MONTHNUM}-%{MONTHDAY}\t%{TIME})\t%{WORD:x_edge_location}\t(?:%{NUMBER:sc_bytes:int}|-)\t%{IPORHOST:clientip}\t%{WORD:cs_method}\t%{HOSTNAME:cs_host}\t%{NOTSPACE:cs_uri_stem}\t%{NUMBER:sc_status:int}\t%{GREEDYDATA:referrer}\t%{GREEDYDATA:agent}\t%{GREEDYDATA:cs_uri_query}\t%{GREEDYDATA:cookies}\t%{WORD:x_edge_result_type}\t%{NOTSPACE:x_edge_request_id}\t%{HOSTNAME:x_host_header}\t%{URIPROTO:cs_protocol}\t%{INT:cs_bytes:int}\t%{GREEDYDATA:time_taken:float}\t%{GREEDYDATA:x_forwarded_for}\t%{GREEDYDATA:ssl_protocol}\t%{GREEDYDATA:ssl_cipher}\t%{GREEDYDATA:x_edge_response_result_type}
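Pattern definitions like the MongoDB and AWS entries above live in pattern files rather than inline in a filter. If your grok installation does not already bundle them, here is a minimal sketch of loading them from a local directory (the ./patterns path is an illustrative assumption):

filter {
  grok {
    patterns_dir => ["./patterns"]    # assumed directory holding files with definitions such as ELB_ACCESS_LOG
    match => { "message" => "%{ELB_ACCESS_LOG}" }
  }
}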

 

Summing it up


Logstash grok is just one type of filter that can be applied to your logs before they are forwarded to Elasticsearch. Because it plays such a crucial part in the logging pipeline, grok is also one of the most commonly used filters.



Happy grokking!

 
