What is the purpose of the Grok filter in Logstash, and how do you use Grok to parse logs?

February 21, 16:02

Grok is one of the most powerful and most commonly used filters in Logstash; it parses unstructured text into structured, queryable fields.

Grok Basic Concepts

Grok is built on regular expressions and extracts fields from text using predefined patterns. The basic syntax is:

shell
%{PATTERN:field_name}

Where:

  • PATTERN: Predefined pattern name
  • field_name: Field name to store after parsing
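For example, a minimal sketch that extracts an IP address from the message field into a field named client_ip (the field name is illustrative):

conf
filter {
  grok {
    # "192.168.1.100" anywhere in the message becomes the field client_ip
    match => { "message" => "%{IP:client_ip}" }
  }
}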

Common Grok Patterns

Basic Patterns

  • %{NUMBER:num}: Match numbers
  • %{WORD:word}: Match words
  • %{DATA:data}: Match any data
  • %{GREEDYDATA:msg}: Greedy match remaining data
  • %{IP:ip}: Match IP addresses
  • %{DATE:date}: Match dates
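These patterns compose left to right within one expression. For example, a sketch that pulls three fields out of a line like `error 404 from 192.168.1.100` (field names are illustrative):

conf
filter {
  grok {
    match => { "message" => "%{WORD:level} %{NUMBER:code} from %{IP:source}" }
  }
}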

Log Patterns

  • %{COMBINEDAPACHELOG}: Apache combined log format
  • %{COMMONAPACHELOG}: Apache common log format
  • %{NGINXACCESS}: Nginx access log format
  • %{SYSLOGBASE}: System log base format

Practical Application Examples

1. Apache Access Log Parsing

conf
filter {
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
}

After parsing, the following fields are generated (legacy field names; with ECS compatibility enabled, newer Logstash releases emit ECS-style names instead):

  • clientip
  • ident
  • auth
  • timestamp
  • verb
  • request
  • httpversion
  • response
  • bytes
  • referrer
  • agent

2. Custom Log Format

Assume log lines in the following format:

shell
2024-02-21 10:30:45 [INFO] User john.doe logged in from 192.168.1.100

Configuration:

conf
filter {
  grok {
    match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} \[%{LOGLEVEL:level}\] %{GREEDYDATA:message}" }
    # Without overwrite, grok appends the captured value to the existing
    # "message" field, turning it into an array; overwrite replaces it.
    overwrite => ["message"]
  }
}

3. Complex Log Parsing

conf
filter {
  grok {
    match => {
      "message" => "%{IP:client_ip} - %{USER:user} \[%{HTTPDATE:timestamp}\] \"%{WORD:method} %{URIPATHPARAM:request} HTTP/%{NUMBER:httpversion}\" %{NUMBER:response_code} %{NUMBER:bytes} \"%{DATA:referrer}\" \"%{DATA:agent}\""
    }
  }
}

Custom Grok Patterns

Define custom patterns in configuration files:

conf
filter {
  grok {
    patterns_dir => ["/path/to/patterns"]
    match => { "message" => "%{CUSTOM_PATTERN:custom_field}" }
  }
}

Then define the pattern in a file under patterns_dir:

shell
CUSTOM_PATTERN [0-9]{3}-[A-Z]{2}
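As an alternative to a separate patterns file, the grok filter's pattern_definitions option lets you define patterns inline; a sketch using the same pattern:

conf
filter {
  grok {
    # Define the pattern inline instead of in a patterns_dir file
    pattern_definitions => { "CUSTOM_PATTERN" => "[0-9]{3}-[A-Z]{2}" }
    match => { "message" => "%{CUSTOM_PATTERN:custom_field}" }
  }
}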

Multi-pattern Matching

Grok accepts an array of patterns for a field; they are tried in order and the first match wins:

conf
filter {
  grok {
    match => {
      "message" => [
        "%{COMBINEDAPACHELOG}",
        "%{COMMONAPACHELOG}",
        "%{NGINXACCESS}"
      ]
    }
  }
}

Grok Debugging Tools

1. Grok Debugger

Use online Grok Debugger tools to test and debug patterns:

  • Grok Debugger in Kibana Dev Tools
  • Elastic official online debugger

2. Add Tags for Debugging

conf
filter {
  grok {
    match => { "message" => "%{PATTERN:field}" }
    # Tag events that fail to parse; "_grokparsefailure" is also the default.
    # (add_tag is applied on *successful* matches, so it should not be used
    # with the failure tag.)
    tag_on_failure => ["_grokparsefailure"]
  }
}

Performance Optimization

  1. Use Precompiled Patterns: Logstash caches compiled patterns
  2. Avoid Greedy Matching: Use more precise patterns for better performance
  3. Reduce Pattern Count: Use only necessary patterns
  4. Use Conditional Statements: Apply specific grok patterns to specific data types
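Point 4 can be sketched with a Logstash conditional, assuming events carry a type field set by the input (the value apache is illustrative):

conf
filter {
  if [type] == "apache" {
    grok {
      match => { "message" => "%{COMBINEDAPACHELOG}" }
    }
  }
}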

Best Practices

  1. Start Simple: Test simple patterns first, gradually increase complexity
  2. Use Named Capture Groups: Improve code readability
  3. Handle Parse Failures: Use _grokparsefailure tags to handle parsing failures
  4. Document Custom Patterns: Add comments explaining custom patterns
  5. Version Control: Include custom pattern files in version control
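Practice 3 can be sketched by routing failed events somewhere inspectable, for example a dedicated file (the path is illustrative):

conf
output {
  if "_grokparsefailure" in [tags] {
    file {
      path => "/var/log/logstash/grok_failures.log"
    }
  }
}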
Tags: Logstash