What are the commonly used Logstash filters, and how are the Grok and Mutate filters used?
Logstash provides a variety of filter plugins for parsing, transforming, and enriching data. The most commonly used filters and their usage are described below.

### 1. Grok Filter

Grok is the most powerful filter; it parses unstructured data into structured fields.

Basic usage:

```
filter {
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
}
```

Matching against multiple patterns (tried in order until one matches):

```
filter {
  grok {
    match => {
      "message" => [
        "%{COMBINEDAPACHELOG}",
        "%{COMMONAPACHELOG}",
        "%{NGINXACCESS}"
      ]
    }
  }
}
```

Custom patterns loaded from a directory:

```
filter {
  grok {
    patterns_dir => ["/path/to/patterns"]
    match => { "message" => "%{CUSTOM_PATTERN:custom_field}" }
  }
}
```

### 2. Mutate Filter

The Mutate filter performs general-purpose operations on event fields.

Renaming a field:

```
filter {
  mutate {
    rename => { "old_name" => "new_name" }
  }
}
```

Converting field types:

```
filter {
  mutate {
    convert => {
      "status"  => "integer"
      "price"   => "float"
      "enabled" => "boolean"
    }
  }
}
```

Removing fields:

```
filter {
  mutate {
    remove_field => ["temp_field", "debug_info"]
  }
}
```

Replacing a field value:

```
filter {
  mutate {
    replace => { "message" => "new message" }
  }
}
```

Adding fields:

```
filter {
  mutate {
    add_field => {
      "environment"  => "production"
      "processed_at" => "%{@timestamp}"
    }
  }
}
```

Merging one field into another:

```
filter {
  mutate {
    merge => { "field1" => "field2" }
  }
}
```

### 3. Date Filter

The Date filter parses a timestamp field and writes the result to Logstash's `@timestamp` field.

Basic usage:

```
filter {
  date {
    match => ["timestamp", "dd/MMM/yyyy:HH:mm:ss Z"]
  }
}
```

Multiple date formats (tried in order):

```
filter {
  date {
    match => [
      "timestamp",
      "dd/MMM/yyyy:HH:mm:ss Z",
      "yyyy-MM-dd HH:mm:ss",
      "ISO8601"
    ]
  }
}
```

Writing to a custom target field instead of `@timestamp`:

```
filter {
  date {
    match => ["log_time", "yyyy-MM-dd HH:mm:ss"]
    target => "parsed_time"
  }
}
```

Setting the timezone:

```
filter {
  date {
    match => ["timestamp", "yyyy-MM-dd HH:mm:ss"]
    timezone => "Asia/Shanghai"
  }
}
```

### 4. GeoIP Filter

The GeoIP filter adds geographic information derived from an IP address.

Basic usage:

```
filter {
  geoip {
    source => "client_ip"
  }
}
```

Specifying the target field:

```
filter {
  geoip {
    source => "client_ip"
    target => "geoip"
  }
}
```

Specifying the database path:

```
filter {
  geoip {
    source => "client_ip"
    database => "/path/to/GeoLite2-City.mmdb"
  }
}
```

Restricting which fields are added:

```
filter {
  geoip {
    source => "client_ip"
    fields => ["city_name", "country_name", "location"]
  }
}
```

### 5. Useragent Filter

The Useragent filter parses User-Agent strings.

Basic usage:

```
filter {
  useragent {
    source => "agent"
  }
}
```

Specifying the target field:

```
filter {
  useragent {
    source => "agent"
    target => "ua"
  }
}
```

### 6. CSV Filter

The CSV filter parses CSV-formatted data.

Basic usage:

```
filter {
  csv {
    separator => ","
    columns => ["name", "age", "city"]
  }
}
```

Auto-detecting column types:

```
filter {
  csv {
    separator => ","
    autodetect_column_types => true
  }
}
```
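Under the hood, a Grok expression expands into a regular expression with named capture groups, which is why each `%{PATTERN:field}` reference produces a structured field on the event. The following Python sketch illustrates the idea; the pattern definitions are simplified stand-ins, not the real (more elaborate) definitions shipped with Logstash:

```python
import re

# Simplified stand-ins for three Grok patterns (illustrative only; the
# real Logstash definitions are more permissive).
GROK_PATTERNS = {
    "IP": r"(?:\d{1,3}\.){3}\d{1,3}",
    "WORD": r"\w+",
    "URIPATHPARAM": r"\S+",
}

def expand_grok(expr: str) -> str:
    """Expand %{NAME:field} references into named-capture regex groups."""
    return re.sub(
        r"%\{(\w+):(\w+)\}",
        lambda m: f"(?P<{m.group(2)}>{GROK_PATTERNS[m.group(1)]})",
        expr,
    )

pattern = expand_grok("%{IP:client} %{WORD:method} %{URIPATHPARAM:request}")
match = re.match(pattern, "55.3.244.1 GET /index.html?foo=bar")
print(match.groupdict())
# -> {'client': '55.3.244.1', 'method': 'GET', 'request': '/index.html?foo=bar'}
```

This is also why Grok failures are expensive: every unmatched pattern in a multi-pattern `match` is a full regex attempt against the line.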
### 7. JSON Filter

The JSON filter parses JSON strings.

Basic usage:

```
filter {
  json {
    source => "message"
  }
}
```

Parsing into a target field:

```
filter {
  json {
    source => "message"
    target => "parsed_json"
  }
}
```

Dropping the original field after parsing:

```
filter {
  json {
    source => "message"
    remove_field => ["message"]
  }
}
```

### 8. Ruby Filter

The Ruby filter allows arbitrary Ruby code for complex processing.

Basic usage:

```
filter {
  ruby {
    code => 'event.set("computed_field", event.get("field1") + event.get("field2"))'
  }
}
```

Conditional logic:

```
filter {
  ruby {
    code => '
      if event.get("status").to_i >= 400
        event.tag("error")
      else
        event.tag("success")
      end
    '
  }
}
```

Array operations:

```
filter {
  ruby {
    code => '
      items = event.get("items")
      if items.is_a?(Array)
        event.set("item_count", items.length)
        event.set("total_price", items.sum { |i| i["price"] })
      end
    '
  }
}
```

### 9. Drop Filter

The Drop filter discards events.

Conditional drop:

```
filter {
  if [log_level] == "DEBUG" {
    drop { }
  }
}
```

Probabilistic drop (discards roughly 10% of events):

```
filter {
  ruby {
    code => 'event.cancel if rand < 0.1'
  }
}
```

### 10. Aggregate Filter

The Aggregate filter aggregates information across multiple events sharing a task ID.

Basic usage:

```
filter {
  aggregate {
    task_id => "%{user_id}"
    code => '
      map["count"] ||= 0
      map["count"] += 1
    '
    push_map_as_event => true
    timeout => 60
  }
}
```

### Combining Filters

Multiple filters can be chained in one pipeline:

```
filter {
  # Parse the log format
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
  # Convert field types
  mutate {
    convert => { "response" => "integer" }
  }
  # Parse the timestamp
  date {
    match => ["timestamp", "dd/MMM/yyyy:HH:mm:ss Z"]
  }
  # Add geographic information
  geoip {
    source => "clientip"
  }
  # Parse the User-Agent
  useragent {
    source => "agent"
  }
}
```

### Best Practices

- Filter order: arrange filters in logical order (parse before transform and enrich).
- Conditionals: use conditional statements to skip unnecessary processing.
- Performance: avoid complex Ruby code on the hot path.
- Error handling: handle parse failures explicitly.
- Testing: validate patterns with tools such as the Grok Debugger.
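The error-handling point can be made concrete: when Grok fails to match, it tags the event `_grokparsefailure`, and a conditional can route such events aside instead of indexing them half-parsed. A minimal sketch (the file path and output destinations are illustrative):

```
filter {
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }
    tag_on_failure => ["_grokparsefailure"]
  }
}

output {
  if "_grokparsefailure" in [tags] {
    # Route unparsed events to a file for later inspection
    file { path => "/var/log/logstash/failed_events.log" }
  } else {
    elasticsearch { hosts => ["localhost:9200"] }
  }
}
```

`tag_on_failure` defaults to `["_grokparsefailure"]`, so the explicit setting above only documents the behavior; the same conditional pattern works for `_jsonparsefailure`, `_dateparsefailure`, and `_csvparsefailure` from the other filters.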