Logstash performance optimization is an important topic, especially when processing large volumes of log data. Here are several key optimization strategies.
## 1. JVM Memory Configuration

### Heap Memory Settings

Logstash runs on the JVM, and sensible heap sizing is crucial:

```bash
# Set in config/jvm.options
-Xms2g
-Xmx2g
```
Best Practices:
- Heap memory should not exceed 50% of system physical memory
- Set Xms and Xmx to the same value to avoid performance overhead from dynamic adjustment
- For large data volume scenarios, recommended heap memory is 4-8GB
### JVM Parameter Optimization

```bash
# Use the G1 garbage collector
-XX:+UseG1GC
# GC thread counts
-XX:ConcGCThreads=2
-XX:ParallelGCThreads=4
# Old/young generation ratio (1 = young generation gets half the heap)
-XX:NewRatio=1
```
## 2. Pipeline Configuration Optimization

### Pipeline Workers

```conf
# logstash.yml
pipeline.workers: 4
```
- Default value is the number of CPU cores
- Increasing workers can improve parallel processing capability
- Recommended to set to 1-2 times the number of CPU cores
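When running multiple pipelines, the worker count can also be set per pipeline in `config/pipelines.yml`. A sketch (the pipeline ids and config paths below are illustrative assumptions):

```conf
# config/pipelines.yml -- per-pipeline worker counts (ids and paths are examples)
- pipeline.id: apache-logs
  path.config: "/etc/logstash/conf.d/apache.conf"
  pipeline.workers: 4
- pipeline.id: metrics
  path.config: "/etc/logstash/conf.d/metrics.conf"
  pipeline.workers: 2
```

This lets a heavy parsing pipeline claim more threads without over-provisioning lighter ones.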
### Batch Size

```conf
pipeline.batch.size: 125
```
- The number of events each worker thread collects before running filters and outputs
- Default value is 125; adjust according to your workload
- Larger batches improve throughput but increase latency and memory pressure (roughly workers × batch.size events are in flight at once)
### Batch Delay

```conf
pipeline.batch.delay: 50
```
- How long (in milliseconds) a worker waits for a batch to fill before processing it anyway
- Default value is 50ms
- Lowering it improves end-to-end latency but may reduce throughput
## 3. Filter Optimization

### Reduce Unnecessary Filters

```conf
filter {
  # Apply filters only to specific data types
  if [type] == "apache" {
    grok {
      match => { "message" => "%{COMBINEDAPACHELOG}" }
    }
  }
}
```
### Use Conditional Statements

```conf
filter {
  # Avoid reprocessing already parsed data
  if [parsed] != "true" {
    grok {
      match => { "message" => "%{PATTERN:field}" }
      add_field => { "parsed" => "true" }
    }
  }
}
```
### Optimize Grok Patterns
- Use precise patterns (e.g. `%{IP}`, `%{INT}`) and avoid greedy ones like `%{GREEDYDATA}`, which backtrack heavily on non-matching lines
- Anchor patterns to the start of the line with `^` so failures are rejected quickly
- When a grok filter tries multiple patterns, list the most likely match first so most events exit early
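As an illustration of the points above, here is an anchored, precise pattern next to a greedy one (the log format and field names are assumptions for the example):

```conf
filter {
  grok {
    # Slow: %{GREEDYDATA} backtracks extensively when the line does not match
    # match => { "message" => "%{GREEDYDATA:prefix} %{IP:client}" }

    # Faster: anchor at the start of the line and use specific patterns
    match => { "message" => "^%{IP:client} %{WORD:method} %{URIPATH:path}" }
  }
}
```

The anchored version fails fast on non-matching lines instead of retrying from every position.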
## 4. Input/Output Optimization

### File Input Optimization

```conf
input {
  file {
    path => "/var/log/*.log"
    # For files seen for the first time, start from the end (the default);
    # use "beginning" to ingest existing content
    start_position => "end"
    # Disable sincedb persistence (testing only -- read positions are lost on restart)
    sincedb_path => "/dev/null"
    # Only takes effect with mode => "read": delete files once fully read
    # file_completed_action => "delete"
  }
}
```
### Elasticsearch Output Optimization

```conf
output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    # Compress HTTP request bodies
    http_compression => true
    # Maximum number of open HTTP connections
    pool_max => 1000
  }
}
```

Note that the bulk request size is driven by `pipeline.batch.size`; the old `flush_size` and `idle_flush_time` options have been removed from recent versions of the Elasticsearch output plugin.
## 5. Monitoring and Debugging

### Enable Monitoring

```conf
# logstash.yml
http.host: "0.0.0.0"
http.port: 9600
```

In recent Logstash releases these settings are named `api.http.host` and `api.http.port`.
### View Pipeline Statistics

```bash
curl -XGET 'localhost:9600/_node/stats/pipelines?pretty'
```
### Log Level Adjustment

```conf
# logstash.yml
log.level: info
```
## 6. Architecture Optimization

### Use Message Queues

Put a message queue (such as Kafka or RabbitMQ) between data producers and Logstash:
- Decouple data producers and consumers
- Provide buffering capability to handle burst traffic
- Support multiple consumers for parallel processing
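As a sketch, consuming from Kafka in front of Logstash might look like this (the broker addresses, topic, and group id are assumptions for the example):

```conf
input {
  kafka {
    bootstrap_servers => "kafka1:9092,kafka2:9092"  # assumed broker list
    topics => ["app-logs"]                          # assumed topic name
    group_id => "logstash-consumers"                # instances sharing a group split the partitions
    consumer_threads => 2
  }
}
```

Because consumers in the same group share partitions, adding Logstash instances with the same `group_id` scales consumption horizontally.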
### Cluster Deployment

- Run multiple independent Logstash instances in parallel (Logstash has no built-in clustering)
- Distribute traffic through load balancers
- Improve overall processing capability and availability
### Use Beats
- Use lightweight data collectors like Filebeat, Metricbeat
- Beats have lower resource usage, suitable for deployment on edge nodes
- Logstash focuses on data processing and transformation
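A minimal receiving side for this split: Logstash listens on the conventional Beats port while heavy parsing stays off the edge nodes.

```conf
input {
  beats {
    port => 5044  # default port that Filebeat's Logstash output targets
  }
}
filter {
  # heavy parsing and enrichment happens here, not on the edge nodes
}
```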
## 7. Real-world Cases

### High Throughput Scenario

```conf
# logstash.yml
pipeline.workers: 8
pipeline.batch.size: 500
pipeline.batch.delay: 10
```

```bash
# config/jvm.options
-Xms8g
-Xmx8g
-XX:+UseG1GC
```
### Low Latency Scenario

```conf
# logstash.yml
pipeline.workers: 4
pipeline.batch.size: 50
pipeline.batch.delay: 5
```
### Performance Testing

Use the generator input plugin for a quick benchmark; the dots codec prints one dot per event, so run the config with `bin/logstash -f <config>` and throughput is visible at a glance:

```conf
input {
  generator {
    lines => ["test line"]
    count => 100000
  }
}
output {
  stdout { codec => dots }
}
```
Metrics to monitor:
- Events per second (EPS)
- CPU usage
- Memory usage
- Network throughput