Common text processing tools in Shell scripts include grep, sed, awk, and cut.
grep - Text Search Tool
Basic Usage
```bash
# Search for text in a file
grep "pattern" file.txt

# Search multiple files
grep "pattern" file1.txt file2.txt

# Recursive search in a directory
grep -r "pattern" /path/to/directory

# Case-insensitive search
grep -i "pattern" file.txt

# Show line numbers
grep -n "pattern" file.txt

# Invert match (exclude matching lines)
grep -v "pattern" file.txt

# Show only the names of matching files
grep -l "pattern" *.txt

# Count matching lines
grep -c "pattern" file.txt
```
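The flags above can be tried on a small throwaway file; the path and contents here are just illustrative:

```bash
# Create a small sample file to search
printf 'alpha\nBeta\nbeta\ngamma\n' > /tmp/grep_demo.txt

# Case-insensitive count: matches both "Beta" and "beta"
matches=$(grep -ci "beta" /tmp/grep_demo.txt)
echo "case-insensitive matches: $matches"

# Inverted, case-insensitive count: lines that do NOT contain "beta"
others=$(grep -vci "beta" /tmp/grep_demo.txt)
echo "non-matching lines: $others"

rm /tmp/grep_demo.txt
```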
Regular Expressions
```bash
# Match start of line
grep "^start" file.txt

# Match end of line
grep "end$" file.txt

# Match digits
grep "[0-9]" file.txt

# Match a specific repetition count (three a's)
grep "a\{3\}" file.txt

# Use extended regular expressions (alternation)
grep -E "pattern1|pattern2" file.txt
```
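A quick sanity check of these patterns against known input (the sample lines are made up for the demo):

```bash
printf 'start here\nthe end\nabc123\naaa\n' > /tmp/re_demo.txt

anchored=$(grep -c "^start" /tmp/re_demo.txt)    # lines beginning with "start"
digits=$(grep -c "[0-9]" /tmp/re_demo.txt)       # lines containing a digit
triple=$(grep -c "a\{3\}" /tmp/re_demo.txt)      # lines with three consecutive a's
either=$(grep -Ec "start|end" /tmp/re_demo.txt)  # extended-regex alternation

echo "$anchored $digits $triple $either"
rm /tmp/re_demo.txt
```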
Practical Applications
```bash
# Find a process
ps aux | grep "nginx"

# Find errors in logs
grep "ERROR" /var/log/syslog

# Find files containing specific content
grep -r "TODO" ./src

# Count lines per file
grep -c "^" *.py
```
sed - Stream Editor
Basic Usage
```bash
# Replace the first occurrence on each line
sed 's/old/new/' file.txt

# Global replacement (every occurrence)
sed 's/old/new/g' file.txt

# Delete line 3
sed '3d' file.txt

# Delete matching lines
sed '/pattern/d' file.txt

# Print line 5 only
sed -n '5p' file.txt

# Print lines 1-5
sed -n '1,5p' file.txt

# Insert before line 2
sed '2i\new line' file.txt

# Append after line 2
sed '2a\new line' file.txt
```
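The substitution, deletion, and line-printing forms can be verified on a scratch file (names and contents are arbitrary):

```bash
printf 'one old line\ntwo\nthree\nfour\n' > /tmp/sed_demo.txt

# Replace "old" with "new"; capture the first output line
replaced=$(sed 's/old/new/' /tmp/sed_demo.txt | head -n 1)

# Delete line 3 and count what remains
remaining=$(sed '3d' /tmp/sed_demo.txt | wc -l | tr -d ' ')

# Print only line 2
line2=$(sed -n '2p' /tmp/sed_demo.txt)

echo "$replaced / $remaining / $line2"
rm /tmp/sed_demo.txt
```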
Advanced Usage
```bash
# Use regular expressions (strip all digits)
sed 's/[0-9]\+//g' file.txt

# Multiple replacements in one pass
sed -e 's/old1/new1/g' -e 's/old2/new2/g' file.txt

# In-place editing (modifies the original file)
sed -i 's/old/new/g' file.txt

# In-place editing with a backup
sed -i.bak 's/old/new/g' file.txt

# Use shell variables (double quotes so $var expands)
var="pattern"
sed "s/$var/replacement/g" file.txt
```
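In-place editing with a backup and variable expansion are easy to get wrong, so here is a minimal sketch against a throwaway config file (`-i.bak` works on both GNU and BSD sed):

```bash
printf 'port=8080\n' > /tmp/conf_demo.ini

# In-place edit, keeping the original as /tmp/conf_demo.ini.bak
sed -i.bak 's/8080/9090/' /tmp/conf_demo.ini

edited=$(cat /tmp/conf_demo.ini)
backup=$(cat /tmp/conf_demo.ini.bak)
echo "edited: $edited, backup: $backup"

# Double quotes let the shell expand $var inside the sed program
var="9090"
final=$(sed "s/$var/3000/" /tmp/conf_demo.ini)

rm /tmp/conf_demo.ini /tmp/conf_demo.ini.bak
```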
Practical Applications
```bash
# Replace a value in a config file
sed -i 's/port=8080/port=9090/' config.ini

# Delete comment lines
sed '/^#/d' file.txt

# Delete empty lines
sed '/^$/d' file.txt

# Collapse runs of whitespace into a single space
sed 's/\s\+/ /g' file.txt
```
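A combined clean-up pass over a fabricated file. Note that `\s` is a GNU sed extension; the POSIX-portable equivalent `[[:space:]]\{1,\}` is used here:

```bash
# Sample file with comments, blank lines, and extra spaces
printf '# comment\n\nkey    =    value\n\n# another\ndata\n' > /tmp/clean_demo.txt

# Strip comments and blank lines, count the survivors
lines=$(sed '/^#/d; /^$/d' /tmp/clean_demo.txt | wc -l | tr -d ' ')

# Squeeze runs of whitespace on line 3 ("key    =    value")
squeezed=$(sed 's/[[:space:]]\{1,\}/ /g' /tmp/clean_demo.txt | sed -n '3p')

echo "$lines surviving lines; squeezed: $squeezed"
rm /tmp/clean_demo.txt
```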
awk - Text Processing Tool
Basic Usage
```bash
# Print the first column
awk '{print $1}' file.txt

# Print multiple columns
awk '{print $1, $3}' file.txt

# Specify the field delimiter
awk -F: '{print $1}' /etc/passwd

# Print line numbers
awk '{print NR, $0}' file.txt

# Conditional printing
awk '$3 > 100 {print $0}' file.txt
```
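Column selection and conditional printing, demonstrated on a small fabricated table (name, age, score):

```bash
printf 'alice 30 200\nbob 25 50\ncarol 41 150\n' > /tmp/awk_demo.txt

# First column only, joined onto one line for display
names=$(awk '{print $1}' /tmp/awk_demo.txt | tr '\n' ' ')

# Names of rows whose third field exceeds 100
big=$(awk '$3 > 100 {print $1}' /tmp/awk_demo.txt | tr '\n' ' ')

echo "names: $names/ big: $big"
rm /tmp/awk_demo.txt
```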
Built-in Variables
```bash
NR      # Current record number (line number)
NF      # Number of fields in the current record
$0      # The complete record
$1, $2  # First and second fields
FS      # Input field separator (default: whitespace)
OFS     # Output field separator
RS      # Record separator (default: newline)
ORS     # Output record separator
```
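A quick demo of `NR`, `NF`, and `OFS` on two lines of made-up input. The `$1=$1` assignment is a common idiom that forces awk to rebuild the record with the new output separator:

```bash
printf 'a b c\nd e\n' > /tmp/nf_demo.txt

# NR is the line number, NF the field count on that line
summary=$(awk '{print NR ":" NF}' /tmp/nf_demo.txt | tr '\n' ' ')

# Rejoin the first line's fields with commas via OFS
joined=$(awk 'NR == 1 {OFS = ","; $1 = $1; print}' /tmp/nf_demo.txt)

echo "$summary/ $joined"
rm /tmp/nf_demo.txt
```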
Patterns and Actions
```bash
# Pattern matching
awk '/pattern/ {print $0}' file.txt

# BEGIN and END blocks
awk 'BEGIN {print "Start"} {print $0} END {print "End"}' file.txt

# Calculate a sum
awk '{sum += $1} END {print sum}' file.txt

# Calculate an average
awk '{sum += $1; count++} END {print sum/count}' file.txt
```
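The sum and average one-liners, checked against known numbers:

```bash
printf '10\n20\n30\n' > /tmp/sum_demo.txt

# Accumulate in the per-line action, report in END
total=$(awk '{sum += $1} END {print sum}' /tmp/sum_demo.txt)
average=$(awk '{sum += $1; count++} END {print sum/count}' /tmp/sum_demo.txt)

echo "sum=$total avg=$average"
rm /tmp/sum_demo.txt
```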
Practical Applications
```bash
# Total size of files in a listing
ls -l | awk '{sum += $5} END {print sum}'

# Find the maximum value
awk '{if ($1 > max) max = $1} END {print max}' file.txt

# Format output with printf
awk '{printf "%-10s %10s\n", $1, $2}' file.txt

# Process a CSV file
awk -F, '{print $1, $3}' data.csv
```
cut - Text Cutting Tool
Basic Usage
```bash
# Cut by characters
cut -c 1-5 file.txt      # characters 1-5
cut -c 1,5,10 file.txt   # characters 1, 5, and 10

# Cut by bytes
cut -b 1-10 file.txt

# Cut by fields
cut -d: -f1 /etc/passwd    # first field
cut -d: -f1,3 /etc/passwd  # first and third fields
```
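Character and field extraction on a single passwd-style line (copied into a scratch file so the demo does not depend on the real `/etc/passwd`):

```bash
printf 'root:x:0:0:root:/root:/bin/bash\n' > /tmp/cut_demo.txt

first5=$(cut -c 1-5 /tmp/cut_demo.txt)     # first five characters
user=$(cut -d: -f1 /tmp/cut_demo.txt)      # first field
userid=$(cut -d: -f1,3 /tmp/cut_demo.txt)  # first and third fields

echo "$first5 $user $userid"
rm /tmp/cut_demo.txt
```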
Practical Applications
```bash
# Extract usernames
cut -d: -f1 /etc/passwd

# Extract IP addresses (this pipeline targets the older "inet addr:" ifconfig output format)
ifconfig | grep "inet addr:" | cut -d: -f2 | cut -d' ' -f1

# Extract a file extension
echo "file.txt" | cut -d. -f2
```
Combined Usage Examples
Log Analysis
```bash
# Count errors
grep "ERROR" /var/log/app.log | wc -l

# Extract log entries within a time window
sed -n '/2024-01-01 10:00/,/2024-01-01 11:00/p' /var/log/app.log

# Extract IP addresses (field positions depend on the log format)
grep "ERROR" /var/log/app.log | awk '{print $5}' | cut -d: -f2

# Count error types
grep "ERROR" /var/log/app.log | awk '{print $6}' | sort | uniq -c
```
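An end-to-end version of the error-counting pipeline against a fabricated mini log. Real log formats vary, so the field positions below are assumptions tied to this made-up format:

```bash
# Fabricated log: date, time, level, error type
cat > /tmp/app_demo.log <<'EOF'
2024-01-01 10:05 ERROR timeout
2024-01-01 10:20 INFO ok
2024-01-01 10:45 ERROR timeout
2024-01-01 11:30 ERROR refused
EOF

# Total error lines
errors=$(grep -c "ERROR" /tmp/app_demo.log)

# Most frequent error type: group, count, sort by count, take the top name
top=$(grep "ERROR" /tmp/app_demo.log | awk '{print $4}' \
      | sort | uniq -c | sort -rn | head -n 1 | awk '{print $2}')

echo "errors=$errors top=$top"
rm /tmp/app_demo.log
```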
Text Processing
```bash
# Delete empty lines and comments
sed '/^$/d; /^#/d' file.txt

# Replace multiple spaces with a single space
sed 's/\s\+/ /g' file.txt

# Extract a column and deduplicate
awk '{print $1}' file.txt | sort -u

# Calculate the average of the first column
awk '{sum += $1} END {print sum/NR}' file.txt
```
System Administration
```bash
# Processes with the highest CPU usage (-n for numeric sort on column 3)
ps aux | sort -rnk 3 | head -n 5

# Processes with the highest memory usage
ps aux | sort -rnk 4 | head -n 5

# Count processes per user
ps aux | awk '{print $1}' | sort | uniq -c

# Find the PID using a specific port (tail skips the header line)
lsof -i :8080 | awk '{print $2}' | tail -n +2
```
Best Practices
- Combine tools with pipes: `grep | awk | sort | uniq`
- Prefer grep for searching: it is the fastest option for simple searches
- Use sed for replacement: the standard tool for text substitution
- Use awk for column data: the best choice for structured, field-oriented text
- Use cut for fixed positions: simple, fast extraction of characters or fields
- Be aware of regex dialects: grep, sed, and awk support slightly different regex syntax by default
- Test commands first: try them on sample data before processing important files