hexdefender

Text Processing Commands in Linux


Chapter 5: Text Processing Commands in Linux

Overview

In this chapter, we will explore powerful text processing commands in Linux: grep, awk, and sed. These commands are essential for searching, filtering, and manipulating text data, making them invaluable for system administrators and developers alike. Additionally, we will cover how to use regular expressions (regex) to enhance the functionality of these commands.


1. grep

Introduction to grep

The grep command searches files, or the output of other commands, for lines that match a pattern. The name stands for "Global Regular Expression Print." A pattern can be a fixed string or a regular expression.

Key Features

  • Case Sensitivity: By default, grep is case-sensitive. Use the -i option for case-insensitive searches.

  • Line Number: The -n option displays line numbers alongside matching lines.

  • Recursive Search: The -r option allows for searching in all files within a directory and its subdirectories.

  • Inverting Matches: The -v option shows lines that do not match the specified pattern.
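These options can also be combined in one call. The following is a minimal sketch, assuming the six-line data.txt from the Examples below is recreated in a scratch directory; the workdir, numbered, and inverted variable names are our own and not part of grep.

```shell
# Recreate the chapter's data.txt in a scratch directory (hypothetical location).
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
printf '%s\n' apple banana cherry 'apple pie' 'grape fruit' orange > "$workdir/data.txt"

# -i and -n together: case-insensitive search with line numbers.
numbered=$(grep -in 'APPLE' "$workdir/data.txt")

# -v and -n together: lines that do NOT match, with their line numbers.
inverted=$(grep -vn 'apple' "$workdir/data.txt")

printf '%s\n' "$numbered" "$inverted"
```

The first search should print lines 1 and 4, and the second should print the remaining lines 2, 3, 5, and 6.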

Regular Expressions with grep

grep utilizes regular expressions to perform complex pattern matching. For example:

  • ^ asserts the start of a line.

  • $ asserts the end of a line.

  • . matches any single character.

  • * matches zero or more occurrences of the preceding element.

  • [] defines a character class.
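The metacharacters above can be tried one at a time. This is a small sketch, assuming the same six-line data.txt used in the Examples below; the scratch directory and the variable names are illustrative choices of ours.

```shell
# Recreate the chapter's data.txt in a scratch directory (hypothetical location).
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
printf '%s\n' apple banana cherry 'apple pie' 'grape fruit' orange > "$workdir/data.txt"

# $ anchors the match at the end of the line: lines ending in "e".
ends_with_e=$(grep 'e$' "$workdir/data.txt")

# . matches exactly one character: "a", any one character, then "a".
a_any_a=$(grep 'a.a' "$workdir/data.txt")

# * repeats the preceding element zero or more times: "ap", any number of "p", then "l".
ap_star=$(grep 'app*l' "$workdir/data.txt")

# [] matches one character from the set: lines starting with "b" or "c".
starts_b_or_c=$(grep '^[bc]' "$workdir/data.txt")

printf '%s\n' "$ends_with_e" "$a_any_a" "$ap_star" "$starts_b_or_c"
```

Only "banana" contains an "a", one character, and another "a", so the second search returns a single line.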

Examples

Example Data (data.txt):

apple
banana
cherry
apple pie
grape fruit
orange

1. Basic Search

grep 'apple' data.txt

Output:

apple
apple pie

Explanation: This command searches for lines containing the string "apple."

2. Case-Insensitive Search

grep -i 'APPLE' data.txt

Output:

apple
apple pie

Explanation: This command finds "apple" regardless of case.

3. Display Line Numbers

grep -n 'apple' data.txt

Output:

1:apple
4:apple pie

Explanation: This command displays the line numbers of matching lines.

4. Recursive Search

grep -r 'fruit' /path/to/directory/

Output:

/path/to/directory/file1.txt:grape fruit

Explanation: This command searches for "fruit" in every file within the specified directory and its subdirectories, and prefixes each matching line with the name of the file it came from.
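Because /path/to/directory/ is only a placeholder, a self-contained version may be easier to experiment with. The sketch below builds a small hypothetical directory tree (the notes directory and its file names are invented for illustration) and searches it recursively.

```shell
# Build a small, hypothetical directory tree to search.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
mkdir -p "$workdir/notes/archive"
printf '%s\n' 'grape fruit' 'orange' > "$workdir/notes/file1.txt"
printf '%s\n' 'banana' 'dried fruit mix' > "$workdir/notes/archive/file2.txt"
printf '%s\n' 'cherry' > "$workdir/notes/archive/file3.txt"

cd "$workdir" || exit 1

# -r descends into subdirectories; -n adds the line number after each file name.
matches=$(grep -rn 'fruit' notes | LC_ALL=C sort)

# -l lists only the names of the files that contain a match.
files=$(grep -rl 'fruit' notes | LC_ALL=C sort)

printf '%s\n' "$matches" "$files"
```

The results are piped through sort only to make the order predictable, since grep reports files in whatever order it walks the directory.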

5. Invert Match

grep -v 'apple' data.txt

Output:

banana
cherry
grape fruit
orange

Explanation: This command displays all lines that do not contain "apple."

6. Using Regular Expressions

grep '^a' data.txt

Output:

apple
apple pie

Explanation: This command finds all lines that start with the letter "a."


2. awk

Introduction to awk

awk is a powerful programming language used for pattern scanning and processing. It is especially useful for working with structured data and performing operations on specific fields within a text file.

Key Features

  • Field Separator: The -F option allows users to specify the field delimiter.

  • Pattern Matching: Users can define conditions to control which lines are processed.

  • Built-in Variables: awk provides built-in variables such as NR (current record number) and NF (number of fields in the current record).
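To see these built-in variables in action, here is a brief sketch that assumes the data.csv from the Examples below. It also uses $NF, the standard awk way to refer to the last field; the workdir and summary names are ours.

```shell
# Recreate the chapter's data.csv in a scratch directory (hypothetical location).
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
printf '%s\n' 'Name,Age,Department' 'Alice,30,Engineering' \
  'Bob,25,Sales' 'Charlie,35,Marketing' > "$workdir/data.csv"

# NR is the current record (line) number, NF the number of comma-separated
# fields in that record, and $NF the content of the last field.
summary=$(awk -F',' '{print NR ": " NF " fields, last = " $NF}' "$workdir/data.csv")

printf '%s\n' "$summary"
```

Every record in this file has three fields, so the line for record 2 reads "2: 3 fields, last = Engineering".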

Examples

Example Data (data.csv):

Name,Age,Department
Alice,30,Engineering
Bob,25,Sales
Charlie,35,Marketing

1. Print Specific Columns

awk -F',' '{print $1, $3}' data.csv

Output:

Name Department
Alice Engineering
Bob Sales
Charlie Marketing

Explanation: This command prints the first and third columns (Name and Department) from the CSV file.

2. Conditional Printing

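awk -F',' 'NR > 1 && $2 > 28 {print $1}' data.csv

Output:

Alice
Charlie

Explanation: This command prints the names of individuals whose age is greater than 28. The NR > 1 condition skips the header row. Without it, awk compares the non-numeric header value "Age" with 28 as text, the test succeeds, and "Name" is printed as well.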

3. Sum of a Column

awk -F',' 'NR > 1 {sum += $2} END {print sum}' data.csv

Output:

90

Explanation: This command calculates the total age of all individuals in the file, ignoring the header.
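The same pattern extends naturally to an average. This is a minimal sketch on the same data.csv, with a guard against dividing by zero when the file has no data rows; the count variable and the shell variable names are our own.

```shell
# Recreate the chapter's data.csv in a scratch directory (hypothetical location).
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
printf '%s\n' 'Name,Age,Department' 'Alice,30,Engineering' \
  'Bob,25,Sales' 'Charlie,35,Marketing' > "$workdir/data.csv"

# Skip the header (NR > 1), accumulate the ages and count the rows,
# then print the average once all input has been read.
average=$(awk -F',' 'NR > 1 {sum += $2; count++}
                     END {if (count > 0) print sum / count}' "$workdir/data.csv")

printf 'Average age: %s\n' "$average"
```

With ages 30, 25, and 35 the sum is 90 over 3 rows, so this prints an average of 30.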

4. Pattern Matching

awk -F',' '/Engineering/ {print $1}' data.csv

Output:

Alice

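Explanation: This command prints the name from every line that contains "Engineering" anywhere in the line. In this file that is only Alice's row, so it lists the individuals in the Engineering department.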
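A bare /Engineering/ pattern is tested against the whole line, so it can match more than intended. The sketch below contrasts it with a test on the third field only. The extra row for "Dana" is hypothetical and was added purely to show the difference.

```shell
# data.csv from the chapter plus one hypothetical row ("Dana") for illustration.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
printf '%s\n' 'Name,Age,Department' 'Alice,30,Engineering' 'Bob,25,Sales' \
  'Charlie,35,Marketing' 'Dana,41,Sales Engineering' > "$workdir/data.csv"

# Matches "Engineering" anywhere in the line.
whole_line=$(awk -F',' '/Engineering/ {print $1}' "$workdir/data.csv")

# Matches only when the third field is exactly "Engineering".
exact_field=$(awk -F',' '$3 == "Engineering" {print $1}' "$workdir/data.csv")

printf 'whole line:\n%s\nexact field:\n%s\n' "$whole_line" "$exact_field"
```

The whole-line pattern returns both Alice and Dana, while the field comparison returns Alice alone.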

5. Using Built-in Variables

awk -F',' 'NR == 2 {print "The age of " $1 " is " $2}' data.csv

Output:

The age of Alice is 30

Explanation: This command uses NR to access the second record and prints a formatted string.


3. sed

Introduction to sed

sed is a stream editor that applies text transformations to an input stream (a file or input from a pipeline) and writes the result to standard output. It is particularly useful for automated, scripted editing.

Key Features

  • Substitution: The s/pattern/replacement/ syntax allows for replacing text.

  • In-place Editing: The -i option allows for modifying files directly.

  • Addressing: Users can specify line numbers or patterns to determine which lines to operate on.
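Addressing is easiest to understand next to a substitution. The following sketch assumes the config.txt from the Examples below and restricts the same kind of s/pattern/replacement/ command first by line number and then by pattern; the variable names are ours.

```shell
# Recreate the chapter's config.txt in a scratch directory (hypothetical location).
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cat > "$workdir/config.txt" <<'EOF'
# Server configuration
host=localhost
port=8080
# Uncomment the following line to enable debug
# debug=true
EOF

# Line-number address: substitute only on line 2.
by_line=$(sed '2s/localhost/127.0.0.1/' "$workdir/config.txt")

# Pattern address: substitute only on lines that start with "port=".
by_pattern=$(sed '/^port=/s/8080/9090/' "$workdir/config.txt")

printf '%s\n' "$by_line" "$by_pattern"
```

In both cases all five lines are printed, and only the addressed line is changed.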

Examples

Example Data (config.txt):

# Server configuration
host=localhost
port=8080
# Uncomment the following line to enable debug
# debug=true

1. Basic Substitution

sed 's/localhost/127.0.0.1/' config.txt

Output:

# Server configuration
host=127.0.0.1
port=8080
# Uncomment the following line to enable debug
# debug=true

Explanation: This command replaces "localhost" with "127.0.0.1."

2. In-place Editing

sed -i 's/8080/9090/' config.txt

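Output (sed -i prints nothing to the terminal; these are the contents of config.txt after the command):

# Server configuration
host=localhost
port=9090
# Uncomment the following line to enable debug
# debug=true

Explanation: This command changes the port from 8080 to 9090 directly in the file. The host line still reads localhost because the substitution in example 1 only printed its result and did not modify the file. The remaining examples assume the original config.txt, with port=8080.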
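Since in-place editing overwrites the file, it can be worth keeping a copy of the original. Here is a minimal sketch using a backup suffix attached to -i. The suffix form -i.bak is accepted by GNU sed and, to our knowledge, by BSD/macOS sed as well; the .bak suffix itself is an arbitrary choice.

```shell
# Recreate the chapter's config.txt in a scratch directory (hypothetical location).
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cat > "$workdir/config.txt" <<'EOF'
# Server configuration
host=localhost
port=8080
# Uncomment the following line to enable debug
# debug=true
EOF

# -i.bak edits config.txt in place and saves the original as config.txt.bak.
sed -i.bak 's/8080/9090/' "$workdir/config.txt"

edited_port=$(grep '^port=' "$workdir/config.txt")
backup_port=$(grep '^port=' "$workdir/config.txt.bak")

printf 'edited: %s\nbackup: %s\n' "$edited_port" "$backup_port"
```

If the edit turns out to be wrong, the original can be restored by copying config.txt.bak back over config.txt.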

3. Print Specific Lines

sed -n '2,3p' config.txt

Output:

host=localhost
port=8080

Explanation: This command prints lines 2 and 3 from the file.

4. Delete Lines Matching a Pattern

sed '/^#/d' config.txt

Output:

host=localhost
port=8080

Explanation: This command removes every line that starts with '#'. That includes the commented-out "# debug=true" line, which is deleted rather than uncommented.
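Deleting every line that starts with '#' also drops the commented-out debug setting. If the goal is to enable that setting and then strip the remaining comments, one possible approach is to uncomment it first. This is a sketch that assumes the config.txt above; the pattern `^# *debug=` is our own choice for matching that one line.

```shell
# Recreate the chapter's config.txt in a scratch directory (hypothetical location).
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cat > "$workdir/config.txt" <<'EOF'
# Server configuration
host=localhost
port=8080
# Uncomment the following line to enable debug
# debug=true
EOF

# Remove the leading "# " from the debug line only; other comments are untouched.
uncommented=$(sed 's/^# *debug=/debug=/' "$workdir/config.txt")

# Two commands in one call: uncomment debug first, then delete remaining comments.
active_only=$(sed -e 's/^# *debug=/debug=/' -e '/^#/d' "$workdir/config.txt")

printf '%s\n' "$active_only"
```

The order of the two -e commands matters: by the time the delete command runs, the debug line no longer starts with '#', so it survives.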

5. Substitute with a Regular Expression

sed 's/^port=\([0-9]*\)/port=\1 (changed)/' config.txt

Output:

# Server configuration
host=localhost
port=8080 (changed)
# Uncomment the following line to enable debug
# debug=true

Explanation: This command captures the port number and appends "(changed)" to it.


Combining grep, awk, and sed

You can combine these powerful commands in a pipeline to achieve complex text processing tasks. Here’s an example:

Command:

grep 'localhost' config.txt | awk -F'=' '{print $1}' | sed 's/#//'

Output:

host

Explanation: grep selects the lines containing "localhost," awk splits each of them on '=' and prints the variable name, and sed removes the first '#' from the result if one is present.
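As a second illustration of the same idea, the sketch below chains the three tools in a different order: grep drops the comment lines, sed rewrites a value, and awk reformats each setting. It assumes the original config.txt; the " -> " output format is an arbitrary choice of ours.

```shell
# Recreate the chapter's config.txt in a scratch directory (hypothetical location).
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cat > "$workdir/config.txt" <<'EOF'
# Server configuration
host=localhost
port=8080
# Uncomment the following line to enable debug
# debug=true
EOF

# 1. grep -v drops comment lines.
# 2. sed replaces localhost with 127.0.0.1.
# 3. awk splits on "=" and prints "name -> value".
settings=$(grep -v '^#' "$workdir/config.txt" \
  | sed 's/localhost/127.0.0.1/' \
  | awk -F'=' '{print $1 " -> " $2}')

printf '%s\n' "$settings"
```

Each stage reads the previous stage's output, so config.txt itself is never modified.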


Useful Resources

  1. Regular Expressions Tutorial: Regular Expressions 101, RegexOne, Regex Tutorial by MDN

  2. grep Documentation: GNU grep Manual

  3. awk Documentation: GNU awk Manual

  4. sed Documentation: GNU sed Manual

  5. Linux Command Line Resources: Linux Command Line Basics by edX


Interview Questions and Answers

  1. Q: How can you replace all occurrences of a word in a file using sed?

    • A: Use the command sed -i 's/old_word/new_word/g' filename.

  2. Q: Can you explain how to use regex with grep?

    • A: Regular expressions can be used with grep to match patterns in text. For example, grep '^a' filename finds lines starting with 'a'.

  3. Q: What does the -n option do in grep?

    • A: The -n option displays line numbers along with the matching lines.

  4. Q: How do you extract the second column from a CSV file using awk?

    • A: Use the command awk -F',' '{print $2}' filename.csv.


Conclusion

In this chapter, we've covered the essential text processing commands in Linux: grep, awk, and sed. By mastering these tools and their associated regular expressions, you will greatly enhance your ability to handle text data efficiently and effectively.