Properly Handling Command Output in Bash Scripts: Avoiding Pitfalls of Word Splitting and Filename Expansion

Dec 07, 2025 · Programming

Keywords: Bash scripting | command output processing | while read loop

Abstract: This article examines the word splitting and filename expansion problems that arise when looping over command output in Bash scripts. Using a typical ps output processing case, it shows why for loops are unsuitable for multi-line output. It explains how the Internal Field Separator (IFS) works and why changing it alone is not enough for line processing, then presents the while read combination as the more robust alternative. It also compares for loops with while read, covers pgrep as an alternative to ps | grep, and, for fields that contain spaces, shows how reordering the output fields keeps parsing reliable.

Problem Context and Common Misconceptions

In Bash script development, processing command output is a daily task. A typical scenario involves using the ps command to obtain process information for further handling. Developers often attempt to directly iterate over command output using for loops, but this leads to unexpected behavior. For example, consider the following command output:

$ ps -ewo pid,cmd,etime | grep python | grep -v grep | grep -v sh
 3089 python /var/www/atm_securit       37:02
17116 python /var/www/atm_securit       00:01
17119 python /var/www/atm_securit       00:01
17122 python /var/www/atm_securit       00:01
17125 python /var/www/atm_securit       00:00

If using the following script:

for tbl in $(ps -ewo pid,cmd,etime | grep python | grep -v grep | grep -v sh)
do
   echo $tbl
done

The output becomes:

3089
python
/var/www/atm_securit
38:06
17438
python
/var/www/atm_securit
00:02
17448
python
/var/www/atm_securit
00:01

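This is not line-by-line processing. The loop runs once per whitespace-separated word rather than once per line, and even the cmd field is split into python and its script path. The root cause is Bash's word splitting mechanism.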

Word Splitting Mechanism and Limitations of IFS

When the result of an unquoted command substitution $(...) is expanded, Bash splits it into words using the characters in the Internal Field Separator, $IFS. The default IFS contains space, tab, and newline, so every whitespace-separated word in multi-line output becomes a separate iteration item.
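The effect can be seen in a minimal sketch. The sample variable below is a hypothetical two-line stand-in for real ps output, not data from the original case; the loop counters only exist to make the splitting visible.

```bash
#!/usr/bin/env bash
# Hypothetical two-line sample standing in for real ps output.
sample=$'3089 python /var/www/app 37:02\n17116 python /var/www/app 00:01'

# Show the current separator characters in a readable, quoted form.
printf 'IFS is %q\n' "$IFS"

items=0
for word in $sample; do      # unquoted on purpose, so word splitting applies
    items=$((items + 1))
done

quoted=0
for word in "$sample"; do    # quoted, so the whole text stays a single item
    quoted=$((quoted + 1))
done

echo "unquoted: $items items, quoted: $quoted item"
```

With the default IFS, the unquoted loop sees eight words across the two lines, while the quoted loop sees the whole text as one item. Neither form yields one iteration per line.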

Even if IFS is set to newline:

IFS=$'\n'
for i in $(cat file); do
    echo "$i"
done

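This still leaves the result subject to filename expansion. If the output contains glob characters such as *, ?, or [, Bash tries to match them against filenames. For instance, if a line consists of *.txt and matching files exist in the current directory, that line is replaced by the list of matching files. Note also that the newline must be written with ANSI-C quoting, $'\n'; in plain single quotes, '\n' is a literal backslash followed by the letter n. Changing IFS this way also affects the rest of the script unless it is restored afterwards.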
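The following sketch reproduces the filename expansion problem in a throwaway directory. The file names (a.txt, b.txt, input.list) are hypothetical and chosen only for the demonstration. It also shows that set -f, which disables globbing, suppresses the expansion, although that is a workaround rather than a replacement for while read.

```bash
#!/usr/bin/env bash
# Work in a temporary directory so the demonstration touches nothing else.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch a.txt b.txt
printf '%s\n' 'first line' '*.txt' > input.list

old_ifs=$IFS
IFS=$'\n'                    # split on newlines only

expanded=()
for item in $(cat input.list); do
    expanded+=("$item")      # the *.txt line is replaced by matching files
done

set -f                       # disable filename expansion
literal=()
for item in $(cat input.list); do
    literal+=("$item")       # the *.txt line is kept as written
done
set +f
IFS=$old_ifs                 # restore the original separator

printf 'with globbing:    %s\n' "${expanded[@]}"
printf 'without globbing: %s\n' "${literal[@]}"

cd / && rm -rf "$workdir"
```

With globbing enabled, the loop sees three items (first line, a.txt, b.txt); with set -f it sees the two original lines.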

Robust Solution with while read Combination

The recommended approach is a while loop combined with the read built-in. read consumes one line of input per call, and the text it stores in a variable is never subjected to filename expansion, so each iteration receives one complete line. The basic syntax is:

command | while read -r line; do
    # Process $line
done

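The -r option stops read from treating backslashes as escape characters, so each line is stored as written. Note that read still strips leading and trailing whitespace from the line; writing IFS= read -r line preserves it.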
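A minimal sketch of the pattern follows, with a hypothetical emit_lines function standing in for the real command. It uses IFS= so that leading whitespace is preserved, and it reads from process substitution rather than a pipe so that the array filled inside the loop is still available afterwards. In a pipeline, Bash normally runs the loop in a subshell, and variables set there are lost when the loop ends.

```bash
#!/usr/bin/env bash
# emit_lines is a hypothetical stand-in for any command that prints lines.
emit_lines() {
    printf '%s\n' '  3089 python app.py' 'C:\temp\new *.txt'
}

collected=()
# IFS= keeps leading/trailing whitespace; -r keeps backslashes literal.
while IFS= read -r line; do
    collected+=("$line")
done < <(emit_lines)         # process substitution: loop runs in this shell

# Brackets make any preserved whitespace visible.
printf '[%s]\n' "${collected[@]}"
```

Both lines arrive unchanged: the leading spaces, the backslashes, and the *.txt pattern are all stored exactly as the command printed them.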

Optimized Approach Using pgrep

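For process searching, the pgrep command is more concise and efficient than a ps | grep pipeline, and it never reports itself as a match, so no grep -v grep step is needed. With -f, pgrep matches the pattern against the entire command line, so it may also match non-Python processes whose arguments happen to contain the string python. Example: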

pgrep -f python | while read -r pid; do
    echo "$pid"
done

If the full command line is needed as well (using the -a option provided by the procps-ng version of pgrep on Linux):

pgrep -af python | while read -r line; do
    echo "$line"
done

To separate the PID from the command:

pgrep -af python | while read -r pid cmd; do
    echo "pid: $pid, cmd: $cmd"
done
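Because -f can match processes that merely mention python in their arguments, one possible refinement is to check the executable name after splitting each line. The sketch below is only an illustration: fake_pgrep is a hypothetical function that prints lines in the "PID command line" shape, so the example does not depend on the processes running on a given machine.

```bash
#!/usr/bin/env bash
# Hypothetical lines in the "PID command line" shape.
fake_pgrep() {
    printf '%s\n' \
        '3089 python /var/www/app.py --port 8080' \
        '4021 vim python_notes.txt' \
        '5120 /usr/bin/python3 -m http.server'
}

python_pids=()
while read -r pid cmd; do
    exe=${cmd%% *}           # first word of the command line
    case ${exe##*/} in       # strip any leading directory
        python*) python_pids+=("$pid") ;;
    esac
done < <(fake_pgrep)

echo "${python_pids[*]}"
```

Here the vim process is skipped even though its command line contains python, and only PIDs 3089 and 5120 are kept. In a real script, fake_pgrep would be replaced by the actual pgrep call.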

Handling Complex Fields with Spaces

When fields in the command output contain spaces, the way read assigns variables needs attention. read splits the line on IFS, assigns one word to each variable in order, and puts the entire remainder of the line into the last variable. In the original ps output (pid,cmd,etime), the cmd field contains spaces, so read -r pid cmd etime would store only the first word of the command in cmd and push the rest, together with the elapsed time, into etime. The solution is to reorder the fields so that the one that may contain spaces comes last:

ps -ewo pid,etime,cmd | grep python | grep -v grep | grep -v sh \
  | while read -r pid etime cmd; do
    echo "$pid $cmd $etime"
done

This way, even if the cmd field contains spaces, it will be fully assigned to the cmd variable.
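The difference between the two field orders can be checked with a small sketch. The two sample lines are hypothetical and describe the same process, once with cmd in the middle and once with cmd last.

```bash
#!/usr/bin/env bash
# Hypothetical line in pid,cmd,etime order; cmd contains a space.
cmd_middle='3089 python /var/www/app.py 37:02'
# The same process with cmd moved to the end (pid,etime,cmd).
cmd_last='3089 37:02 python /var/www/app.py'

# cmd in the middle: only the first word lands in cmd1.
read -r pid1 cmd1 etime1 <<< "$cmd_middle"
# cmd last: the whole remainder of the line lands in cmd2.
read -r pid2 etime2 cmd2 <<< "$cmd_last"

echo "cmd in the middle: cmd='$cmd1' etime='$etime1'"
echo "cmd last:          cmd='$cmd2' etime='$etime2'"
```

In the first case cmd1 holds only python and etime1 holds the script path plus the time; in the second, cmd2 holds the complete command and etime2 holds 37:02.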

Summary and Best Practices

When processing command output in Bash scripts, prioritize the while read combination over for loops to avoid word splitting and filename expansion issues. For process searching, consider using pgrep to simplify commands. When handling fields containing spaces, ensure correct parsing by adjusting output order. These practices enhance script reliability and maintainability, reducing unexpected errors.

Copyright Notice: All rights in this article are reserved by the operators of DevGex. Reasonable sharing and citation are welcome; any reproduction, excerpting, or re-publication without prior permission is prohibited.