Keywords: Bash scripting | find command | pathname expansion
Abstract: This article explores two approaches for handling multiple file type searches in Bash scripts: using the -o operator in the find command and the safer pathname expansion technique. Through comparative analysis, it shows how filenames containing special characters such as spaces and newlines are misparsed when find results are stored in a plain variable. The article details the secure pattern of combining Bash arrays with pathname expansion, providing code examples and step-by-step explanations to help developers avoid common pitfalls and write robust scripts.
Introduction
File searching is a common task in Bash script programming. Beginners often start with the find command but quickly encounter the need to search for multiple file types. This article builds on a typical problem: how to find multiple file types (e.g., .pdf, .txt, and .bmp) in Bash scripts, with an in-depth analysis of the pros and cons of different methods.
Using the -o Operator in the find Command
The find command provides the -o (logical OR) operator to combine multiple search conditions. For example, to find .bmp and .txt files in /home/user/Desktop and its subdirectories, use:
list="$(find /home/user/Desktop -name '*.bmp' -o -name '*.txt')"
This command connects two -name conditions with -o, enabling multi-file type searches. However, this approach has a potential issue: when results are stored in a variable, special characters (e.g., spaces and newlines) can cause parsing errors.
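One detail is worth keeping in mind when extending this command: -o has lower precedence than the implicit AND between adjacent tests, so an extra test such as -type f only applies to both names if the -name tests are grouped in escaped parentheses. The following is a minimal sketch, assuming a scratch directory created with mktemp; the helper name find_bmp_and_txt and all file names are hypothetical and chosen only for illustration.

```shell
#!/usr/bin/env bash
# Scratch directory and file names are illustrative only.
demo_dir="$(mktemp -d)"
touch "$demo_dir/a.bmp" "$demo_dir/b.txt" "$demo_dir/c.pdf"
mkdir "$demo_dir/dir.txt"   # a directory whose name also matches *.txt

# Hypothetical helper: regular files ending in .bmp or .txt under "$1".
# The escaped parentheses make -type f apply to both -name tests.
find_bmp_and_txt() {
    find "$1" -type f \( -name '*.bmp' -o -name '*.txt' \)
}

find_bmp_and_txt "$demo_dir" | sort
```

With the grouping in place, only a.bmp and b.txt are listed; the directory dir.txt is filtered out by -type f.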
Limitations of the find Command
To demonstrate the problem, consider this test scenario:
$ touch 'one.txt' 'two three.txt' 'foo.bmp'
$ list="$(find . -type f \( -name '*.txt' -o -name '*.bmp' \))"
$ for file in $list; do if [ ! -f "$file" ]; then echo "MISSING: $file"; fi; done
MISSING: ./two
MISSING: three.txt
Here, the filename "two three.txt" is incorrectly split into two words, so the script fails to recognize the file. This occurs because Bash performs word splitting on unquoted expansions using the Internal Field Separator, IFS (default: space, tab, and newline). Quoting "$list" does not help either, since the loop would then see the entire list as one single word.
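The split can be made visible by counting the words Bash produces from the unquoted expansion. This is a small sketch that recreates the three test files in a scratch directory; the variable names demo_dir and word_count are illustrative.

```shell
#!/usr/bin/env bash
# Recreate the test scenario in a throwaway directory.
demo_dir="$(mktemp -d)"
cd "$demo_dir" || exit 1
touch 'one.txt' 'two three.txt' 'foo.bmp'

list="$(find . -type f \( -name '*.txt' -o -name '*.bmp' \))"

# Deliberately unquoted: Bash splits $list on IFS characters,
# so "./two three.txt" turns into the two words "./two" and "three.txt".
set -- $list
word_count=$#

echo "files on disk: 3, words after splitting: $word_count"
```

Three files yield four words, which is exactly the mismatch that produces the MISSING lines in the loop.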
Safe Approach with Pathname Expansion and Bash Arrays
A safer method is to use pathname expansion (globbing) combined with Bash arrays. Pathname expansion handles filenames directly in the shell, avoiding output parsing issues from find. Example code:
$ a=( *.txt *.bmp )
$ declare -p a
declare -a a=([0]="one.txt" [1]="two three.txt" [2]="foo.bmp")
$ for file in "${a[@]}"; do ls -l "$file"; done
-rw-r--r-- 1 ghoti staff 0 24 May 16:27 one.txt
-rw-r--r-- 1 ghoti staff 0 24 May 16:27 two three.txt
-rw-r--r-- 1 ghoti staff 0 24 May 16:27 foo.bmp
This approach stores the file list in an array and uses double quotes to ensure each element is processed intact, safely handling filenames with special characters.
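One caveat applies to this pattern: by default, a glob that matches nothing is left in the array as a literal string such as *.pdf. Bash's nullglob shell option makes unmatched patterns expand to nothing instead. The sketch below assumes a scratch directory with the same three test files; the array name files is illustrative.

```shell
#!/usr/bin/env bash
# Recreate the test files in a throwaway directory.
demo_dir="$(mktemp -d)"
cd "$demo_dir" || exit 1
touch 'one.txt' 'two three.txt' 'foo.bmp'

shopt -s nullglob            # unmatched patterns expand to nothing
files=( *.txt *.bmp *.pdf )  # no .pdf files exist here
shopt -u nullglob            # restore the default behavior

echo "matched ${#files[@]} file(s)"
for file in "${files[@]}"; do
    printf '%s\n' "$file"
done
```

Without nullglob, the array would contain a fourth element holding the literal text *.pdf. Note also that plain globs only look in the current directory; for recursive matching, Bash 4 and later offer the globstar option with patterns such as **/*.txt.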
Supplementary Method: Using find with -print0
As a reference, another way to handle special characters is to combine find's -print0 option with the read command's -d parameter:
find . -type f -name '*.*' -print0 | while IFS= read -r -d '' file; do
    printf '%s\n' "$file"
done
This method uses null characters as delimiters, avoiding issues from spaces and newlines, but adds code complexity.
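If the results are needed after the loop, the pipeline form has a drawback: by default Bash runs each part of a pipeline in a subshell, so variables set inside the while loop are lost when it ends. One possible variation feeds the loop through process substitution instead, so the matches can be collected into an array in the current shell. This is a sketch under assumptions; the scratch directory, the file names, and the array name found are illustrative.

```shell
#!/usr/bin/env bash
# Scratch directory and file names are illustrative only.
demo_dir="$(mktemp -d)"
touch "$demo_dir/one.txt" "$demo_dir/two three.txt" \
      "$demo_dir/foo.bmp" "$demo_dir/skip.pdf"

found=()
# Process substitution keeps the loop in the current shell,
# so the array is still populated after the loop finishes.
while IFS= read -r -d '' file; do
    found+=("$file")
done < <(find "$demo_dir" -type f \( -name '*.txt' -o -name '*.bmp' \) -print0)

echo "collected ${#found[@]} file(s)"
for file in "${found[@]}"; do
    printf '%s\n' "$file"
done
```

This combines find's recursive search and filtering with the array handling shown earlier, at the cost of a few extra lines.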
Conclusion
When handling multiple file type searches in Bash scripts, the combination of pathname expansion and arrays is generally safer and more concise. It leverages the shell's built-in capabilities, avoiding pitfalls from parsing external command output. For scenarios requiring recursive searches or complex conditions, the find command remains useful, but techniques like -print0 should be used to handle special characters. Developers should choose the appropriate method based on specific needs and always test scripts for edge cases.