Keywords: Shell Script | File Modification Time | find Command | tar Archiving | Recursive Search
Abstract: This article provides an in-depth exploration of various Shell script methods for recursively finding files modified after a specific time and archiving them in Unix/Linux systems. It focuses on the synergistic use of find and tar commands, including the time calculation mechanism of the -mtime parameter, pipeline processing techniques with xargs, and the importance of the --no-recursion option. The article also compares advanced time options in GNU find with alternative approaches using touch and -newer, offering complete code examples and practical application scenarios. Performance differences and suitable use cases for different methods are discussed to help readers choose optimal solutions based on specific requirements.
Introduction
In Unix/Linux system administration, batch operations based on file modification times are frequently required. This article addresses the common need to recursively find files modified after a specific time and archive them, providing multiple Shell script implementation approaches.
Basic Implementation: Synergistic Use of find and tar
The most straightforward solution combines find and tar commands. The find command's -mtime parameter filters files based on modification time, while tar handles archiving and compression.
Basic syntax:
find . -mtime -1 | xargs tar --no-recursion -czf myfile.tgz
Where:
- find . -mtime -1: Recursively finds files modified within the last 24 hours in the current directory and its subdirectories
- xargs: Passes find's output as arguments to the subsequent command
- tar --no-recursion -czf myfile.tgz: Creates a gzip-compressed tar archive without descending into directories
Time Parameter Details
The -mtime parameter supports various time formats:
# Files modified within last 24 hours
find . -mtime -1
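# Files modified within last 36 hours (GNU find accepts decimals; not portable)
find . -mtime -1.5
# Files modified between 24 and 48 hours ago
find . -mtime 1
# Files modified at least 48 hours ago
find . -mtime +1
find measures a file's age in 24-hour periods and discards any fractional part before comparing it with an integer argument. A bare n matches an age of exactly n periods, -n matches fewer than n, and +n matches more than n. As a result, -mtime 0 and -mtime -1 both select files modified within the last 24 hours, while -mtime +1 only matches files that are at least two full days old.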
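The rounding behavior of -mtime is easy to verify empirically. The sketch below, which assumes GNU touch (for relative dates with -d) and uses illustrative file names, creates three files of known age and reports which -mtime argument matches each one.

```shell
#!/bin/bash
# Sketch: show which -mtime bucket files of different ages fall into.
# Assumes GNU touch (-d with relative dates); file names are illustrative.
demo_dir=$(mktemp -d)
trap 'rm -rf "$demo_dir"' EXIT
touch -d '12 hours ago' "$demo_dir/age_12h"
touch -d '36 hours ago' "$demo_dir/age_36h"
touch -d '60 hours ago' "$demo_dir/age_60h"

# Age is measured in whole days, discarding the fraction:
# 12h -> 0 days, 36h -> 1 day, 60h -> 2 days.
mtime_matches() {
    # Print the sorted base names matching the given -mtime argument.
    find "$demo_dir" -type f -mtime "$1" -exec basename {} \; | sort | paste -sd ' ' -
}

echo "-mtime -1: $(mtime_matches -1)"   # age_12h
echo "-mtime  0: $(mtime_matches 0)"    # age_12h
echo "-mtime  1: $(mtime_matches 1)"    # age_36h
echo "-mtime +1: $(mtime_matches +1)"   # age_60h
echo "-mtime +0: $(mtime_matches +0)"   # age_36h age_60h
```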
Advanced Time Options in GNU find
For more precise time control, GNU find provides rich options:
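- -mmin n: Filters by minutes instead of days, using the same -n / n / +n convention as -mtime
- -newer file: Matches files modified more recently than the specified file
- -daystart: Measures times from the beginning of today rather than from 24 hours ago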
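The following sketch exercises these options on a throwaway directory. It assumes GNU findutils and GNU touch; the paths and dates are illustrative. It also uses -newermt, which is not in the list above but belongs to GNU find's -newerXY family and compares against an absolute timestamp without a reference file.

```shell
#!/bin/bash
# Sketch of GNU find time options. Assumes GNU findutils and GNU touch;
# paths and dates are illustrative.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
touch -d '30 minutes ago' "$work/recent.log"
touch -d '2014-01-15 09:00:00' "$work/old.log"

# -mmin -60: modified less than 60 minutes ago.
find "$work" -type f -mmin -60

# -newermt: modified after an absolute timestamp, no reference file needed.
find "$work" -type f -newermt '2014-02-01 18:00:00'

# -daystart must come before the test it affects. With it, -mtime 0 selects
# files modified during the current calendar day.
find "$work" -daystart -type f -mtime 0
```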
Alternative Approach Using touch and -newer
When filtering based on absolute time rather than relative time is needed, combine touch with find's -newer option:
# Create timestamp file
touch -t 200901031231.43 /tmp/reference_file
# Find files newer than reference file
find . -newer /tmp/reference_file -print
# Clean up temporary file
rm -f /tmp/reference_file
This method is particularly suitable for script environments, ensuring filename uniqueness through process ID:
#!/bin/bash
REF_FILE="/tmp/timestamp_$$"
# Set trap to ensure temporary file cleanup
trap 'rm -f "$REF_FILE"' EXIT INT TERM
# Create reference time file
touch -t "$1" "$REF_FILE"
# Execute find and archive (NUL-delimited list so unusual file names survive)
find . -newer "$REF_FILE" -type f -print0 | tar --null -T - --no-recursion -czf "$2"
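The timestamp passed to touch -t follows the [[CC]YY]MMDDhhmm[.ss] layout, which is easy to mistype. As a small sketch, assuming GNU date (the -d option) and a hypothetical helper named to_touch_stamp, a human-readable date can be converted into that layout:

```shell
#!/bin/bash
# Sketch: convert a human-readable date into the [[CC]YY]MMDDhhmm[.ss]
# layout that touch -t expects. Assumes GNU date (-d); to_touch_stamp is a
# hypothetical helper name.
to_touch_stamp() {
    date -d "$1" +%Y%m%d%H%M.%S
}

to_touch_stamp '2009-01-03 12:31:43'   # prints 200901031231.43
```

The result can be passed straight to touch, for example touch -t "$(to_touch_stamp '2009-01-03 12:31:43')" "$REF_FILE". GNU touch also accepts such dates directly through its own -d option, so the helper mainly matters when the script should keep using -t.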
Direct Time Filtering with tar
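GNU tar supports direct time filtering:
tar -N '2014-02-01 18:00:00' -jcvf archive.tar.bz2 files
This approach is concise but less flexible: tar still descends into the directories it is given, but the cutoff cannot be combined with find's other tests such as -type or -name. In GNU tar, -N (--newer) also takes a file's status-change time into account, while --newer-mtime restricts the comparison to the modification time.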
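A minimal sketch of tar-side filtering, assuming GNU tar and GNU touch, with an illustrative directory layout and dates. It uses --newer-mtime rather than -N so that only modification times are compared; with -N, a file whose metadata changed recently would be archived even if its content is old.

```shell
#!/bin/bash
# Sketch: let GNU tar do the time filtering itself. Assumes GNU tar and
# GNU touch; the directory layout and dates are illustrative.
src=$(mktemp -d)
out=$(mktemp -d)
trap 'rm -rf "$src" "$out"' EXIT
echo old > "$src/old.txt"; touch -d '2014-01-01 00:00:00' "$src/old.txt"
echo new > "$src/new.txt"; touch -d '2014-03-01 00:00:00' "$src/new.txt"

# --newer-mtime compares only the modification time, whereas -N/--newer also
# considers the status-change time (ctime).
tar --newer-mtime='2014-02-01 18:00:00' -C "$src" -czf "$out/archive.tgz" .

# List what was stored: new.txt should appear, old.txt should not.
tar -tzf "$out/archive.tgz"
```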
Path Preservation and Directory Structure
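Maintaining the original path structure during archiving is crucial. Because find prints paths relative to its starting point, tar stores each file under that same relative path. find also lists directories whose contents changed; without --no-recursion, tar would descend into each of them and archive every file inside, modified or not, often more than once. The --no-recursion option restricts tar to exactly the entries find printed, avoiding duplicate archiving and path confusion.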
Performance Considerations and Best Practices
When dealing with large numbers of files, performance becomes important:
- Use -type f to limit the search to files only, excluding directories
- Consider using -maxdepth to limit recursion depth
- For extremely large directories, process in batches
- Use 2>/dev/null to suppress error output
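These recommendations can be combined in one place. The sketch below assumes GNU find, xargs, and tar; archive_in_batches and its arguments are hypothetical names. It relies on tar's append mode (-r) so that each batch produced by xargs adds to the same archive, and it compresses only at the end because compressed archives cannot be appended to.

```shell
#!/bin/bash
# Sketch: bounded search plus batch archiving. Assumes GNU find, xargs and tar.
# archive_in_batches and its arguments are hypothetical names.
archive_in_batches() {
    local src_dir="$1" depth="$2" archive="$3"   # archive: absolute path ending in .tar
    rm -f "$archive"
    (
        cd "$src_dir" || exit 1
        # -maxdepth bounds the recursion, -type f skips directories, and
        # 2>/dev/null hides "Permission denied" noise from find.
        # xargs may split the list into several tar runs; -r appends, so each
        # batch adds to the same uncompressed archive instead of replacing it.
        find . -maxdepth "$depth" -type f -mtime -1 -print0 2>/dev/null |
            xargs -0 -r tar --no-recursion -rf "$archive"
    )
    # Compress once at the end, if anything was archived.
    if [ -f "$archive" ]; then
        gzip -f "$archive"
    fi
}
```

For example, archive_in_batches /srv/data 2 /tmp/recent.tar would leave the result in /tmp/recent.tar.gz. Note that 2>/dev/null hides real problems as well as noise, so it is worth leaving off while debugging.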
Error Handling and Robustness
Production environment scripts require comprehensive error handling:
#!/bin/bash
set -euo pipefail
TIMESTAMP="${1:-}" # Get time from parameter
OUTPUT="${2:-backup_$(date +%Y%m%d_%H%M%S).tgz}" # Default output filename
if [[ -z "$TIMESTAMP" ]]; then
    echo "Usage: $0 timestamp [output_file]" >&2
    exit 1
fi
# Create unique temporary file (mktemp avoids predictable names in /tmp)
TEMP_FILE="$(mktemp /tmp/backup_ref_XXXXXX)"
# Ensure temporary file cleanup
trap 'rm -f "$TEMP_FILE"' EXIT
if ! touch -t "$TIMESTAMP" "$TEMP_FILE" 2>/dev/null; then
    echo "Error: Invalid timestamp format (expected [[CC]YY]MMDDhhmm[.ss])" >&2
    exit 1
fi
# Execute backup
if find . -newer "$TEMP_FILE" -type f -print0 | tar --null -T - --no-recursion -czf "$OUTPUT"; then
    echo "Backup completed: $OUTPUT"
else
    echo "Backup failed" >&2
    exit 1
fi
Comparison with Other Methods
Compared to the common approach of parsing ls output with grep, the find solution offers significant advantages:
- Better performance, especially with large numbers of files
- More precise time control
- Native support for recursive directory traversal
- Avoids format dependency issues when parsing ls output
Practical Application Scenarios
These techniques are widely applied in:
- Regular incremental file backups
- Log file rotation and archiving
- File synchronization in code deployment
- System maintenance and cleanup tasks
Conclusion
File archiving based on modification time is a common requirement in Shell script programming. By properly combining find, tar, and related tools, efficient and reliable solutions can be constructed. When choosing specific methods, factors such as time precision requirements, performance needs, and system environment should be considered.