Keywords: Rsync | File Synchronization | Pattern Matching
Abstract: This article examines how rsync's include and exclude options interact, using a specific case study to show how to configure pattern matching so that only selected files are synchronized. It analyzes why the initial command failed, provides two working solutions, and explains the priority rules of pattern matching. It also lists other common pattern examples to help readers use rsync's filtering capabilities effectively.
Core Principles of Rsync Pattern Matching Mechanism
Rsync is a powerful file synchronization tool that allows fine-grained file filtering through combinations of include and exclude options. However, the interaction between these options is often more complex than it appears. This article analyzes how rsync pattern matching works through a practical case study and provides effective configuration strategies.
Case Scenario and Problem Analysis
Assume we have a filesystem structure as follows:
uploads/
    1260000000/
        file_11_00.jpg
        file_11_01.jpg
        file_12_00.jpg
    1270000000/
        file_11_00.jpg
        file_11_01.jpg
        file_12_00.jpg
    1280000000/
        file_11_00.jpg
        file_11_01.jpg
        file_12_00.jpg
The goal is to synchronize only .jpg files starting with "file_11_" in all subdirectories using a single rsync command. The user attempted the command:
rsync -nrv --include="**/file_11*.jpg" --exclude="*" /Storage/uploads/ /website/uploads/
However, this command showed 0 files to transfer in the dry run. The root cause is that rsync tests each name against the filter rules in order as it walks the tree from the top. A top-level directory such as 1260000000 does not match <code>--include="**/file_11*.jpg"</code>, so it falls through to <code>--exclude="*"</code>, which matches every file and directory. Since the subdirectories are excluded, rsync never descends into them, so the files that would match the include pattern are never examined.
Solution 1: Include All Directories
The most straightforward solution is to explicitly include all directories to ensure rsync can traverse the directory structure:
rsync -nrv --include="*/" --include="file_11*.jpg" --exclude="*" /Storage/uploads/ /website/uploads/
This command works as follows:
- <code>--include="*/"</code> ensures all directories are included, allowing rsync to recursively enter subdirectories.
- <code>--include="file_11*.jpg"</code> matches the target files.
- <code>--exclude="*"</code> excludes all other files and directories.
The advantage of this method is its simplicity and generality, suitable for any directory structure.
Solution 2: Precise Directory Pattern Matching
If directory names follow specific patterns, more precise matching can be used:
rsync -nrv --include="/[0-9][0-9][0-9]0000000/" --include="file_11*.jpg" --exclude="*" /Storage/uploads/ /website/uploads/
In this command:
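- <code>--include="/[0-9][0-9][0-9]0000000/"</code> matches only directories directly under the transfer root whose names consist of three digits followed by seven zeros (e.g., 1260000000 or 1270000000). The leading slash anchors the pattern to the transfer root, and the trailing slash restricts it to directories.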
- The other parts are the same as in Solution 1.
This method provides better control when directory structures are regular, avoiding inclusion of irrelevant directories.
Priority Rules for Pattern Matching
Rsync follows specific priority rules when processing include and exclude options:
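- The order of options is crucial. Rsync checks each file or directory name against the rules in the order they appear on the command line, and the first matching rule decides whether the name is included or excluded.
- Therefore, include options must be placed before the exclude options they are meant to override; otherwise the matching files are excluded first.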
- Pattern matching is applied level by level starting from the root of the file path; once a directory is excluded, its contents are not checked.
Supplementary Pattern Examples
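Drawing on additional resources, the following patterns illustrate how rsync interprets asterisks and slashes. Each description assumes the pattern is used as an include rule followed by <code>--exclude="*"</code>: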
"*" matches everything
"dir1" transfers empty directory [dir1]
"dir*" transfers empty directories starting with [dir]
"file*" transfers files starting with [file]
"dir**" transfers all paths starting with [dir]
"dir***" same as above
"dir1/*" does nothing
"dir1/**" does nothing
"dir1/***" transfers directory [dir1] and all its contents
It is best to avoid relying on multiple asterisks at the beginning of a path pattern, such as "**dir", because such patterns may not work as expected. Additionally, apart from the trailing "/***" form shown above, patterns with more than two consecutive asterisks are generally not meaningful for file names.
Conclusion
Proper use of rsync's include and exclude options requires a deep understanding of its pattern matching mechanism. Key points include: ensuring directories are included to allow recursive traversal, arranging option order appropriately to control priority, and choosing general or precise patterns based on needs. Through the case analysis and solutions in this article, readers should be able to master the techniques for configuring complex file synchronization tasks, fully leveraging rsync's powerful capabilities.