Keywords: Git repository merging | file history preservation | unrelated history merge
Abstract: This article provides a guide to merging two independent Git repositories into a new unified repository while preserving complete file history. It analyzes the limitations of the traditional subtree merge approach and presents a solution based on adding each repository as a remote, merging it, and relocating its files. PowerShell examples are provided, with detailed explanations of the critical --allow-unrelated-histories option and special considerations for handling in-progress feature branches. With this method, git log --follow <file> displays a file's complete change history, including the commits made in its original repository.
In software development, there is often a need to merge two independent Git repositories into a new unified repository. Traditional subtree merge methods can combine the code, but file history frequently appears lost: running git log <file> shows only the subtree merge commit, so the file's changes in its original repository cannot be traced. The problem is widely discussed in technical communities, yet a systematic description of a solution is hard to find.
Analysis of Traditional Method Limitations
Subtree merge is a common Git technique for embedding external dependencies, but it is designed for integrating libraries rather than for fully merging independent repositories. A subtree merge places the source repository's files under a prefix directory in a single merge commit. The source commits still appear in git log, but they recorded the files at their original, unprefixed paths, so a path-limited git log on a file's new location stops at the merge commit.
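The limitation can be reproduced in a sandbox. The following is a minimal shell sketch, not a definitive recipe: it assumes a POSIX shell and Git 2.9 or later, and the repository names (old_a_src, target), the file a.txt, and the demo identity are hypothetical stand-ins. It performs a subtree-style merge with git read-tree --prefix and then counts what a path-limited log reports.

```shell
#!/bin/sh
# Sketch: a subtree-style merge and what a path-limited log sees afterwards.
set -eu

# Placeholder identity so commits work in a clean environment (assumption).
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

work=$(mktemp -d)

# Throwaway stand-in for a source repository with two commits to a.txt.
git init -q "$work/old_a_src"
cd "$work/old_a_src"
git symbolic-ref HEAD refs/heads/master
echo "first" > a.txt
git add a.txt
git commit -q -m "Add a.txt"
echo "second" >> a.txt
git commit -q -am "Update a.txt"

# Target repository with its own root commit.
git init -q "$work/target"
cd "$work/target"
git symbolic-ref HEAD refs/heads/master
git commit -q --allow-empty -m "Initial dummy commit"
git remote add -f old_a "$work/old_a_src" >/dev/null 2>&1

# Subtree-style merge: record the merge, then read the source tree under old_a/.
git merge -s ours --no-commit --allow-unrelated-histories old_a/master >/dev/null 2>&1
git read-tree --prefix=old_a/ -u old_a/master
git commit -q -m "Subtree-merge old_a into old_a/"

# All source commits are reachable from HEAD...
total=$(git rev-list --count HEAD)
# ...but only the merge commit ever touched the prefixed path.
for_path=$(git log --format=%h -- old_a/a.txt | wc -l | tr -d ' ')
echo "reachable commits: $total, commits listed for old_a/a.txt: $for_path"
```

In this sandbox the source commits are all reachable, while the log for old_a/a.txt lists only the merge commit, which is the truncation described above.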
Complete Merging Solution Based on Remote Repositories
To truly preserve file history, a different strategy is required. The core approach involves adding each source repository as a remote to the new repository, performing merge operations, and finally relocating files to avoid naming conflicts.
Here are the detailed implementation steps:
- Initialize the new repository and create an initial commit:
git init
git commit --allow-empty -m "Initial dummy commit"
The initial commit gives the new repository a root commit of its own, so that each source repository is brought in by an explicit merge commit. The --allow-empty option permits a commit that contains no file changes.
- Add the first source repository as a remote and fetch its commits:
git remote add --fetch old_a <OldA repo URL>
The --fetch option (or -f) fetches all commits from the remote into the local repository immediately after the remote is added.
- Merge the fetched master branch into the new repository:
git merge old_a/master --allow-unrelated-histories
This is the critical step. The --allow-unrelated-histories option permits merging two histories that share no common ancestor. Since Git 2.9, the merge is refused without it.
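- Move the merged files into a subdirectory and commit:
mkdir old_a
dir -exclude old_a | foreach { git mv $_.Name old_a }
git commit -m "Move old_a files into subdir"
Relocating the files prevents path collisions between files from different repositories. git mv moves each file and stages the change in one step. Git does not store renames explicitly; it detects them afterwards by content similarity, so moving the files in a dedicated commit, without editing them, keeps that detection reliable.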
- Repeat the same steps for the second repository:
git remote add -f old_b <OldB repo URL>
git merge old_b/master --allow-unrelated-histories
mkdir old_b
dir -exclude old_a,old_b | foreach { git mv $_.Name old_b }
git commit -m "Move old_b files into subdir"
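The steps above can be rehearsed end to end before touching real repositories. The following is a hedged shell sketch under stated assumptions: it uses a POSIX shell instead of PowerShell, builds two throwaway source repositories in a temporary directory, and the helper names make_source and import_repo, the *_src directory names, and the demo identity are all hypothetical. The git ls-tree loop is a shell counterpart of the dir -exclude line and only considers tracked top-level entries.

```shell
#!/bin/sh
# Sketch: merge two throwaway repositories, each into its own subdirectory.
set -eu

# Placeholder identity so commits work in a clean environment (assumption).
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

work=$(mktemp -d)
imported=""   # names of subdirectories created so far

# make_source NAME FILE: create a stand-in source repository at $work/NAME_src.
make_source() {
    git init -q "$work/${1}_src"
    (
        cd "$work/${1}_src"
        git symbolic-ref HEAD refs/heads/master
        echo "first" > "$2"
        git add "$2"
        git commit -q -m "$1: add $2"
        echo "second" >> "$2"
        git commit -q -am "$1: update $2"
    )
}

# import_repo NAME: fetch and merge NAME/master, then move its files into NAME/.
import_repo() {
    git remote add -f "$1" "$work/${1}_src" >/dev/null 2>&1
    git merge -q --allow-unrelated-histories -m "Merge $1" "$1/master"
    mkdir "$1"
    # Caveat: without -z, ls-tree quotes unusual file names.
    git ls-tree --name-only HEAD | while IFS= read -r entry; do
        case " $imported " in
            *" $entry "*) continue ;;   # skip subdirectories from earlier imports
        esac
        git mv "$entry" "$1/"
    done
    git commit -q -m "Move $1 files into subdir"
    imported="$imported $1"
}

make_source old_a a.txt
make_source old_b b.txt

git init -q "$work/merged"
cd "$work/merged"
git symbolic-ref HEAD refs/heads/master
git commit -q --allow-empty -m "Initial dummy commit"

import_repo old_a
import_repo old_b

# Show the resulting top-level layout.
git ls-tree --name-only HEAD
```

Keeping the list of already imported subdirectories in one variable avoids having to extend the exclusion list by hand for every additional repository.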
Handling In-Progress Feature Branches
If source repositories contain feature branches not yet merged into master, these also need migration to the new repository. This requires a special merge strategy:
git checkout -b feature-in-progress
git merge -s recursive -Xsubtree=old_a old_a/feature-in-progress
This is not the subtree merge strategy (-s subtree) but the subtree option (-Xsubtree) of the recursive strategy. The option tells Git that the incoming branch's tree corresponds to the given subdirectory of the current tree, so its changes are applied to the relocated paths under old_a.
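The effect of -Xsubtree can be checked in a sandbox. The following is a minimal shell sketch under assumptions: the source repository, its files a.txt and feature.txt, and the demo identity are hypothetical, and a POSIX shell with a reasonably recent Git is assumed. The feature branch adds a file at the repository root, and the merge is expected to place it under old_a/.

```shell
#!/bin/sh
# Sketch: bring an unmerged feature branch across after its files were relocated.
set -eu

# Placeholder identity so commits work in a clean environment (assumption).
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

work=$(mktemp -d)

# Stand-in source repository: master plus an unmerged feature branch.
git init -q "$work/old_a_src"
cd "$work/old_a_src"
git symbolic-ref HEAD refs/heads/master
echo "first" > a.txt
git add a.txt
git commit -q -m "Add a.txt"
git checkout -q -b feature-in-progress
echo "feature work" > feature.txt
git add feature.txt
git commit -q -m "Start feature"
git checkout -q master

# Unified repository: merge master and relocate its files, as in the main steps.
git init -q "$work/merged"
cd "$work/merged"
git symbolic-ref HEAD refs/heads/master
git commit -q --allow-empty -m "Initial dummy commit"
git remote add -f old_a "$work/old_a_src" >/dev/null 2>&1
git merge -q --allow-unrelated-histories -m "Merge old_a" old_a/master
mkdir old_a
git mv a.txt old_a/
git commit -q -m "Move old_a files into subdir"

# Merge the feature branch; -Xsubtree maps its root-level paths onto old_a/.
git checkout -q -b feature-in-progress
git merge -q -s recursive -Xsubtree=old_a -m "Merge old_a feature" old_a/feature-in-progress

# List the files that ended up under old_a/.
ls old_a
```

No --allow-unrelated-histories is needed here, because the feature branch shares history with the old_a/master commits that were merged earlier.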
In-Depth Technical Principles
This method preserves complete file history by leveraging Git's underlying mechanisms:
- Remote Repository References: Adding a source repository as a remote and fetching it copies all of its commit objects into the new repository, including the complete commit history, file trees, and parent references.
- Unrelated History Merging: The --allow-unrelated-histories option lets Git build a history that contains multiple root commits. Each source repository's initial commit becomes one of the new repository's root commits.
- File Rename Tracking: Git does not record a rename as such. A commit made with git mv stores the same snapshot as a deletion followed by an addition, and Git detects the rename afterwards by comparing file contents. Because the relocation commit moves files without changing them, the detection is exact and git log --follow can connect the new path to the old one.
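These mechanisms can be observed directly. The following is a hedged shell sketch, assuming a POSIX shell and Git 2.9 or later; the repository names, the file a.txt, and the demo identity are hypothetical. It checks that a plain merge of unrelated histories is refused, counts the root commits after the merge, and moves a file with an ordinary mv followed by git add to illustrate that rename detection depends on content rather than on the command used.

```shell
#!/bin/sh
# Sketch: observe the refused merge, the root commits, and content-based rename detection.
set -eu

# Placeholder identity so commits work in a clean environment (assumption).
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

work=$(mktemp -d)

# Throwaway stand-in for a source repository.
git init -q "$work/old_a_src"
cd "$work/old_a_src"
git symbolic-ref HEAD refs/heads/master
echo "first" > a.txt
git add a.txt
git commit -q -m "Add a.txt"

# New unified repository with its own root commit.
git init -q "$work/merged"
cd "$work/merged"
git symbolic-ref HEAD refs/heads/master
git commit -q --allow-empty -m "Initial dummy commit"
git remote add -f old_a "$work/old_a_src" >/dev/null 2>&1

# 1. Without the option, merging unrelated histories is refused.
if git merge -m "Merge old_a" old_a/master >/dev/null 2>&1; then
    plain_merge="accepted"
else
    plain_merge="refused"
fi
git merge -q --allow-unrelated-histories -m "Merge old_a" old_a/master

# 2. The merged history has one root commit per original history.
roots=$(git rev-list --max-parents=0 HEAD | wc -l | tr -d ' ')

# 3. Move with plain mv and stage the result; Git still reports a rename.
mkdir old_a
mv a.txt old_a/a.txt
git add -A
git commit -q -m "Move old_a files into subdir"
status=$(git diff --name-status -M HEAD~1 HEAD)

echo "plain merge: $plain_merge"
echo "root commits: $roots"
echo "move commit: $status"
```

The sketch uses mv only to make the point about detection; git mv remains the more convenient choice because it moves and stages in one step.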
Comparison with Alternative Methods
Another common approach moves the files on a separate branch before merging (this assumes the second repository has already been added as a remote named secondrepo):
git fetch secondrepo
git branch branchfromsecondrepo secondrepo/master
git checkout branchfromsecondrepo
mkdir subdir/
git ls-tree -z --name-only HEAD | xargs -0 -I {} git mv {} subdir/
git commit -m "Moved files to subdir/"
git checkout master
git merge --allow-unrelated-histories branchfromsecondrepo
This method is equally effective but requires more manual steps. The main difference is that it moves files on a temporary branch before merging to master. Both methods share the same core principle: merging unrelated histories via --allow-unrelated-histories and avoiding conflicts through file relocation.
Practical Implementation Considerations
When implementing this solution, consider the following factors:
- Large Repository Handling: For repositories with many commits or large files, fetching and merging can take a long time. Make sure enough disk space is available and schedule the work outside busy hours.
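- Conflict Resolution: Conflicts arise when both histories contain a file at the same path. In the workflow above, earlier repositories have already been moved into subdirectories, so clashes are limited to paths that also exist in the incoming repository (for example a top-level entry named old_a). Any conflict must be resolved manually; for content conflicts Git writes conflict markers into the affected files.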
- Submodules and Subtrees: If source repositories use submodules or subtrees, additional handling of these dependencies is needed, typically requiring recursive application of the same merge strategy.
- CI/CD Configuration: After merging, update CI/CD pipeline configurations to ensure builds and tests properly handle the new repository structure.
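Path clashes of the kind mentioned under Conflict Resolution can be anticipated before merging. The following is a minimal shell sketch, assuming a POSIX shell with sort and comm available; the repositories, the file names, and the demo identity are hypothetical, and the unified repository is faked with a single commit for brevity. It compares the top-level names of the current tree with those of the incoming branch.

```shell
#!/bin/sh
# Sketch: list top-level names that exist both in HEAD and in an incoming branch.
set -eu

# Byte-wise ordering keeps sort and comm consistent with each other.
export LC_ALL=C
# Placeholder identity so commits work in a clean environment (assumption).
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

work=$(mktemp -d)

# Stand-in for the unified repository after old_a has been relocated.
git init -q "$work/merged"
cd "$work/merged"
git symbolic-ref HEAD refs/heads/master
mkdir old_a
echo "first" > old_a/a.txt
git add old_a
git commit -q -m "old_a already relocated"

# Stand-in for the next source repository; it happens to contain a file named old_a.
git init -q "$work/old_b_src"
(
    cd "$work/old_b_src"
    git symbolic-ref HEAD refs/heads/master
    echo "notes" > old_a
    echo "first" > b.txt
    git add old_a b.txt
    git commit -q -m "Add files"
)
git remote add -f old_b "$work/old_b_src" >/dev/null 2>&1

# Compare top-level names on both sides before attempting the merge.
git ls-tree --name-only HEAD | sort > "$work/ours.txt"
git ls-tree --name-only old_b/master | sort > "$work/theirs.txt"
clashes=$(comm -12 "$work/ours.txt" "$work/theirs.txt")

echo "top-level names present on both sides: $clashes"
```

An empty result suggests the merge will not collide at the top level; a non-empty one points to entries worth renaming or relocating first, for instance by moving files on a separate branch before the merge.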
The merged repository created through this method will contain complete commit history from all source repositories. Executing git log --follow old_a/somefile.txt will display all commits for that file from initial creation to latest modification, including history from the original repository. This provides complete historical context for code auditing, issue tracking, and team collaboration.
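The difference that --follow makes can be checked in a sandbox. The following is a hedged shell sketch, assuming a POSIX shell and Git 2.9 or later; the source repository, the two commits to somefile.txt, and the demo identity are hypothetical. It counts the commits reported for the relocated file with and without --follow.

```shell
#!/bin/sh
# Sketch: compare git log with and without --follow on a relocated file.
set -eu

# Placeholder identity so commits work in a clean environment (assumption).
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

work=$(mktemp -d)

# Throwaway stand-in for a source repository with two commits to one file.
git init -q "$work/old_a_src"
cd "$work/old_a_src"
git symbolic-ref HEAD refs/heads/master
echo "first" > somefile.txt
git add somefile.txt
git commit -q -m "Add somefile.txt"
echo "second" >> somefile.txt
git commit -q -am "Update somefile.txt"

# Merge and relocate, as in the main steps.
git init -q "$work/merged"
cd "$work/merged"
git symbolic-ref HEAD refs/heads/master
git commit -q --allow-empty -m "Initial dummy commit"
git remote add -f old_a "$work/old_a_src" >/dev/null 2>&1
git merge -q --allow-unrelated-histories -m "Merge old_a" old_a/master
mkdir old_a
git mv somefile.txt old_a/
git commit -q -m "Move old_a files into subdir"

# Without --follow, history stops at the commit that created the new path.
plain=$(git log --format=%h -- old_a/somefile.txt | wc -l | tr -d ' ')
# With --follow, Git continues across the rename into the original commits.
followed=$(git log --follow --format=%h -- old_a/somefile.txt | wc -l | tr -d ' ')

echo "without --follow: $plain, with --follow: $followed"
```

Note that --follow accepts a single file only, so directory-level history is better inspected without a path limit or file by file.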