Removing Large Files from Git Commit History Using Filter-Repo

Nov 18, 2025 · Programming

Keywords: Git | Version Control | History Rewriting | Large File Cleanup | Filter-Repo

Abstract: This technical article provides a comprehensive guide on permanently removing large files from Git repository history using the git filter-repo tool. Through detailed case analysis, it explains key steps including file identification, filtering operations, and remote repository updates, while offering best practice recommendations. Compared to traditional filter-branch methods, filter-repo demonstrates superior efficiency and compatibility, making it the recommended solution in modern Git workflows.

Problem Context and Challenges

Developers occasionally commit large files (videos, archives, disk images) to a repository by mistake. Even if a later commit deletes them, they remain in Git history, so every clone still downloads them and the repository stays bloated. A typical case is accidentally adding a DVD image file to a web project: removing it in the very next commit does not remove it from the commit that added it.

Limitations of Traditional Methods

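Early solutions relied primarily on the git filter-branch command, which has several drawbacks: it is slow on large histories, its options are complex and easy to misuse, and the cleanup is often incomplete, because it keeps the old history under refs/original/, so the large objects stay in the repository until those backup references and the reflog are removed by hand. More importantly, the Git documentation itself warns against filter-branch because of these pitfalls and recommends alternatives such as git filter-repo.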

Modern Solution: Git Filter-Repo

git filter-repo is a separate tool designed specifically for rewriting Git history. It is not bundled with Git; it is a single Python script, available for example via pip install git-filter-repo or many system package managers. Compared with filter-branch it runs much faster, needs simpler commands, and produces more reliable results. The following steps walk through the complete workflow.

Step 1: Identify Large Files

First, find the largest objects in the repository. This command lists them together with their paths:

git rev-list --objects --all | grep -f <(git verify-pack -v .git/objects/pack/*.idx | sort -k 3 -n | cut -f 1 -d " " | tail -10)

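It prints the IDs and paths of the ten largest objects stored in pack files, ranked by uncompressed size; these are usually the file contents (blobs) worth removing. Because it reads only pack indexes, run git gc first so that loose objects are included, and run it in bash, since <( ) is a bash process substitution. Alternatively, git filter-repo --analyze writes size reports covering the whole history to .git/filter-repo/analysis/.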
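A variant that also sees loose objects pipes the same git rev-list output through git cat-file --batch-check and keeps only blobs. A minimal sketch, assuming bash and Git 2.x; list_largest_blobs is a hypothetical helper name, not a Git command:

```shell
#!/usr/bin/env bash
# list_largest_blobs is a hypothetical helper, not a Git command.
# Prints "<size> blob <id> <path>" for the N largest blobs reachable from
# any ref (default 10), largest last. Unlike verify-pack, cat-file also
# reports loose objects; sizes are uncompressed byte counts.
list_largest_blobs() {
  local count="${1:-10}"
  git rev-list --objects --all |
    git cat-file --batch-check='%(objectsize) %(objecttype) %(objectname) %(rest)' |
    awk '$2 == "blob"' |
    sort -n -k1,1 |
    tail -n "$count"
}
```

Run list_largest_blobs 10 inside the repository. The path column shows where each blob lived, even if a later commit deleted the file.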

Step 2: Perform Filtering Operations

Once the targets are known, use git filter-repo to remove them from every commit. Paths can be selected in several ways:

Match an exact path (path/to/large-file.iso is a placeholder; paths are relative to the repository root):

git filter-repo --path path/to/large-file.iso --invert-paths --force

Match by file extension:

git filter-repo --path-glob '*.zip' --invert-paths --force

Match compiled build artifacts, for example static libraries:

git filter-repo --path-glob '*.a' --invert-paths --force

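--invert-paths inverts the selection: files matching the given --path or --path-glob options are removed from every commit, and everything else is kept. --force overrides filter-repo's safety check, which otherwise refuses to rewrite a repository that is not a fresh clone. Because the rewrite happens in place, running these commands in a fresh clone (where --force is unnecessary) is the safer habit.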
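To confirm that a filter-repo run removed what you intended, you can check that no path reachable from any ref still matches. A minimal sketch, assuming bash; assert_path_gone is a hypothetical helper name, and its argument is an extended regular expression, not a filter-repo glob:

```shell
#!/usr/bin/env bash
# assert_path_gone is a hypothetical helper, not part of Git or filter-repo.
# It succeeds only if no path reachable from any ref matches the given
# extended regular expression (for example '\.zip$').
assert_path_gone() {
  local pattern="$1"
  # rev-list --objects prints "<id> <path>" for trees and blobs; cut -s drops
  # commit lines (they have no path) and -f2- keeps paths containing spaces.
  if git rev-list --objects --all | cut -s -d' ' -f2- | grep -Eq -- "$pattern"; then
    echo "still in history: paths matching $pattern" >&2
    return 1
  fi
  echo "no reachable path matches $pattern"
}
```

For example, after git filter-repo --path-glob '*.zip' --invert-paths --force, run assert_path_gone '\.zip$'.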

Step 3: Update Remote Repository

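After rewriting local history, push it back to the remote. git filter-repo normally removes the origin remote as a safeguard against accidental pushes, so if git remote -v no longer lists it, re-add it (<remote-url> is a placeholder) and then force push all branches and tags:

git remote add origin <remote-url>
git push origin --force --all
git push origin --force --tags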

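Note: force pushing overwrites the remote history. In collaborative environments, announce the rewrite in advance and have every collaborator re-clone or reset to the new history; merging from an old clone would bring the removed files back.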
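To check that the force push really replaced the remote history, you can compare every local branch and tag with what the remote reports. A minimal sketch, assuming bash (it uses process substitution) and a remote named origin by default; remote_matches_local is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# remote_matches_local is a hypothetical helper, not a Git command.
# Succeeds (diff exits 0) when the remote's branches and tags point at exactly
# the same objects as the local ones; otherwise diff prints the differing refs.
# ls-remote also lists peeled annotated tags as "<ref>^{}"; those are dropped.
remote_matches_local() {
  local remote="${1:-origin}"
  diff \
    <(git for-each-ref --format='%(objectname) %(refname)' refs/heads refs/tags | sort) \
    <(git ls-remote --heads --tags "$remote" | grep -v '\^{}$' | awk '{ print $1, $2 }' | sort)
}
```

A difference in either direction deserves a look: a branch that exists only on the remote may still carry the old history and the large files.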

Operational Considerations

Before rewriting history, create a full backup of the repository, for example as a mirror clone:

git clone --mirror original-repo.git backup-repo.git
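git bundle offers a single-file alternative to a mirror clone: it stores every ref and the objects they reach in one file that git clone can read back. A minimal sketch, assuming it is run inside the repository; backup_repo is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# backup_repo is a hypothetical helper, not a Git command.
# Writes all refs (--all) and their history into one bundle file, then
# checks that the file is a valid bundle.
backup_repo() {
  local out="${1:?usage: backup_repo <file.bundle>}"
  git bundle create "$out" --all || return 1
  git bundle verify "$out"
}
```

For example, backup_repo ../before-rewrite.bundle; to restore, clone from the file with git clone ../before-rewrite.bundle restored-repo.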

For team projects, coordinate all developers:
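After the rewritten history has been force pushed, every existing clone still contains the old commits. Re-cloning is the simplest fix. For a clone that must be kept, here is a minimal sketch of moving one branch onto the new history, assuming bash, a remote named origin, and no local work on that branch worth keeping; resync_branch is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# resync_branch is a hypothetical helper for a collaborator's existing clone.
# Assumptions: the remote is "origin" and nothing local needs to be kept;
# reset --hard discards uncommitted changes and commits that exist only
# locally on this branch.
resync_branch() {
  local branch="${1:?usage: resync_branch <branch>}"
  git fetch --prune origin || return 1
  # git fetch does not overwrite existing local tags unless --force is given.
  git fetch --force --tags origin || return 1
  git checkout -q "$branch" || return 1
  git reset --hard "origin/$branch"
}
```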

Alternative Solution Comparison

Besides filter-repo, other tools are available:

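BFG Repo-Cleaner: a Java tool specialized in removing large files and other unwanted data. It is meant to be run on a mirror clone, and by default it does not touch files in the latest commit (HEAD is protected), so delete the file in a normal commit first: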

java -jar bfg.jar --strip-blobs-bigger-than 100M my-repo.git
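BFG only rewrites refs; the old blobs stay on disk until the reflog no longer references them and unreachable objects are pruned, which is why BFG's instructions follow it with a reflog expiry and a garbage collection. A minimal sketch of that clean-up step, assuming bash; purge_unreachable is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# purge_unreachable is a hypothetical helper, not a Git command.
# Drops all reflog entries immediately, then deletes every object that no
# ref reaches anymore (--prune=now skips the usual two-week grace period).
purge_unreachable() {
  git reflog expire --expire=now --all &&
    git gc --prune=now --aggressive
}
```

In the order BFG's documentation uses, with <remote-url> as a placeholder:

git clone --mirror <remote-url> my-repo.git
java -jar bfg.jar --strip-blobs-bigger-than 100M my-repo.git
cd my-repo.git && purge_unreachable && git push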

Interactive Rebase: suitable when the file was added in a recent commit. Run git rebase -i starting from that commit's parent, mark the commit as edit, remove the file from it, amend the commit, and continue the rebase.
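The rebase steps can also be scripted by supplying the todo-list editor through GIT_SEQUENCE_EDITOR. A minimal sketch, assuming bash, a sed that accepts -i.bak (GNU and BSD both do), a target commit that is not the root commit, and later commits that do not modify the file; drop_file_from_commit is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# drop_file_from_commit is a hypothetical helper, not a Git command.
# Starting the rebase at the target's parent makes the target the first todo
# line; sed turns that "pick" into "edit" so the rebase stops there.
drop_file_from_commit() {
  local commit="$1" path="$2"
  GIT_SEQUENCE_EDITOR="sed -i.bak '1s/^pick/edit/'" git rebase -i "${commit}^" || return 1
  # Remove the file from the commit only; the working-tree copy stays untracked.
  git rm --cached --quiet -- "$path" || return 1
  # --allow-empty covers a commit whose only change was adding the file.
  git commit --amend --no-edit --allow-empty --quiet || return 1
  git rebase --continue
}
```

The typical "added, then deleted" case replays cleanly, because the later deletion agrees with the file already being absent.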

Best Practice Recommendations

Prevention is better than cure, so establish good Git habits:
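One such habit can be automated: a pre-commit hook that refuses staged files above a size limit. A minimal sketch, assuming bash; check_staged_sizes is a hypothetical helper name and the 5 MiB default is an arbitrary example threshold, not a Git setting:

```shell
#!/usr/bin/env bash
# check_staged_sizes is a hypothetical helper; the default limit is an example.
check_staged_sizes() {
  local max_bytes="${1:-$((5 * 1024 * 1024))}" status=0 path size
  # -z separates staged paths with NUL bytes so unusual file names survive.
  while IFS= read -r -d '' path; do
    # ":<path>" names the staged blob, i.e. the size Git would commit.
    size=$(git cat-file -s ":$path")
    if [ "$size" -gt "$max_bytes" ]; then
      echo "rejected: $path is $size bytes (limit $max_bytes)" >&2
      status=1
    fi
  done < <(git diff --cached --name-only --diff-filter=ACMR -z)
  return "$status"
}
```

To use it as a hook, save the function in .git/hooks/pre-commit followed by a final line check_staged_sizes, and make the file executable (chmod +x .git/hooks/pre-commit). Git aborts the commit when the hook exits non-zero; git commit --no-verify skips the check deliberately.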

Conclusion

git filter-repo provides an efficient solution for permanently removing large files from Git history. Through accurate file identification, precise filtering operations, and proper team coordination, repository bloat can be successfully resolved while maintaining clean project history. In practical applications, choose the most appropriate cleanup strategy based on specific project requirements.

Copyright Notice: All rights in this article are reserved by the operators of DevGex. Reasonable sharing and citation are welcome; any reproduction, excerpting, or re-publication without prior permission is prohibited.