Batch Modification of Author and Committer Information in Git Historical Commits

Oct 27, 2025 · Programming

Keywords: Git | History Rewriting | Author Information Correction | filter-branch | filter-repo

Abstract: This technical paper examines methods for batch-modifying author and committer information in the historical commits of the Git version control system. Through analysis of the core tools, including git filter-branch, git rebase, and git filter-repo, it explains the appropriate approach, procedure, and precautions for each scenario. It particularly emphasizes how history rewriting changes commit SHA-1 hashes and provides best-practice guidelines for operating safely, covering environment variable configuration, filter scripts, and alternative tools, so that developers can correct metadata without corrupting project history.

Git History Rewriting and Author Information Correction

In software development, Git, as a distributed version control system, relies heavily on the author and committer information in commit history for tracing changes and coordinating team collaboration. However, misconfiguration, tooling issues, or changes in personal details often leave developers needing to correct metadata in historical commits. Drawing on highly voted Stack Overflow answers and the official documentation, this paper systematically analyzes techniques for batch-modifying Git commit author information.

Core Concepts of History Rewriting

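Each commit object in Git records two identities: the author (who originally wrote the change) and the committer (who last applied it, for example through an amend, rebase, or cherry-pick), each consisting of a name, an email address, and a timestamp. When Git creates a commit, these values can be supplied through the GIT_AUTHOR_NAME/GIT_AUTHOR_EMAIL and GIT_COMMITTER_NAME/GIT_COMMITTER_EMAIL environment variables respectively. Because a commit's SHA-1 hash is computed over its entire content, including these identity fields and the hash of its parent, changing the metadata of any commit produces a new commit object with a new hash, and every descendant commit must be recreated as well. Modifying identity information is therefore a history rewrite. On branches that have already been pushed, it requires a force push that overwrites the remote history and can disrupt other collaborators' work.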
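The effect on hashes can be observed directly. The sketch below assumes a POSIX shell and uses a throwaway repository with placeholder identities. It amends only the author of a commit and compares the commit ID and tree ID before and after: the tree (the file contents) is unchanged, yet the commit receives a new ID, and the committer field keeps its own value.

```shell
#!/bin/sh
# Sketch: an identity change alone yields a new commit ID.
# Uses a throwaway repository; all names and emails are placeholders.
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
git init -q
git config user.name "Old Name"
git config user.email "old@example.com"

echo hello > file.txt
git add file.txt
git commit -q -m "initial commit"
before_commit=$(git rev-parse HEAD)
before_tree=$(git rev-parse 'HEAD^{tree}')

# Change only the author; the file contents stay exactly the same.
git commit -q --amend --no-edit --author="New Name <new@example.com>"
after_commit=$(git rev-parse HEAD)
after_tree=$(git rev-parse 'HEAD^{tree}')

# Author and committer are stored separately: the author is now "New Name",
# while the committer is still taken from user.name/user.email.
git log -1 --format='author:    %an <%ae>%ncommitter: %cn <%ce>'
echo "commit: $before_commit -> $after_commit"
echo "tree:   $before_tree -> $after_tree"
```
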

Detailed Analysis of git filter-branch Method

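As Git's built-in history rewriting tool, git filter-branch provides powerful batch processing capabilities. Its --env-filter option runs a shell snippet once per commit, with the GIT_AUTHOR_* and GIT_COMMITTER_* variables preset from the original commit; whatever values the snippet exports are used for the rewritten commit: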

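#!/bin/sh
# Rewrites every commit whose author or committer email equals OLD_EMAIL.
# Replace the three placeholder values below before running.
git filter-branch --env-filter '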
OLD_EMAIL="your-old-email@example.com"
CORRECT_NAME="Your Correct Name"
CORRECT_EMAIL="your-correct-email@example.com"
if [ "$GIT_COMMITTER_EMAIL" = "$OLD_EMAIL" ]
then
    export GIT_COMMITTER_NAME="$CORRECT_NAME"
    export GIT_COMMITTER_EMAIL="$CORRECT_EMAIL"
fi
if [ "$GIT_AUTHOR_EMAIL" = "$OLD_EMAIL" ]
then
    export GIT_AUTHOR_NAME="$CORRECT_NAME"
    export GIT_AUTHOR_EMAIL="$CORRECT_EMAIL"
fi
' --tag-name-filter cat -- --branches --tags

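This script walks every commit reachable from all branches and tags and replaces the author and committer name and email whenever the corresponding email matches OLD_EMAIL. Commits by other people keep their identity, but still receive new hashes if one of their ancestors was rewritten. The --tag-name-filter cat option moves each tag to the corresponding rewritten commit while keeping its name, and the -- separator ends filter-branch's own options, so --branches --tags is passed through as the set of refs to rewrite.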
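To try the filter without touching a real repository, the same --env-filter can be run against a scratch repository. This is a sketch under assumptions: the repository, its commits, and the tag v1.0 are hypothetical, and FILTER_BRANCH_SQUELCH_WARNING=1 (honored by Git 2.24 and later) only suppresses filter-branch's deprecation warning and pause. The final listing deliberately avoids git log --all, because that would also show the pre-rewrite backups that filter-branch keeps under refs/original/.

```shell
#!/bin/sh
# Sketch: run the env-filter against a throwaway repository and inspect it.
# All names, emails, and the tag are placeholders.
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
git init -q

# Two commits with the wrong email, one by someone else, plus a tag.
for n in 1 2; do
    echo "$n" >> file.txt
    git add file.txt
    git -c user.name="Typo Name" -c user.email="your-old-email@example.com" \
        commit -q -m "old identity $n"
done
echo 3 >> file.txt
git -c user.name="Someone Else" -c user.email="other@example.com" \
    commit -q -am "unrelated author"
git tag v1.0

# Same filter as in the text; the variable only silences the warning.
FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch --env-filter '
OLD_EMAIL="your-old-email@example.com"
CORRECT_NAME="Your Correct Name"
CORRECT_EMAIL="your-correct-email@example.com"
if [ "$GIT_COMMITTER_EMAIL" = "$OLD_EMAIL" ]; then
    export GIT_COMMITTER_NAME="$CORRECT_NAME"
    export GIT_COMMITTER_EMAIL="$CORRECT_EMAIL"
fi
if [ "$GIT_AUTHOR_EMAIL" = "$OLD_EMAIL" ]; then
    export GIT_AUTHOR_NAME="$CORRECT_NAME"
    export GIT_AUTHOR_EMAIL="$CORRECT_EMAIL"
fi
' --tag-name-filter cat -- --branches --tags >/dev/null 2>&1

# Only rewritten branches and tags; refs/original/ is intentionally excluded.
git log --branches --tags --format='%h %an <%ae> | %cn <%ce> | %s'
```
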

Security Warnings and Performance Considerations

The official Git documentation explicitly warns that git filter-branch has numerous pitfalls that can lead to unintended history rewrites. Its performance problems are especially pronounced in large repositories, where processing tens of thousands of commits can take hours. The main risks include incorrect subtree filtering, corrupted commit messages, and confused tag references. For these reasons, the documentation recommends git filter-repo as an alternative.
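One mitigation for these risks is that filter-branch keeps the previous tip of every rewritten ref under refs/original/. The following sketch, again on a hypothetical scratch repository, shows how that backup can serve as an undo point and how to delete it afterwards. The blanket name change in the env-filter is purely illustrative.

```shell
#!/bin/sh
# Sketch: use filter-branch's refs/original/ backup as an undo point.
# Throwaway repository; identities are placeholders.
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
git init -q
echo data > file.txt
git add file.txt
git -c user.name="Typo Name" -c user.email="old@example.com" commit -q -m "initial"
branch=$(git symbolic-ref --short HEAD)
original=$(git rev-parse HEAD)

# Illustrative rewrite: force a new author name on the branch.
FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch --env-filter '
export GIT_AUTHOR_NAME="Fixed Name"
' -- "$branch" >/dev/null 2>&1

# The pre-rewrite tip is kept as refs/original/refs/heads/<branch>.
git for-each-ref --format='%(refname) %(objectname:short)' refs/original/

# Undo: move the branch back to the saved tip (file contents are identical).
git reset -q --hard "refs/original/refs/heads/$branch"

# Once the rewrite is verified or undone, drop the backup ref; while it
# exists, another filter-branch run refuses to start without -f.
git update-ref -d "refs/original/refs/heads/$branch"
```
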

Modern Alternative: git filter-repo

git filter-repo is a high-performance history filtering tool developed by Git contributors and distributed separately from Git itself, designed specifically to address filter-branch's shortcomings. Its --mailmap option reads identity mappings from a file in mailmap format, which makes bulk identity corrections safer and more reliable:

Proper Name <proper@email.xx> Commit Name <commit@email.xx>

After creating a git-mailmap file in the above format, execute:

git filter-repo --mailmap git-mailmap

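This command rewrites every commit whose author or committer identity matches an entry in the file, and does so far faster than filter-branch. Compared with filter-branch, filter-repo also offers clearer error messages and progress feedback, and by default it refuses to run unless the repository looks like a fresh clone (override with --force), which protects against rewriting your only copy of the history.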
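Because filter-repo rewrites history in place, it can be worth checking the mailmap entries first. One way, sketched below on the assumption that the file is named git-mailmap as above, is git check-mailmap, which prints the identity Git would map a given contact to; the mailmap.file setting points it at the file without committing it. Note that git filter-repo itself is not invoked here.

```shell
#!/bin/sh
# Sketch: dry-run a mailmap file with `git check-mailmap` before handing it
# to `git filter-repo --mailmap`. Identities are the placeholders from the text.
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
git init -q   # check-mailmap needs a repository, even an empty one

cat > git-mailmap <<'EOF'
Proper Name <proper@email.xx> Commit Name <commit@email.xx>
EOF

# A matching contact is mapped; any other contact is printed unchanged.
mapped=$(git -c mailmap.file="$tmp/git-mailmap" check-mailmap "Commit Name <commit@email.xx>")
unmapped=$(git -c mailmap.file="$tmp/git-mailmap" check-mailmap "Someone Else <else@email.xx>")
echo "$mapped"
echo "$unmapped"
```
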

Configuration Preprocessing and Verification Steps

Before rewriting any history, update your Git identity configuration first (--global applies it to every repository; omit it to change only the current one):

git config --global user.name "New Author Name"
git config --global user.email "email@address.example"

This ensures that new commits, including any recreated with git commit --amend --reset-author, use the correct identity. For important repositories, also create a complete backup before rewriting:

git clone --mirror original-repo backup-repo
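A quick sanity check on the backup is to compare the ref listings of the original and the mirror, since a mirror clone copies every ref verbatim. In this sketch the original-repo and backup-repo names follow the command above, while the repository contents are invented for illustration.

```shell
#!/bin/sh
# Sketch: take a mirror backup and confirm it holds every ref of the original.
# Throwaway setup; the commit and tag are placeholders.
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
git init -q original-repo
git -C original-repo -c user.name="A" -c user.email="a@example.com" \
    commit -q --allow-empty -m "initial"
git -C original-repo tag v1.0

git clone -q --mirror original-repo backup-repo

# Both listings are sorted by ref name, so they should be byte-identical.
git -C original-repo for-each-ref --format='%(objectname) %(refname)' > original.refs
git -C backup-repo for-each-ref --format='%(objectname) %(refname)' > backup.refs
if cmp -s original.refs backup.refs; then
    echo "backup matches original"
else
    echo "backup differs from original" >&2
fi
```
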

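During verification, inspect the identities recorded on the rewritten commits, for example with git log --format='%h %an <%ae> | %cn <%ce>', and confirm the corrections match expectations. Note that git log --oneline shows only the abbreviated hash and subject, not the author or committer.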
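A more thorough check than inspecting the first few commits is to list every distinct identity still reachable from branches and tags. The sketch below seeds a scratch repository with one corrected commit and one deliberately missed commit (both hypothetical) to show the check catching the leftover. It uses %an/%ae rather than %aN/%aE so that a .mailmap cannot mask remaining old values.

```shell
#!/bin/sh
# Sketch: list all distinct author/committer identities so that a leftover
# old email stands out. Throwaway repository; identities are placeholders.
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
git init -q
git -c user.name="Your Correct Name" -c user.email="your-correct-email@example.com" \
    commit -q --allow-empty -m "fixed"
git -c user.name="Typo Name" -c user.email="your-old-email@example.com" \
    commit -q --allow-empty -m "missed by the rewrite"

# One line per distinct identity, authors and committers alike.
identities=$(git log --branches --tags --format='%an <%ae>%n%cn <%ce>' | sort -u)
echo "$identities"

# Flag the old email if it survives anywhere.
if printf '%s\n' "$identities" | grep -q 'your-old-email@example.com'; then
    echo "old email still present" >&2
    leftover=yes
else
    leftover=no
fi
```
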

Simplified Solutions for Specific Scenarios

When only the most recent commits or a limited range need correcting, lighter-weight approaches are sufficient. To correct the most recent commit:

git commit --amend --no-edit --reset-author

To correct every commit after a given base commit:

git rebase -r <base-commit> --exec 'git commit --amend --no-edit --reset-author'

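The -r (--rebase-merges) option preserves merge commits instead of flattening them, and --exec runs the amend command after each commit is replayed. Because --reset-author takes the identity from the current user.name and user.email configuration, update it first; note that it also resets the author timestamp. To include the root commit, use --root in place of <base-commit>.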
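The range form can be tried the same way. In this sketch (scratch repository, placeholder identities), <base-commit> is HEAD~2, so only the last two of three commits are rewritten and the base commit keeps its original hash.

```shell
#!/bin/sh
# Sketch: rewrite the identity of every commit after a base commit.
# Throwaway repository; identities are placeholders.
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
git init -q
git config user.name "Wrong Name"
git config user.email "wrong@example.com"
for n in 1 2 3; do
    echo "$n" >> file.txt
    git add file.txt
    git commit -q -m "commit $n"
done
base=$(git rev-parse HEAD~2)   # plays the role of <base-commit>

# Set the correct identity, then amend each replayed commit.
git config user.name "New Author Name"
git config user.email "email@address.example"
git rebase -q -r "$base" --exec 'git commit --amend --no-edit --reset-author'

git log --format='%h %an <%ae> %s'
```
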

Considerations in Team Collaboration Environments

In collaborative projects, history rewriting must be coordinated with everyone who works on the affected branches. A typical procedure is: ask the team to pause work on those branches, perform the rewrite, force-push the updated branches, and have each member re-clone or reset their local branches to the new remote history. Avoid modifying published release tags, since rewriting them breaks downstream users who depend on them.
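The push and reset steps can be sketched with a local bare repository standing in for the shared remote. Everything here is hypothetical: the directory names shared.git, alice, and bob, the branch main, and the identities. --force-with-lease is used instead of a plain --force so the push is refused if someone else updated the remote branch since the last fetch; rewritten tags would additionally need their own forced push, for example git push --force --tags.

```shell
#!/bin/sh
# Sketch: force-push a rewritten branch and realign a collaborator's clone.
# A local bare repository stands in for the shared remote.
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
git init -q --bare "$tmp/shared.git"
git -C "$tmp/shared.git" symbolic-ref HEAD refs/heads/main

# Alice publishes a commit with the wrong identity.
git clone -q "$tmp/shared.git" "$tmp/alice" 2>/dev/null
cd "$tmp/alice"
git symbolic-ref HEAD refs/heads/main
git config user.name "Wrong Name"
git config user.email "wrong@example.com"
echo v1 > file.txt
git add file.txt
git commit -q -m "initial"
git push -q origin main

# Bob clones the old history.
git clone -q "$tmp/shared.git" "$tmp/bob"

# Alice rewrites and force-pushes; the lease fails if main moved meanwhile.
git config user.name "New Author Name"
git config user.email "email@address.example"
git commit -q --amend --no-edit --reset-author
git push -q --force-with-lease origin main

# Bob fetches the rewritten history and resets his branch onto it
# (this discards any local commits Bob had on main).
cd "$tmp/bob"
git fetch -q origin
git reset -q --hard origin/main
git log -1 --format='%h %an <%ae>'
```
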

Non-Destructive Correction Solutions

When you only need to correct misspelled names or outdated email addresses in what Git displays, Git's .mailmap feature offers a non-destructive alternative. Create a .mailmap file in the repository root:

Correct Name <correct@email.com> <old@email.com>

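Commands such as git shortlog, git blame, and git log (by default since Git 2.29) then apply these mappings when displaying identities, as do the %aN and %aE format placeholders, all without rewriting history. This makes the approach particularly suitable for metadata cleanup in public repositories, where rewriting history is usually not an option.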
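The non-destructive nature of .mailmap can be confirmed in a scratch repository, as sketched below with the example entry above. The %aN/%aE placeholders report the mapped identity while %an/%ae still show what the commit stores, and the commit ID does not change. In practice the .mailmap file would be committed so every clone sees it; here it is left uncommitted for brevity.

```shell
#!/bin/sh
# Sketch: a .mailmap changes what Git reports, not what commits contain.
# Throwaway repository; identities match the .mailmap entry in the text.
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
git init -q
git -c user.name="Old Name" -c user.email="old@email.com" \
    commit -q --allow-empty -m "before the fix"
before=$(git rev-parse HEAD)

echo 'Correct Name <correct@email.com> <old@email.com>' > .mailmap

# %aN/%aE apply the mailmap; %an/%ae show the identity stored in the commit.
mapped=$(git log -1 --format='%aN <%aE>')
raw=$(git log -1 --format='%an <%ae>')
echo "mapped: $mapped"
echo "raw:    $raw"
```
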

Conclusion and Best Practices

Batch-modifying historical author information in Git is a high-risk operation, and the right tool depends on the scenario. git filter-repo is the preferred choice thanks to its safety and performance advantages, while git filter-branch should serve as a fallback. Whichever method you adopt, follow a test-backup-verify workflow and communicate thoroughly within the team to preserve the integrity of the version history and the continuity of collaboration.

Copyright Notice: All rights in this article are reserved by the operators of DevGex. Reasonable sharing and citation are welcome; any reproduction, excerpting, or re-publication without prior permission is prohibited.