Git Filter Repo
git-filter-repo is a fast, versatile tool for rewriting Git repository history, designed as a modern replacement for `git filter-branch`, which the Git project's own documentation now recommends against using. The current version is 2.47.0, and the project is actively developed, with releases typically every few weeks to months. It handles common repository maintenance tasks such as removing sensitive data, extracting subdirectories, and reorganizing history.
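As a rough illustration of the maintenance tasks named above, the sketch below maps each one to a git-filter-repo argument list. The file and directory names (`secret.txt`, `src/lib`, `old_dir/`, `new_dir/`) are placeholders chosen for this example, and the dictionary itself is only a convenient way to display the commands, not part of the tool.

```python
# Argument lists for common history-rewriting tasks.
# All paths below are placeholders, not names from a real repository.
COMMON_TASKS = {
    # '--path' selects the file; '--invert-paths' keeps everything except it.
    "remove a file from all history": [
        "git-filter-repo", "--invert-paths", "--path", "secret.txt",
    ],
    # Keeps only the given directory and makes it the new repository root.
    "extract a subdirectory": [
        "git-filter-repo", "--subdirectory-filter", "src/lib",
    ],
    # Renames a directory in every commit, using the old:new syntax.
    "rename a directory throughout history": [
        "git-filter-repo", "--path-rename", "old_dir/:new_dir/",
    ],
}

for task, cmd in COMMON_TASKS.items():
    print(f"{task}: {' '.join(cmd)}")
```

Each list can be passed directly to `subprocess.run(..., check=True)` from inside a fresh clone of the repository to be rewritten.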
Warnings
- breaking git-filter-repo is the replacement that the Git project's documentation recommends for `git filter-branch`. Its command-line arguments and internal behavior differ significantly, so existing scripts or workflows built on `git filter-branch` will break unless they are rewritten for the new tool.
- gotcha git-filter-repo irreversibly rewrites your Git repository's history across all branches and tags. This can lead to data loss or desynchronization issues for collaborators if not handled correctly. Never run it on a production repository without a full backup, and communicate changes clearly to your team.
- gotcha git-filter-repo expects a 'pristine' repository, ideally a fresh clone: no uncommitted changes, no unpushed commits on the current branch, and a full (non-shallow) clone. If these conditions are not met it refuses to run, to prevent accidental data loss. Passing `--force` overrides the check and should be used with care.
- gotcha While `git-filter-repo` is written in Python and can be used programmatically as a library (e.g., `import git_filter_repo`), its API is explicitly NOT guaranteed to be stable and may change between versions. Most users interact with it as a command-line tool.
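The backup and clean-state advice above can be sketched as a small pre-flight helper. This is a minimal sketch, not a reimplementation of git-filter-repo's own safety check, which is stricter. The function names (`run_git`, `preflight_problems`, `check_and_backup`) are hypothetical, and it assumes a Git version that supports `git rev-parse --is-shallow-repository`.

```python
import subprocess
from pathlib import Path


def run_git(repo: Path, *args: str) -> str:
    """Run a git command inside `repo` and return its stripped stdout."""
    result = subprocess.run(
        ["git", "-C", str(repo), *args],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def preflight_problems(status_porcelain: str, is_shallow: str) -> list:
    """Turn raw git output into a list of problems (empty list means OK)."""
    problems = []
    # Any output from 'git status --porcelain' means a dirty working tree.
    if status_porcelain.strip():
        problems.append("working tree has uncommitted or untracked changes")
    # 'git rev-parse --is-shallow-repository' prints 'true' or 'false'.
    if is_shallow.strip() == "true":
        problems.append("repository is a shallow clone")
    return problems


def check_and_backup(repo: Path, backup: Path) -> list:
    """Hypothetical helper: mirror-clone `repo` to `backup`, then report problems."""
    # A mirror clone copies all refs, so it can serve as a full backup.
    subprocess.run(["git", "clone", "--mirror", str(repo), str(backup)], check=True)
    return preflight_problems(
        run_git(repo, "status", "--porcelain"),
        run_git(repo, "rev-parse", "--is-shallow-repository"),
    )
```

Calling `check_and_backup(Path("my_repo"), Path("my_repo_backup.git"))` returns an empty list when both checks pass. Keeping the parsing in `preflight_problems` separate from the git calls makes the logic easy to test without a repository.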
Install
-
pip install git-filter-repo
Imports
- git-filter-repo (CLI) / git_filter_repo (Library)
For command-line execution: `subprocess.run(['git-filter-repo', '--option', 'value'])`. For programmatic library use (advanced): `import git_filter_repo as fr` (requires specific setup, see notes).
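For the command-line route, a thin wrapper can fail early with a clear message when the executable is missing. The sketch below is one possible shape for such a wrapper. `run_filter_repo` and its parameters are hypothetical names introduced for this example, not part of git-filter-repo.

```python
import shutil
import subprocess


def run_filter_repo(args, repo_dir=".", executable="git-filter-repo"):
    """Run git-filter-repo with `args` inside `repo_dir`.

    Raises FileNotFoundError if the executable is not on PATH, and
    subprocess.CalledProcessError if the tool exits with a non-zero status.
    """
    if shutil.which(executable) is None:
        raise FileNotFoundError(f"{executable!r} not found on PATH; is it installed?")
    return subprocess.run(
        [executable, *args],
        cwd=repo_dir,         # run inside the repository to be rewritten
        check=True,           # raise on failure instead of returning silently
        capture_output=True,  # keep stdout/stderr available on the result
        text=True,
    )
```

A low-risk first call is `run_filter_repo(["--analyze"], repo_dir="path/to/clone")`, which writes reports about the repository without rewriting history.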
Quickstart
import subprocess
import os
import shutil
# --- Setup: Create a dummy repo for demonstration ---
repo_name = "test_repo_filter"
repo_path = os.path.join(os.getcwd(), repo_name)
# Clean up previous run if exists
if os.path.exists(repo_path):
    shutil.rmtree(repo_path)
os.makedirs(repo_path)
os.chdir(repo_path)
subprocess.run(["git", "init", "-b", "main"], check=True, capture_output=True)
subprocess.run(["git", "config", "user.email", "test@example.com"], check=True, capture_output=True)
subprocess.run(["git", "config", "user.name", "Test User"], check=True, capture_output=True)
with open("file1.txt", "w") as f:
    f.write("initial content")
subprocess.run(["git", "add", "file1.txt"], check=True, capture_output=True)
subprocess.run(["git", "commit", "-m", "Initial commit"], check=True, capture_output=True)
with open("secret.txt", "w") as f:
    f.write("super secret info")
subprocess.run(["git", "add", "secret.txt"], check=True, capture_output=True)
subprocess.run(["git", "commit", "-m", "Add secret file"], check=True, capture_output=True)
with open("file1.txt", "a") as f:
    f.write("\nmore content")
subprocess.run(["git", "add", "file1.txt"], check=True, capture_output=True)
subprocess.run(["git", "commit", "-m", "Update file1"], check=True, capture_output=True)
print("Original log (last 3 commits):")
subprocess.run(["git", "log", "--oneline", "-3"], check=True)
print("\n--- Running git-filter-repo to remove 'secret.txt' ---")
# IMPORTANT: git-filter-repo *modifies history irreversibly*. Always back up your repository.
# For this demo, we run directly. In a real scenario, consider cloning a backup first.
try:
    # Ensure a clean working directory; git-filter-repo refuses to run with uncommitted changes.
    subprocess.run(["git", "reset", "--hard"], check=True, capture_output=True)
    # Remove 'secret.txt' from all history: '--path' selects the file and
    # '--invert-paths' keeps everything except the selected paths.
    # '--force' bypasses the fresh-clone safety check, which this demo repo would fail.
    filter_repo_cmd = ["git-filter-repo", "--invert-paths", "--path", "secret.txt", "--force"]
    print(f"Executing: {' '.join(filter_repo_cmd)}")
    subprocess.run(filter_repo_cmd, check=True)
    # The commit that only added 'secret.txt' becomes empty and is pruned,
    # so fewer than 3 commits are listed here.
    print("\nFiltered log (up to 3 commits):")
    subprocess.run(["git", "log", "--oneline", "-3"], check=True)
    # Verify the file is gone and not in history
    search_log_cmd = ["git", "log", "--all", "--", "secret.txt"]
    result = subprocess.run(search_log_cmd, capture_output=True, text=True)
    if not result.stdout:
        print("\n'secret.txt' successfully removed from history.")
    else:
        print("\nERROR: 'secret.txt' still found in history. Output:\n" + result.stdout)
except subprocess.CalledProcessError as e:
    print(f"Error running git-filter-repo: {e}")
    # stdout/stderr are None unless the failing call captured its output.
    print(f"Stdout: {e.stdout.decode() if e.stdout else ''}")
    print(f"Stderr: {e.stderr.decode() if e.stderr else ''}")
finally:
    # Leave the dummy repo before deleting it
    os.chdir(os.path.dirname(repo_path))
    if os.path.exists(repo_path):
        shutil.rmtree(repo_path)