PyDriller
PyDriller is a Python framework for mining software repositories. It lets developers extract detailed information from any Git repository, including commits, developers, file modifications, diffs, and source code. The library is actively maintained, with frequent minor releases that add features and improvements.
Common errors
- ImportError: cannot import name 'RepositoryMining' from 'pydriller'
cause The `RepositoryMining` class was renamed to `Repository` in PyDriller 2.0.
fix Change the import statement to `from pydriller import Repository`.
- git.exc.GitCommandError: Cmd('git') failed due to: exit code(128)
cause This exception is raised by GitPython, which PyDriller uses to run Git. Exit code 128 is Git's generic fatal-error status. It usually points to a problem with the repository path provided (e.g., it doesn't exist, is not a Git repo, or has corrupted Git objects) or with the Git installation itself (the executable is missing from your system's PATH or is too old).
fix Ensure Git is installed on your system and its executable is in your system's PATH. Verify that `path_to_repo` points to a valid and accessible Git repository. Check that the installed Git meets the required version (e.g., 2.38+).
- Exception: Could not find commit <commit_hash> (e.g., when using `single` filter)
cause The specified commit hash might not exist in the cloned repository, especially if it belongs to a non-main branch, a rebased history, or a detached HEAD that PyDriller's default cloning doesn't fetch.
fix Ensure the repository is fully cloned (e.g., `include_remotes=True`). If the commit is truly missing from the fetched history, run `git fetch --all` or `git pull --all` on the local repository before running PyDriller. If the commit exists only in a specific ref or remote, ensure that ref is included in the analysis filters.
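The environment checks suggested in the fixes above (Git on PATH, a recent enough Git, a path that really is a repository) can be run before handing anything to PyDriller. The following is a minimal sketch using only the standard library; `preflight`, `parse_git_version`, and `MIN_GIT` are hypothetical names introduced here, the 2.38 minimum is taken from the note above, and the `.git` lookup is a heuristic that does not recognize bare repositories.

```python
import re
import shutil
import subprocess
from pathlib import Path

# Assumption: minimum Git version taken from the "2.38+" note; adjust as needed.
MIN_GIT = (2, 38)


def parse_git_version(output: str) -> tuple:
    """Extract (major, minor) from text such as 'git version 2.39.2'."""
    match = re.search(r"(\d+)\.(\d+)", output)
    if match is None:
        raise ValueError(f"unrecognized git version output: {output!r}")
    return int(match.group(1)), int(match.group(2))


def preflight(path_to_repo: str) -> list:
    """Return human-readable problems; an empty list means none were found."""
    problems = []

    git = shutil.which("git")
    if git is None:
        problems.append("git executable not found on PATH")
    else:
        result = subprocess.run([git, "--version"], capture_output=True, text=True)
        try:
            if parse_git_version(result.stdout) < MIN_GIT:
                problems.append(f"git is too old: {result.stdout.strip()}")
        except ValueError as error:
            problems.append(str(error))

    # Remote URLs are cloned later, so only local paths are inspected here.
    is_remote = "://" in path_to_repo or path_to_repo.startswith("git@")
    if not is_remote:
        repo = Path(path_to_repo)
        if not repo.exists():
            problems.append(f"path does not exist: {repo}")
        elif not (repo / ".git").exists():
            # Heuristic only: bare repositories have no .git entry.
            problems.append(f"no .git entry found in: {repo}")
    return problems
```

Calling `preflight("/path/to/your/repo")` before creating a `Repository` turns an opaque exit code 128 into a specific message in the common cases listed above.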
Warnings
- breaking The main class for repository mining was renamed from `RepositoryMining` to `Repository` in PyDriller 2.0. Using `RepositoryMining` will result in an `ImportError`.
- deprecated The `ModifiedFile.source_code` attribute was deprecated in version 2.2. It is replaced by `ModifiedFile.content`.
- gotcha Combining multiple filters of the same category (e.g., `from_tag` and `from_commit`) or using `single` with other filters is not supported and will raise an error.
- gotcha For very large repositories, traversing all commits can be very time-consuming and memory-intensive, potentially taking hours.
- gotcha The `Git.checkout()` method modifies the repository state on disk. Using it with `num_workers > 1` (multithreading) or parallel `Repository` instances can lead to race conditions and incorrect results.
- gotcha When `num_workers` is set to a value greater than 1 for parallel processing, the order in which commits are returned by `traverse_commits()` is not guaranteed.
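Two of the gotchas above lend themselves to small guards in calling code: rejecting conflicting filters before a `Repository` is built, and restoring a deterministic order after a parallel traversal. The sketch below is illustrative only. `filter_conflicts` and `in_date_order` are hypothetical helpers, and the grouping of filters into "start" and "end" categories is an assumption based on the `from_tag`/`from_commit` example in the warning, not a statement of PyDriller's internal checks.

```python
# Assumed filter groups: at most one "start" filter and one "end" filter.
FROM_FILTERS = ("since", "from_commit", "from_tag")
TO_FILTERS = ("to", "to_commit", "to_tag")


def filter_conflicts(**filters) -> list:
    """Describe filter combinations that the warning above calls unsupported."""
    given = {name for name, value in filters.items() if value is not None}
    conflicts = []
    for group in (FROM_FILTERS, TO_FILTERS):
        used = [name for name in group if name in given]
        if len(used) > 1:
            conflicts.append("only one of: " + ", ".join(used))
    ranged = sorted(given & set(FROM_FILTERS + TO_FILTERS))
    if "single" in given and ranged:
        conflicts.append("single cannot be combined with: " + ", ".join(ranged))
    return conflicts


def in_date_order(commits):
    """Collect all commits and sort them by committer date, oldest first.

    This holds every commit in memory, so it suits filtered traversals
    better than full walks of very large repositories.
    """
    return sorted(commits, key=lambda commit: commit.committer_date)


# Usage sketch (requires pydriller and a real repository):
#   ordered = in_date_order(Repository(path, num_workers=4).traverse_commits())
```

Sorting after the fact keeps the speed benefit of `num_workers > 1` while giving downstream code a stable order, at the cost of materializing the whole result.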
Install
- pip install pydriller
Imports
- Repository
from pydriller import RepositoryMining  # wrong: pre-2.0 name, raises ImportError
from pydriller import Repository  # correct for PyDriller 2.0 and later
- Commit
from pydriller.domain.commit import Commit
- ModifiedFile
from pydriller.domain.commit import ModifiedFile
Quickstart
from pydriller import Repository

repo_url = "https://github.com/ishepard/pydriller.git"  # Or a local path: "/path/to/your/repo"

# Iterate over all commits in the repository
print(f"Analyzing repository: {repo_url}")
for commit in Repository(repo_url).traverse_commits():
    print(f"  Hash: {commit.hash}")
    print(f"  Author: {commit.author.name} <{commit.author.email}>")
    print(f"  Date: {commit.author_date}")
    print(f"  Message: {commit.msg.splitlines()[0]}")
    # PyDriller 2.0 renamed `commit.modifications` to `commit.modified_files`
    print(f"  Files changed: {len(commit.modified_files)}")
    for modification in commit.modified_files:
        # new_path is None for deleted files, so fall back to old_path
        print(f"    - {modification.change_type.name}: {modification.new_path or modification.old_path}")

# Example with filters (5 most recent commits in a specific branch)
from itertools import islice

# Repository does not take a commit-count argument, so islice caps the iteration at 5.
# To filter by date instead, pass e.g. since=datetime.datetime(2023, 1, 1).
print("\nAnalyzing last 5 commits in 'master' branch:")
recent_commits = Repository(
    repo_url,
    order="reverse",  # Get recent commits first
    only_in_branch="master",  # singular parameter: takes one branch name
).traverse_commits()
for commit in islice(recent_commits, 5):
    print(f"  Commit: {commit.hash[:7]} - {commit.msg.splitlines()[0]}")
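Beyond printing, the same traversal can feed a simple aggregation. The sketch below tallies changed files by change type; it assumes PyDriller 2.x, where the changed files of a commit are exposed as `commit.modified_files`, and `count_change_types` and `main` are hypothetical names introduced for illustration. The PyDriller import sits inside `main` so the counting helper can be exercised without the library installed.

```python
from collections import Counter


def count_change_types(commits) -> Counter:
    """Tally modified files by change type name (for example ADD or MODIFY)."""
    counts = Counter()
    for commit in commits:
        for modified_file in commit.modified_files:
            counts[modified_file.change_type.name] += 1
    return counts


def main(path_to_repo: str) -> None:
    # Imported lazily so count_change_types has no hard dependency on PyDriller.
    from pydriller import Repository

    totals = count_change_types(Repository(path_to_repo).traverse_commits())
    for change_type, total in totals.most_common():
        print(f"{change_type}: {total}")
```

Running `main("/path/to/your/repo")` walks the full history, so on large repositories it may be worth adding filters to the `Repository` call first.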