PyDriller

2.9 · active · verified Thu Apr 16

PyDriller is a Python framework designed for mining software repositories. It enables developers to easily extract detailed information from any Git repository, including commits, developers, file modifications, diffs, and source code. The library is actively maintained with frequent minor releases to introduce new features and improvements.

Common errors

Warnings

Install

Imports

Quickstart

This quickstart demonstrates how to initialize a `Repository` object, traverse all commits, and access basic commit information. It also shows how to restrict the analysis to the most recent commits on a particular branch.

from pydriller import Repository

repo_url = "https://github.com/ishepard/pydriller.git" # Or a local path: "/path/to/your/repo"

# Iterate over all commits in the repository
print(f"Analyzing repository: {repo_url}")
for commit in Repository(repo_url).traverse_commits():
    print(f"  Hash: {commit.hash}")
    print(f"  Author: {commit.author.name} <{commit.author.email}>")
    print(f"  Date: {commit.author_date}")
    print(f"  Message: {commit.msg.splitlines()[0]}")
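    # PyDriller 2.x exposes changed files as `modified_files` (formerly `modifications`)
    print(f"  Files changed: {len(commit.modified_files)}")
    for modified_file in commit.modified_files:
        # new_path is None for deleted files, so fall back to old_path
        print(f"    - {modified_file.change_type.name}: {modified_file.new_path or modified_file.old_path}")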

# Example with filters (5 most recent commits in a specific branch)
from itertools import islice

# Repository has no commit-count option, so islice stops the traversal after five.
# To filter by date instead, import datetime and pass e.g. since=datetime.datetime(2023, 1, 1)
print("\nAnalyzing last 5 commits in 'master' branch:")
recent_commits = Repository(
    repo_url,
    order="reverse",  # Newest commits first (the default is oldest first)
    only_in_branch="master",
).traverse_commits()
for commit in islice(recent_commits, 5):
    print(f"  Commit: {commit.hash[:7]} - {commit.msg.splitlines()[0]}")
