{"id":8479,"library":"pydriller","title":"PyDriller","description":"PyDriller is a Python framework designed for mining software repositories. It enables developers to easily extract detailed information from any Git repository, including commits, developers, file modifications, diffs, and source code. The library is actively maintained with frequent minor releases to introduce new features and improvements.","status":"active","version":"2.9","language":"en","source_language":"en","source_url":"https://github.com/ishepard/pydriller","tags":["git","version control","software engineering","mining software repositories","code analysis"],"install":[{"cmd":"pip install pydriller","lang":"bash","label":"Install PyDriller"}],"dependencies":[{"reason":"PyDriller requires Git to be installed and accessible in the system's PATH to interact with repositories.","package":"Git","optional":false},{"reason":"PyDriller uses GitPython internally to interface with Git repositories. It is automatically installed as a dependency.","package":"GitPython","optional":false},{"reason":"Used internally by PyDriller for calculating structural code metrics (e.g., NLOC, cyclomatic complexity) on modified files. 
Users typically do not need to install it directly.","package":"Lizard","optional":true}],"imports":[{"note":"The primary class for mining repositories was renamed from `RepositoryMining` to `Repository` in PyDriller 2.0.","wrong":"from pydriller import RepositoryMining","symbol":"Repository","correct":"from pydriller import Repository"},{"symbol":"Commit","correct":"from pydriller.domain.commit import Commit"},{"symbol":"ModifiedFile","correct":"from pydriller.domain.commit import ModifiedFile"}],"quickstart":{"code":"from itertools import islice\n\nfrom pydriller import Repository\n\nrepo_url = \"https://github.com/ishepard/pydriller.git\"  # Or a local path: \"/path/to/your/repo\"\n\n# Iterate over all commits in the repository (oldest to newest by default)\nprint(f\"Analyzing repository: {repo_url}\")\nfor commit in Repository(repo_url).traverse_commits():\n    print(f\"  Hash: {commit.hash}\")\n    print(f\"  Author: {commit.author.name} <{commit.author.email}>\")\n    print(f\"  Date: {commit.author_date}\")\n    # Fall back to an empty string for commits with an empty message\n    print(f\"  Message: {(commit.msg.splitlines() or [''])[0]}\")\n    # PyDriller 2.x exposes changed files as `modified_files` (formerly `modifications`)\n    print(f\"  Files changed: {len(commit.modified_files)}\")\n    for modified_file in commit.modified_files:\n        print(f\"    - {modified_file.change_type.name}: {modified_file.new_path}\")\n\n# Example with filters: the 5 most recent commits on the 'master' branch.\n# Repository takes no commit-count argument, so the generator is sliced instead.\n# Date filters such as since=datetime.datetime(2023, 1, 1) can narrow the range further.\nprint(\"\\nAnalyzing last 5 commits in 'master' branch:\")\nrecent_commits = Repository(\n    repo_url,\n    order=\"reverse\",  # Newest commits first\n    only_in_branch=\"master\",\n).traverse_commits()\nfor commit in islice(recent_commits, 5):\n    print(f\"  Commit: {commit.hash[:7]} - {(commit.msg.splitlines() or [''])[0]}\")","lang":"python","description":"This quickstart demonstrates how to initialize a `Repository` object, traverse all commits, and access basic commit information. 
It also shows an example of using filters to analyze a specific number of recent commits within a particular branch."},"warnings":[{"fix":"Update your import statement from `from pydriller import RepositoryMining` to `from pydriller import Repository`.","message":"The main class for repository mining was renamed from `RepositoryMining` to `Repository` in PyDriller 2.0. Using `RepositoryMining` will result in an `ImportError`.","severity":"breaking","affected_versions":">=2.0"},{"fix":"Replace `modification.source_code` with `modification.content` when accessing the source code of a modified file.","message":"The `ModifiedFile.source_code` attribute was deprecated in version 2.2. It is replaced by `ModifiedFile.content`.","severity":"deprecated","affected_versions":">=2.2"},{"fix":"Ensure that only one filter from a given category (e.g., 'from', 'to') is used at a time. For complex filtering, retrieve a broader set of commits and apply programmatic filtering afterwards.","message":"Combining multiple filters of the same category (e.g., `from_tag` and `from_commit`) or using `single` with other filters is not supported and will raise an error.","severity":"gotcha","affected_versions":"all"},{"fix":"Use filters such as `since`, `to`, `from_commit`, `to_commit`, `only_in_branch`, or `only_commits` to narrow the scope of the analysis, or stop consuming the `traverse_commits()` generator early. Consider running analyses on subsets of the repository history.","message":"For very large repositories, traversing all commits can be very time-consuming and memory-intensive, potentially taking hours.","severity":"gotcha","affected_versions":"all"},{"fix":"Avoid using `Git.checkout()` in multi-threaded scenarios. If the repository state must be changed, do it in a single-threaded context or use proper locking or isolation (e.g., a separate clone per worker).","message":"The `Git.checkout()` method modifies the repository state on disk. 
Using it with `num_workers > 1` (multithreading) or parallel `Repository` instances can lead to race conditions and incorrect results.","severity":"gotcha","affected_versions":"all"},{"fix":"If commit order is crucial for your analysis, set `num_workers=1` or sort the commits after retrieval based on `commit.author_date` or `commit.committer_date`.","message":"When `num_workers` is set to a value greater than 1 for parallel processing, the order in which commits are returned by `traverse_commits()` is not guaranteed.","severity":"gotcha","affected_versions":"all"}],"env_vars":null,"last_verified":"2026-04-16T00:00:00.000Z","next_check":"2026-07-15T00:00:00.000Z","problems":[{"fix":"Change the import statement to `from pydriller import Repository`.","cause":"The `RepositoryMining` class was renamed to `Repository` in PyDriller 2.0.","error":"ImportError: cannot import name 'RepositoryMining' from 'pydriller'"},{"fix":"Ensure Git is installed on your system and its executable is in your system's PATH. Verify that `path_to_repo` points to a valid and accessible Git repository. Check for required Git versions (e.g., 2.38+).","cause":"Git exited with status 128, which usually indicates a problem with the repository path provided (e.g., it does not exist, is not a Git repository, or has corrupted Git objects) or with the Git installation itself. The exception is raised by GitPython, which PyDriller uses internally.","error":"git.exc.GitCommandError: Cmd('git') failed due to: exit code(128)"},{"fix":"Ensure the local clone actually contains the commit; passing `include_remotes=True` to `Repository` also includes remote branches in the traversal. If the commit is truly missing from the fetched history, run `git fetch --all` (or `git pull --all`) on the local repository before running PyDriller. 
If the commit exists only in a specific ref or remote, ensure that ref is included in the analysis filters.","cause":"The specified commit hash might not exist in the cloned repository, especially if it belongs to a non-main branch, a rebased history, or a detached HEAD that PyDriller's default cloning does not fetch.","error":"Exception: Could not find commit <commit_hash> (e.g., when using the `single` filter)"}]}