Identify

raw JSON →
2.6.18 verified Tue May 12 auth: no python install: stale quickstart: stale

Identify is a Python library primarily used for file identification. It processes files (or file metadata) and returns a set of standardized tags describing their type, executability, language (from shebangs), and more. Maintained by the pre-commit team, it is actively developed with a focus on robust file analysis heuristics. The current version is 2.6.18, and releases typically align with pre-commit's development cycle or as issues require attention.

pip install identify
error ModuleNotFoundError: No module named 'identify'
cause The `identify` library has not been installed in your Python environment or the Python interpreter cannot locate it in its search paths. [1, 2, 3, 4, 9, 11]
fix
Install the library using pip: pip install identify
error pre-commit: command not found. Did you forget to activate your virtualenv?
cause When using `identify` as a `pre-commit` hook, the `pre-commit` executable itself is not found in the system's PATH, often because a virtual environment where `pre-commit` is installed is not active. [8, 14, 18, 19]
fix
Activate your virtual environment before running git commands, or ensure pre-commit is installed globally or accessible in your shell's PATH. If not installed, run pip install pre-commit and pre-commit install in your repository. [8, 14]
error AttributeError: 'File' object has no attribute 'tag'
cause You are attempting to access a non-existent attribute named 'tag' on the `File` object returned by `identify_file`. The correct attribute to get all tags is `tags` (a frozenset), not `tag`. [13, 16]
fix
Access the tags property (which returns a frozenset of tags) instead of a singular 'tag' attribute: file_obj.tags
error AttributeError: 'frozenset' object has no attribute 'startswith'
cause After obtaining the `tags` (a frozenset of strings) from a `File` object, you are attempting to call a string method like `startswith()` directly on the frozenset object itself, rather than on individual tags within it. [13, 16]
fix
Iterate over the frozenset of tags and apply string methods to each individual tag: for tag in file_obj.tags: if tag.startswith('text'): ...
error TypeError: identify_file() missing 1 required positional argument: 'path'
cause The `identify_file` function was called without providing the `path` argument, which is a required parameter for the function to know which file to analyze. [20, 22, 25, 29]
fix
Provide the path to the file as an argument when calling identify_file: from identify import identify_file; identify_file('your_file.py')
gotcha The library identifies files based on a specific heuristic: first by file type, then executable bit, then file extension, then by peeking at file content bytes, and finally by interpreting shebangs. Users should be aware that it might not perform deep content analysis beyond these steps, potentially leading to unexpected tags if relying solely on ambiguous extensions or complex file formats.
fix Understand the library's identification process as documented on its GitHub page. For critical identification, consider complementing `identify` with more specialized tools if needed.
gotcha While `identify` includes an API for license determination, its approach (e.g., stripping copyright lines, normalizing whitespace) suggests it is a heuristic-based identification, not a definitive legal or cryptographic analysis. It may have limitations with highly customized licenses, incomplete files, or non-standard formatting.
fix Do not treat `identify`'s license tags as legal advice or a substitute for thorough manual review. Use it as an initial indicator, and verify licenses manually, especially for open-source compliance or distribution.
breaking The `identify` library raises an `ImportError` because the name `tags_from_path` cannot be imported. This indicates that the function is not directly available or does not exist in the installed version of the `identify` library, likely due to an API change, a specific version where it's not exposed, or an older version of the library being used.
fix Review the `identify` library's documentation and changelog to confirm the correct function name and import path for your installed version. Ensure your `identify` version is compatible with the code expecting `tags_from_path`. Consider upgrading `identify` to a version that includes this API, or adjusting your import statement and code to match the available API of your current `identify` installation.
breaking The `identify` library raises an `ImportError` for `tags_from_path`. This indicates a significant API change where `tags_from_path` is either not directly exposed from the top-level `identify` package, has been renamed, or removed in the installed version, preventing basic usage.
fix Review the `identify` library's documentation and changelog for the installed version to ascertain the correct import path or replacement function for `tags_from_path`. Modify the import statement accordingly or install a compatible version of the library.
python os / libc status wheel install import disk
3.10 alpine (musl) - - - -
3.10 slim (glibc) - - - -
3.11 alpine (musl) - - - -
3.11 slim (glibc) - - - -
3.12 alpine (musl) - - - -
3.12 slim (glibc) - - - -
3.13 alpine (musl) - - - -
3.13 slim (glibc) - - - -
3.9 alpine (musl) - - - -
3.9 slim (glibc) - - - -

This example demonstrates how to use `tags_from_path` to identify a Python script and a plain text file, and also shows the result for a non-existent file path. It creates temporary files and cleans them up.

import os
from identify import tags_from_path

# Create a dummy Python file
python_file_content = "#!/usr/bin/env python\nprint('Hello, world!')\n"
python_file_path = "temp_script.py"
with open(python_file_path, "w") as f:
    f.write(python_file_content)

# Get tags for the dummy Python file
python_tags = tags_from_path(python_file_path)
print(f"Tags for '{python_file_path}': {python_tags}")

# Create a dummy text file
text_file_content = "This is a simple text file."
text_file_path = "temp_text.txt"
with open(text_file_path, "w") as f:
    f.write(text_file_content)

# Get tags for the dummy text file
text_tags = tags_from_path(text_file_path)
print(f"Tags for '{text_file_path}': {text_tags}")

# Demonstrate with a non-existent path (will return an empty set)
non_existent_tags = tags_from_path("non_existent_file.xyz")
print(f"Tags for 'non_existent_file.xyz': {non_existent_tags}")

# Clean up the dummy files
os.remove(python_file_path)
os.remove(text_file_path)