Identify
Identify is a Python library for file identification. Given a file on disk (or file metadata such as a filename), it returns a set of standardized tags describing the file's type, executability, language (from the extension or shebang), and more. It is maintained and actively developed by the pre-commit team. The current version is 2.6.18; releases typically track pre-commit's development cycle or ship as issues require attention.
Warnings
- gotcha Identification follows a fixed order of checks: file type first, then the executable bit, then the file extension, then a peek at the leading bytes of the file, and finally the shebang. Nothing deeper than these steps is inspected, so ambiguous extensions or complex file formats can produce unexpected tags.
- gotcha `identify` includes an API for license determination, but its approach is heuristic: it strips copyright lines and normalizes whitespace before matching. Treat the result as a best guess, not a definitive legal analysis. Highly customized licenses, incomplete files, or non-standard formatting may not be identified.
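The license heuristic in the second warning can be illustrated with a minimal, stdlib-only sketch. This is not identify's code: the names `KNOWN_LICENSES`, `normalize`, and `guess_license` are hypothetical, the reference texts are invented placeholders, and the fuzzy fallback via `difflib` is an illustrative choice rather than a claim about how the library matches.

```python
import difflib
import re
from typing import Optional

# Hypothetical reference texts; a real matcher would compare against full license bodies.
KNOWN_LICENSES = {
    "EXAMPLE-PERMISSIVE": "Permission is hereby granted to use this software for any purpose.",
    "EXAMPLE-COPYLEFT": "You may redistribute this software only under these same terms.",
}

# Matches any line that starts with the word "copyright" (case-insensitive).
COPYRIGHT_LINE = re.compile(r"^\s*copyright\b.*$", re.IGNORECASE | re.MULTILINE)


def normalize(text: str) -> str:
    """Drop copyright lines, then collapse every whitespace run to one space."""
    without_copyright = COPYRIGHT_LINE.sub("", text)
    return " ".join(without_copyright.split())


def guess_license(text: str, threshold: float = 0.95) -> Optional[str]:
    """Return the closest known license key, or None if nothing is close enough."""
    candidate = normalize(text)
    best_key, best_ratio = None, 0.0
    for key, reference in KNOWN_LICENSES.items():
        ratio = difflib.SequenceMatcher(None, candidate, normalize(reference)).ratio()
        if ratio > best_ratio:
            best_key, best_ratio = key, ratio
    return best_key if best_ratio >= threshold else None


sample = (
    "Copyright (c) 2024 Jane Doe\n\n"
    "Permission is hereby granted   to use\nthis software for any purpose.\n"
)
print(guess_license(sample))
```

The sketch shows why such matching is fragile: a license with reworded clauses or missing paragraphs drifts below the similarity threshold and is reported as unknown.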
Install
pip install identify
Imports
- tags_from_path
from identify.identify import tags_from_path
Quickstart
import os
from identify.identify import tags_from_path
# Create a dummy Python file
python_file_content = "#!/usr/bin/env python\nprint('Hello, world!')\n"
python_file_path = "temp_script.py"
with open(python_file_path, "w") as f:
    f.write(python_file_content)
# Get tags for the dummy Python file
python_tags = tags_from_path(python_file_path)
print(f"Tags for '{python_file_path}': {python_tags}")
# Create a dummy text file
text_file_content = "This is a simple text file."
text_file_path = "temp_text.txt"
with open(text_file_path, "w") as f:
    f.write(text_file_content)
# Get tags for the dummy text file
text_tags = tags_from_path(text_file_path)
print(f"Tags for '{text_file_path}': {text_tags}")
# A non-existent path raises ValueError rather than returning an empty set
try:
    tags_from_path("non_existent_file.xyz")
except ValueError as exc:
    print(f"Error for 'non_existent_file.xyz': {exc}")
# Clean up the dummy files
os.remove(python_file_path)
os.remove(text_file_path)
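The tags printed by the quickstart follow the order of checks described under Warnings. Below is a minimal, stdlib-only sketch of that order. It is a simplified illustration and not identify's implementation: `sketch_tags` and `EXTENSION_TAGS` are hypothetical names, the extension table is deliberately tiny, and the NUL-byte test for binary content is an assumed stand-in for whatever byte inspection the library performs.

```python
import os
import stat
import tempfile

# Hypothetical, tiny extension table; the real library ships a far larger mapping.
EXTENSION_TAGS = {
    ".py": {"text", "python"},
    ".txt": {"text", "plain-text"},
    ".png": {"binary", "image", "png"},
}


def sketch_tags(path: str) -> set:
    """Simplified illustration of the documented check order (not identify's code)."""
    mode = os.lstat(path).st_mode  # raises FileNotFoundError for a missing path

    # 1. File type: anything other than a regular file stops here.
    if stat.S_ISDIR(mode):
        return {"directory"}
    if stat.S_ISLNK(mode):
        return {"symlink"}
    tags = {"file"}

    # 2. Executable bit.
    tags.add("executable" if os.access(path, os.X_OK) else "non-executable")

    # 3. Extension: a recognized extension settles text/binary and ends the search.
    ext = os.path.splitext(path)[1].lower()
    if ext in EXTENSION_TAGS:
        return tags | EXTENSION_TAGS[ext]

    # 4. Peek at the leading bytes to guess text vs. binary (assumed NUL-byte test).
    with open(path, "rb") as f:
        head = f.read(1024)
    if b"\0" in head:
        return tags | {"binary"}
    tags.add("text")

    # 5. For text, read the shebang and tag the interpreter name.
    first_line = head.splitlines()[0] if head else b""
    if first_line.startswith(b"#!"):
        parts = first_line[2:].decode("utf-8", "replace").split()
        # Skip a leading /usr/bin/env so "#!/usr/bin/env python" yields "python".
        if parts and os.path.basename(parts[0]) == "env":
            parts = parts[1:]
        if parts:
            tags.add(os.path.basename(parts[0]))
    return tags


# Usage: a script without an extension falls through to steps 4 and 5.
with tempfile.TemporaryDirectory() as tmp:
    script = os.path.join(tmp, "run")
    with open(script, "w") as f:
        f.write("#!/usr/bin/env python\nprint('hi')\n")
    print(sorted(sketch_tags(script)))
```

Because step 3 returns early, a file named `temp_script.py` is tagged from its extension alone and its contents are never read in this sketch, which is how a misleading extension can produce unexpected tags.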