Puremagic: File Detection
Puremagic is a pure-Python module that identifies file types from their magic numbers (signature bytes at known offsets) and from content-aware analysis. It is a lightweight, cross-platform alternative to `python-magic`/`libmagic` with zero runtime dependencies. The library is actively maintained; the current version is 2.2.0, and regular updates improve detection and fix issues.
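To make the idea of magic numbers concrete, here is a minimal sketch of signature matching. It is not puremagic's implementation: the `SIGNATURES` table and the `guess_extension` helper are hypothetical names invented for illustration, and the table holds only three well-known signatures.

```python
from typing import Optional

# Hypothetical, hand-picked signature table for illustration only;
# a real detector relies on a far larger database.
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": ".png",
    b"%PDF-": ".pdf",
    b"GIF89a": ".gif",
}


def guess_extension(data: bytes) -> Optional[str]:
    """Return the extension whose signature prefixes data, or None."""
    for signature, extension in SIGNATURES.items():
        if data.startswith(signature):
            return extension
    return None
```

A prefix check like this only covers signatures at offset 0. Formats that keep their signature elsewhere, or that share a container format, need additional checks.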
Warnings
- breaking Version 2.0.0 removed support for Python versions 3.7 through 3.11. Puremagic now requires Python 3.12 or newer.
- breaking The `puremagic.what()` function, a drop-in replacement for `imghdr`, was removed in version 2.0.0.
- breaking The MIME type for WAV files changed from `audio/wave` to `audio/wav` in version 2.1.0.
- gotcha Starting with version 2.0.0, deep scanning for improved accuracy is enabled by default. This changes the detection behavior for many file types (e.g., Office documents, text files, JSON) compared to earlier versions.
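Code that compares MIME types and must run against releases on both sides of the 2.1.0 WAV change can normalize the value before comparing. The sketch below is one possible approach; `normalize_mime` and `_MIME_ALIASES` are hypothetical names and are not part of puremagic.

```python
# Legacy spelling (before 2.1.0) mapped to the spelling used from 2.1.0 on.
# Hypothetical helper table, not part of puremagic.
_MIME_ALIASES = {"audio/wave": "audio/wav"}


def normalize_mime(mime: str) -> str:
    """Map a legacy MIME spelling to its current form; pass others through."""
    return _MIME_ALIASES.get(mime, mime)
```

Such a helper could wrap a lookup, for example `normalize_mime(puremagic.from_file(path, mime=True))`, so that downstream comparisons only ever see `audio/wav`.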
Install
pip install puremagic
Imports
- puremagic
import puremagic
Quickstart
import puremagic
import os
# Create a dummy file for demonstration
dummy_filename = "test_file.txt"
with open(dummy_filename, "w") as f:
    f.write("This is a test text file.")
# Get the most likely file extension
extension = puremagic.from_file(dummy_filename)
print(f"Detected extension: {extension}")
# Get all possible results with confidence and MIME type
results = puremagic.magic_file(dummy_filename)
print(f"All detection results: {results}")
# Clean up the dummy file
os.remove(dummy_filename)
# Example with in-memory bytes (here, the raw content of a tiny PNG)
data_string = b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x00\x01\x00\x00\x00\x01\x08\x06\x00\x00\x00\x1f\x15\xc4\x89\x00\x00\x00\nIDATx\xda\xed\xc1\x01\x01\x00\x00\x00\xc2\xa0\xf7Om\x00\x00\x00\x00IEND\xaeB`\x82'
string_results = puremagic.magic_string(data_string)
print(f"String detection results: {string_results}")
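The lists returned by `magic_file` and `magic_string` can be reduced to a single answer. The sketch below assumes each result exposes `extension`, `mime_type`, and `confidence` attributes, and that unidentifiable or empty input raises `puremagic.PureError` or `ValueError`; check both assumptions against the installed version. `best_match` and `describe` are hypothetical helper names.

```python
from operator import attrgetter


def best_match(results):
    """Return the highest-confidence result, or None for an empty list."""
    return max(results, key=attrgetter("confidence"), default=None)


def describe(data: bytes) -> str:
    """Summarize the top detection for raw bytes, or 'unknown' on failure."""
    import puremagic  # imported lazily so best_match works without it

    try:
        results = puremagic.magic_string(data)
    except (puremagic.PureError, ValueError):
        # Assumed failure modes: no signature matched, or empty input.
        return "unknown"
    top = best_match(results)
    return "unknown" if top is None else f"{top.extension} ({top.mime_type})"
```

The results are documented as sorted with the highest confidence first, so `results[0]` is usually equivalent. `best_match` just avoids depending on that ordering.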