Magika
Magika is an AI-powered file type detection tool developed by Google. It leverages deep learning to accurately identify content types, supporting over 200 formats including binary and textual files. It boasts high accuracy (~99%) and fast inference times, making it suitable for security, data processing, and development workflows. The current version is 1.0.2, and it receives active development and regular updates.
Warnings
- breaking Starting with version 1.0.2, Magika no longer automatically loads `.env` files. If your application relied on `python-dotenv` being implicitly managed by Magika, this behavior has changed.
- breaking Magika 1.0 represents a significant rewrite, with the core engine and CLI implemented in Rust, and the Python module being 'revamped.' While intended for easier integration, users migrating from pre-1.0 experimental versions (0.x.x) might encounter API changes or behavioral differences requiring code updates. The Python CLI itself also shifted from a pure Python implementation to a Rust binary wrapper around version 0.6.0.
- gotcha For large files, using `magika.identify_bytes()` can lead to high memory consumption as it loads the entire file content into memory. This is not efficient for very large files.
- gotcha The `onnxruntime` dependency, while crucial for Magika's performance, can be substantial in size and may introduce platform-specific installation complexities. While Magika 1.0.2 improved support for Python 3.14 and removed some Windows-specific pins, `onnxruntime` itself has had historical issues with specific Python versions or architectures.
Install
-
pip install magika
Imports
- Magika
from magika import Magika
- PredictionMode
from magika import Magika, PredictionMode
Quickstart
from magika import Magika
import os
m = Magika()
# Example 1: Identify from bytes
file_content_bytes = b"console.log('Hello, Magika!');"
result_bytes = m.identify_bytes(file_content_bytes)
print(f"Content type (bytes): {result_bytes.output.label}")
# Example 2: Identify from a dummy file path
# Create a dummy file for demonstration
dummy_file_path = "./dummy_script.js"
with open(dummy_file_path, "wb") as f:
f.write(file_content_bytes)
result_path = m.identify_path(dummy_file_path)
print(f"Content type (path): {result_path.output.label}")
os.remove(dummy_file_path)