Magika

1.0.2 · active · verified Thu Apr 09

Magika is an AI-powered file type detection tool developed by Google. It uses a deep-learning model to identify content types and supports more than 200 formats, both binary and textual. Google reports high accuracy (~99%) and fast inference, which makes Magika suitable for security, data-processing, and development workflows. The current version is 1.0.2, and the project is actively developed and regularly updated.

Install

pip install magika

Imports

from magika import Magika

Quickstart

Instantiate the `Magika` class, then call `identify_bytes` or `identify_path` to determine the content type of a byte string or a file. The `output.label` field of the returned result holds the identified content type. For large files, prefer `identify_path` or `identify_stream`, since they avoid loading the entire content into memory.

from pathlib import Path
import tempfile

from magika import Magika

# Create the Magika instance (this loads the detection model)
m = Magika()

# Example 1: identify content from an in-memory byte string
file_content_bytes = b"console.log('Hello, Magika!');"
result_bytes = m.identify_bytes(file_content_bytes)
print(f"Content type (bytes): {result_bytes.output.label}")

# Example 2: identify content from a file path
# Write the same bytes to a temporary file; the directory is removed automatically
with tempfile.TemporaryDirectory() as tmp_dir:
    dummy_file_path = Path(tmp_dir) / "dummy_script.js"
    dummy_file_path.write_bytes(file_content_bytes)

    result_path = m.identify_path(dummy_file_path)
    print(f"Content type (path): {result_path.output.label}")
