PDFiD

1.1.3 verified Fri May 01 auth: no python

A Python library based on Didier Stevens' PDFiD tool for analyzing PDF files for potentially malicious content. It scans PDFs for common suspicious elements such as JavaScript, embedded files, and auto-actions. The current version is 1.1.3; releases are intermittent.

pip install pdfid
error AttributeError: module 'pdfid' has no attribute 'PDFiD'
cause Incorrect import: importing pdfid as a whole instead of the class.
fix Use: from pdfid import PDFiD
error TypeError: initializer got an unexpected keyword argument 'path'
cause PDFiD does not accept a file path; it requires bytes content.
fix Read the file content first, then pass the bytes to PDFiD:
with open('file.pdf', 'rb') as f:
    data = f.read()
pdfid = PDFiD(data)
gotcha PDFiD expects the entire file content as bytes; passing a file path or a file object instead either raises an error or fails silently.
fix Open the file in binary mode ('rb') and read all of its bytes before passing them to PDFiD.
gotcha The results() method returns a dictionary with keys such as 'JavaScript' and 'OpenAction'. The values are occurrence counts, not booleans.
fix Test for a count greater than zero to determine presence: if pdfid.results().get('JavaScript', 0) > 0:
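
The count-based check above can be generalized to several keys at once. The following is a minimal sketch, not part of the library: flagged_indicators and WATCHED_KEYS are hypothetical names, the sample dictionary is made up, and it assumes results() returns a plain mapping of names to integer counts as described in the gotcha.

```python
from typing import Mapping

# Hypothetical watch list. 'JavaScript' and 'OpenAction' are the keys named in
# the text; the library may report others, so extend this as needed.
WATCHED_KEYS = ("JavaScript", "OpenAction")


def flagged_indicators(results: Mapping[str, int], watched=WATCHED_KEYS) -> dict:
    """Return the watched keys whose count is greater than zero."""
    return {key: results[key] for key in watched if results.get(key, 0) > 0}


# Stand-in for pdfid.results(); these counts are invented for illustration.
sample_results = {"JavaScript": 2, "OpenAction": 0}
flagged = flagged_indicators(sample_results)
print(flagged)  # {'JavaScript': 2}
```

Using .get(key, 0) means a key that is missing from the dictionary is treated the same as a count of zero, so the helper does not depend on every key being present.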

Instantiate PDFiD with file content as bytes, then call results() for a dictionary of findings.

from pdfid import PDFiD

# Analyze a PDF file
with open('sample.pdf', 'rb') as f:
    data = f.read()

pdfid = PDFiD(data)
print(pdfid.results())
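
Because PDFiD is described here as accepting only bytes, it may help to normalize the input before the call. The sketch below is an illustration under that assumption: as_pdf_bytes is a hypothetical helper, not part of the library, and it uses only the standard library, so it runs without pdfid installed.

```python
import io
from pathlib import Path


def as_pdf_bytes(source) -> bytes:
    """Return raw bytes from bytes, a path, or a binary file object."""
    if isinstance(source, (bytes, bytearray)):
        return bytes(source)
    if isinstance(source, (str, Path)):
        # Treat strings and Path objects as filesystem paths.
        return Path(source).read_bytes()
    data = source.read()
    if not isinstance(data, bytes):
        # A file opened in text mode returns str, which is not usable here.
        raise TypeError("file object must be opened in binary mode ('rb')")
    return data


# In-memory stand-in for an open PDF file; the content is illustrative only.
data = as_pdf_bytes(io.BytesIO(b"%PDF-1.4 example"))
print(type(data).__name__)  # bytes
```

The returned value can then be passed on as in the quick start, for example PDFiD(as_pdf_bytes('sample.pdf')).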