PDFiD
Version 1.1.3 · verified Fri May 01 · auth: no · python
A Python library based on Didier Stevens' PDFiD tool for analyzing PDF files for potentially malicious content. It scans PDFs for common suspicious elements such as JavaScript, embedded files, and auto-actions. The current version is 1.1.3; releases are intermittent.
pip install pdfid
Common errors
error AttributeError: module 'pdfid' has no attribute 'PDFiD'
cause Incorrect import: the pdfid module was imported as a whole instead of the PDFiD class.
fix Use: from pdfid import PDFiD
error TypeError: initializer got an unexpected keyword argument 'path'
cause PDFiD does not accept a file path; it requires the file content as bytes.
fix Read the file content first, then pass the bytes to PDFiD:
with open('file.pdf', 'rb') as f:
    data = f.read()
pdfid = PDFiD(data)
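The fix above can be wrapped in a small helper so that callers holding either a path or raw bytes always end up with bytes. This is a minimal sketch: `to_pdf_bytes` is a hypothetical name, not part of pdfid, and the block deliberately avoids importing the library so it runs on its own.

```python
import os
from pathlib import Path


def to_pdf_bytes(source):
    """Hypothetical helper (not part of pdfid): normalize input to bytes.

    Accepts a filesystem path or a bytes-like object and returns the
    full content as an immutable bytes value.
    """
    if isinstance(source, (bytes, bytearray)):
        # Already in-memory content; copy into an immutable bytes object.
        return bytes(source)
    if isinstance(source, (str, os.PathLike)):
        # Read the whole file in binary mode.
        return Path(source).read_bytes()
    raise TypeError(
        f"expected a path or bytes-like object, got {type(source).__name__}"
    )
```

The returned value can then be passed to the class as this page describes, for example `PDFiD(to_pdf_bytes('file.pdf'))`. Raising `TypeError` for open file objects is a design choice in this sketch: it makes the mistake visible immediately.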
Warnings
gotcha PDFiD expects the entire file content as bytes; passing a file path or file object will fail silently or raise an error.
fix Open the file in binary mode and read all bytes before passing to PDFiD.
gotcha The results() method returns a dictionary with keys like 'JavaScript', 'OpenAction', etc. The values are occurrence counts, not booleans.
fix Check counts > 0 to determine presence: if pdfid.results().get('JavaScript', 0) > 0.
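To illustrate the count-versus-boolean point, here is a small sketch that filters a results-style dictionary down to the keywords that actually occur. `flagged_keywords` and `sample_counts` are made up for illustration; the sample dictionary only mimics the shape described above and is not real output from the library.

```python
def flagged_keywords(counts, watchlist=("JavaScript", "OpenAction")):
    """Hypothetical helper: return watchlist entries with a count above zero."""
    return {
        name: counts[name]
        for name in watchlist
        if counts.get(name, 0) > 0
    }


# Made-up data in the shape this page describes for results().
sample_counts = {"JavaScript": 2, "OpenAction": 0}

# A key being present does not mean the keyword occurs: the count may be 0.
print("OpenAction" in sample_counts)    # True
print(flagged_keywords(sample_counts))  # {'JavaScript': 2}
```

Testing membership (`'OpenAction' in results`) would report a hit here even though the count is zero, which is why the comparison against 0 matters.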
Imports
- PDFiD
  wrong: import pdfid
  correct: from pdfid import PDFiD
- PDFiD2JSON
  wrong: from pdfid.PDFiD2JSON import PDFiD2JSON
  correct: from pdfid import PDFiD2JSON
Quickstart
from pdfid import PDFiD

# Analyze a PDF file: read it in binary mode and pass the raw bytes
with open('sample.pdf', 'rb') as f:
    data = f.read()

pdfid = PDFiD(data)
print(pdfid.results())  # dictionary of keyword counts