pdf2docx

library · 0.5.12 · python · maintenance
verified May 22, 2026

pdf2docx is an open-source Python library that converts PDF files into editable Microsoft Word DOCX documents. It uses PyMuPDF to extract data from the PDF, applies rule-based parsing to analyze the layout, and uses python-docx to generate the DOCX output. The library aims to extract text, images, and tables while preserving the original layout and formatting. The current version is 0.5.12, released on March 9, 2026.
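
The conversion described above can be sketched as follows. This is a minimal illustration, not an official recipe: it assumes pdf2docx is installed (for example via `pip install pdf2docx`) and uses the library's `Converter` class with its `convert` and `close` methods. The helper names `default_docx_path` and `convert_pdf` are hypothetical and introduced here only for the example.

```python
from pathlib import Path


def default_docx_path(pdf_path):
    """Hypothetical helper: same folder and file stem, with a .docx suffix."""
    return Path(pdf_path).with_suffix(".docx")


def convert_pdf(pdf_path, docx_path=None, start=0, end=None):
    """Hypothetical wrapper: convert a PDF to DOCX and return the output path.

    start and end are zero-based page indexes passed through to pdf2docx;
    end=None means "through the last page".
    """
    # Imported lazily so this module still loads where pdf2docx is absent.
    from pdf2docx import Converter

    out = Path(docx_path) if docx_path else default_docx_path(pdf_path)
    cv = Converter(str(pdf_path))
    try:
        cv.convert(str(out), start=start, end=end)
    finally:
        # Always release the underlying PDF handle, even if conversion fails.
        cv.close()
    return out
```

Calling `convert_pdf("input.pdf")` would write `input.docx` next to the source file, and `convert_pdf("input.pdf", start=0, end=2)` would limit the conversion to the first two pages. Because the layout analysis is rule-based, results on complex or scanned PDFs may need manual cleanup.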
