antiword

0.1.0 | verified Fri May 01 | auth: no | python

A Python library to convert MS Word .doc files to plain text, wrapping the external antiword command-line tool. Version 0.1.0 (stable).

pip install antiword
error AttributeError: module 'antiword' has no attribute 'antiword'
cause Calling `antiword.antiword('file.doc')` after a bare `import antiword`, which is not the supported import style.
fix
Use `from antiword import antiword`, then call `antiword('file.doc')`.
error antiword: command not found
cause The system binary `antiword` is not installed.
fix
Install the system package: `sudo apt-get install antiword` (Debian/Ubuntu) or `brew install antiword` (macOS).
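A minimal sketch for failing early with a clear message: it uses only the standard library's `shutil.which` to look for the binary on `PATH` before the wrapper is called. The names `antiword_available` and `require_antiword` are hypothetical helpers, not part of the package.

```python
import shutil


def antiword_available() -> bool:
    """Return True if an `antiword` executable is found on PATH."""
    return shutil.which("antiword") is not None


def require_antiword() -> str:
    """Return the full path to the antiword binary, or raise a helpful error."""
    path = shutil.which("antiword")
    if path is None:
        # Same install hints as the fix above, surfaced before any conversion runs.
        raise RuntimeError(
            "antiword binary not found on PATH; install it with "
            "'sudo apt-get install antiword' or 'brew install antiword'"
        )
    return path
```

Calling `require_antiword()` once at startup turns a later "command not found" failure into an immediate, explicit error.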
error Exception: Antiword returned non-zero exit status 1
cause The input file is not a valid .doc file or is corrupt.
fix
Check that the file is a genuine .doc file (not .docx) and is not corrupted.
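To rule out the .docx case before converting, one option is to inspect the file's leading bytes. This sketch assumes the standard signatures: legacy .doc files are OLE2 compound files starting with `D0 CF 11 E0 A1 B1 1A E1`, while .docx files are ZIP archives starting with `PK\x03\x04`. An OLE2 header is necessary but not sufficient (.xls and .ppt files share it), so treat the result as a hint. `sniff_word_format` is a hypothetical helper.

```python
from __future__ import annotations

from pathlib import Path

# Signature of an OLE2 Compound File, the container used by legacy .doc files.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")
# .docx files are ZIP archives, which begin with "PK\x03\x04".
ZIP_MAGIC = b"PK\x03\x04"


def sniff_word_format(path: str | Path) -> str:
    """Guess 'doc', 'docx', or 'unknown' from the file's first 8 bytes."""
    with open(path, "rb") as f:
        header = f.read(8)
    if header == OLE2_MAGIC:
        return "doc"
    if header.startswith(ZIP_MAGIC):
        return "docx"
    # Too short, plain text, RTF, or some other format.
    return "unknown"
```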
gotcha The library is a thin wrapper around the `antiword` command-line tool, which must be installed separately on the system (e.g., via `apt install antiword`).
fix Install the antiword system package: `brew install antiword` (macOS) or `apt-get install antiword` (Debian/Ubuntu).
breaking The package `antiword` on PyPI (version 0.1.0) is not the same as the commonly known `antiword` CLI tool. It is a Python wrapper around that tool, and its API differs from older third-party wrappers.
fix Import with `from antiword import antiword`, not `import antiword` or other import forms.
gotcha Supports only .doc files (not .docx). Attempting to convert a .docx file raises an error.
fix Ensure input files are .doc format, or use a library like `python-docx` for .docx.
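A sketch of routing files by extension, assuming `antiword('file.doc')` returns the text as a string (as the usage example implies) and that python-docx is installed for .docx input (it is imported as `docx`). Both imports are deferred so that only the backend actually used needs to be present. `convert_to_text` is a hypothetical name, and the python-docx branch joins body paragraph text only, skipping tables, headers, and footers.

```python
from __future__ import annotations

from pathlib import Path


def convert_to_text(path: str | Path) -> str:
    """Route .doc to the antiword wrapper and .docx to python-docx."""
    suffix = Path(path).suffix.lower()
    if suffix == ".doc":
        # Imported lazily so a .docx-only environment doesn't need antiword.
        from antiword import antiword
        return antiword(str(path))
    if suffix == ".docx":
        # python-docx: join the text of body paragraphs only.
        from docx import Document
        return "\n".join(p.text for p in Document(str(path)).paragraphs)
    raise ValueError(f"unsupported file type: {suffix or '(no extension)'}")
```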

Convert a .doc file to text (requires the antiword binary to be installed):

from antiword import antiword
text = antiword('document.doc')
print(text)
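For batch jobs, a hypothetical `convert_many` helper can collect failures instead of stopping at the first corrupt or non-.doc file. It catches `Exception` broadly because a non-zero antiword exit surfaces as a generic `Exception`. The `convert` parameter is an assumption added so that a different converter (or a stub in tests) can be injected; it defaults to `from antiword import antiword`.

```python
from __future__ import annotations

from typing import Callable, Iterable


def convert_many(
    paths: Iterable[str],
    convert: Callable[[str], str] | None = None,
) -> tuple[dict[str, str], dict[str, str]]:
    """Convert each path, returning (texts, errors), both keyed by path."""
    if convert is None:
        # Default to the wrapper's documented entry point.
        from antiword import antiword as convert
    texts: dict[str, str] = {}
    errors: dict[str, str] = {}
    for path in paths:
        try:
            texts[path] = convert(path)
        except Exception as exc:  # non-zero exits arrive as a plain Exception
            errors[path] = str(exc)
    return texts, errors
```

For example, `texts, errors = convert_many(["a.doc", "b.doc"])` returns the extracted text for files that converted and the error message for those that did not.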