Docx Parser
The `docx-parser` package, at version 0.2.1, is an abandoned JavaScript library last published in October 2016. It extracts plain text from Microsoft Word `.docx` files through a simple, callback-based CommonJS API. It handles basic text extraction only, with no support for rich text formatting, images, tables, or intricate document structures. Given its age, it has no Promise-based API, ESM support, or TypeScript typings, and it is not actively maintained. Developers who need robust DOCX parsing with active development and modern JavaScript/TypeScript compatibility should consider contemporary alternatives such as `@thasmorato/docx-parser` or `officeparser`, which offer streaming capabilities, memory efficiency, detailed Abstract Syntax Tree (AST) output, and broader support for office file formats.
Common errors
error Error: Cannot find module 'docx-parser'
Use `const docxParser = require('docx-parser');` instead of import statements. Also ensure the package is installed via `npm install docx-parser`.
error TypeError: docxParser.parseDocx is not a function
Ensure `const docxParser = require('docx-parser');` is at the top of your file and that `npm install docx-parser` completed successfully. Ensure the variable name matches `docxParser`.
error Error: File is not a zip
Ensure the file is a valid `.docx` document. Open it in a program like 7-Zip or WinRAR to verify its internal structure (it should contain XML files like `word/document.xml`).
Warnings
breaking This package is CommonJS-only and does not support ES Modules (`import`/`export`). Attempting to use ES module syntax may lead to module resolution errors.
deprecated The package uses a callback-based API, which is a legacy pattern in modern JavaScript development. It does not offer a Promise-based or async/await interface.
gotcha The package is no longer maintained; its last publish was in October 2016. It may have compatibility issues with newer Node.js versions and unpatched bugs, and it receives no new features or security updates.
gotcha The package extracts only plain text. It does not support parsing rich text formatting, images, tables, headers/footers, or other complex document structures found in DOCX files.
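The callback-based API noted in the warnings can be wrapped in a Promise by hand. The sketch below is one possible approach, not part of `docx-parser`: `promisifyDataCallback` is a hypothetical helper, and `fakeParseDocx` is a stand-in with the same call shape as `docxParser.parseDocx(path, callback)` so the example runs without the package installed. Because the callback receives only `data` (no error-first argument), `util.promisify` is not a direct fit.

```javascript
// Hypothetical helper (not part of docx-parser): wraps a function with the
// call shape fn(...args, callback(data)) so that it returns a Promise.
function promisifyDataCallback(fn) {
  return function (...args) {
    return new Promise((resolve, reject) => {
      try {
        // The callback receives only the result, so resolve with it directly.
        fn(...args, (data) => resolve(data));
      } catch (err) {
        // Only synchronous throws are caught here.
        reject(err);
      }
    });
  };
}

// Stand-in with the same call shape as docxParser.parseDocx, used so this
// sketch runs without the package. Swap in docxParser.parseDocx in real code.
function fakeParseDocx(filePath, callback) {
  setImmediate(() => callback(`text from ${filePath}`));
}

const parseDocxAsync = promisifyDataCallback(fakeParseDocx);

// Usage with promise chaining (async/await works the same way).
parseDocxAsync('example.docx').then((text) => {
  console.log('Extracted Text:', text);
});
```

In real use, the wrapper would be applied as `promisifyDataCallback(docxParser.parseDocx)`. How the package reports failures is not documented here; if it never invokes the callback on error, the returned Promise would stay pending, so a timeout around the call may be worth adding.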
Install
npm install docx-parser
yarn add docx-parser
pnpm add docx-parser
Imports
- docxParser
  wrong: `import docxParser from 'docx-parser';`
  correct: `const docxParser = require('docx-parser');`
- parseDocx
  wrong: `import { parseDocx } from 'docx-parser';`
  correct: `const docxParser = require('docx-parser'); docxParser.parseDocx('path/to/file.docx', (data) => { /* ... */ });`
Quickstart
const docxParser = require('docx-parser');
const path = require('path');
const fs = require('fs');
// parseDocx needs a real .docx file (a ZIP archive of XML parts); a plain text
// file renamed to .docx will fail. This script does not create one: place a
// valid file named 'example.docx' next to this script before running.
const docxPath = path.join(__dirname, 'example.docx');
if (!fs.existsSync(docxPath)) {
  console.warn(`Warning: '${docxPath}' not found. Please create a valid .docx file for parsing.`);
  console.warn('Skipping parsing example.');
} else {
  // The callback receives the extracted plain text.
  docxParser.parseDocx(docxPath, function (data) {
    console.log('Extracted Text:', data);
  });
}
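
The "File is not a zip" error listed under Common errors can often be caught before parsing by checking the file's first bytes. The sketch below is an illustrative pre-check, not part of `docx-parser`: `looksLikeZip` is a hypothetical helper that compares the first four bytes against the ZIP local file header signature (`PK\x03\x04`). Passing this check is necessary but not sufficient, since any ZIP archive passes it, not only valid `.docx` files.

```javascript
const fs = require('fs');

// ZIP local file header signature: the bytes "PK\x03\x04".
const ZIP_SIGNATURE = Buffer.from([0x50, 0x4b, 0x03, 0x04]);

// Hypothetical helper (not part of docx-parser): returns true when the file
// starts with the ZIP signature, false otherwise (including files shorter
// than four bytes).
function looksLikeZip(filePath) {
  const fd = fs.openSync(filePath, 'r');
  try {
    const header = Buffer.alloc(4);
    // Read up to 4 bytes from the start of the file.
    const bytesRead = fs.readSync(fd, header, 0, 4, 0);
    return bytesRead === 4 && header.equals(ZIP_SIGNATURE);
  } finally {
    // Always release the file descriptor, even if the read throws.
    fs.closeSync(fd);
  }
}
```

A caller could run `looksLikeZip(somePath)` after the existence check in the quickstart and skip the `parseDocx` call, with a clearer message, when it returns `false`.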