Docx Parser

raw JSON →
0.2.1 verified Thu Apr 23 auth: no javascript abandoned

The `docx-parser` package, at version 0.2.1, is an abandoned JavaScript library last published in October 2016. Its primary function is to extract plain text content from Microsoft Word `.docx` files using a simple, callback-based CommonJS API. It focuses solely on basic text extraction without support for rich text formatting, images, tables, or intricate document structures. Due to its age, it lacks modern features like Promise-based APIs, ESM support, or TypeScript typings, and is not actively maintained. Developers seeking robust DOCX parsing with comprehensive features, active development, and modern JavaScript/TypeScript compatibility should consider contemporary alternatives such as `@thasmorato/docx-parser` or `officeparser`, which offer streaming capabilities, memory efficiency, detailed Abstract Syntax Tree (AST) output, and broader support for office file formats.

error Error: Cannot find module 'docx-parser'
cause Attempting to import `docx-parser` using ES Modules syntax (`import`) in a context that does not allow CJS interop, or simply the package is not installed.
fix
Use const docxParser = require('docx-parser'); instead of import statements. Also ensure the package is installed via npm install docx-parser.
error TypeError: docxParser.parseDocx is not a function
cause This error typically occurs if the `docx-parser` module was imported incorrectly, or if the `require()` call failed silently, resulting in `docxParser` being `undefined` or an empty object.
fix
Verify that const docxParser = require('docx-parser'); is at the top of your file and that npm install docx-parser completed successfully. Ensure the variable name matches docxParser.
error Error: File is not a zip
cause DOCX files are essentially ZIP archives containing XML. This error indicates that the provided file is either corrupted, not a valid .docx file, or potentially a .doc file that was simply renamed to .docx.
fix
Ensure the input file is a genuine and uncorrupted .docx document. Open it in a program like 7-Zip or WinRAR to verify its internal structure (it should contain XML files like word/document.xml).
breaking This package is CommonJS-only and does not support ES Modules (`import`/`export`). Attempting to use ES module syntax will lead to module resolution errors.
fix Ensure your project uses CommonJS (`require()`) for this package, or use a tool like Webpack/Rollup with CJS compatibility configured. Consider migrating to a modern parser for ESM support.
deprecated The package uses a callback-based API, which is a legacy pattern in modern JavaScript development. It does not offer a Promise-based or async/await interface.
fix Wrap the callback function in a Promise manually if you need `async/await` compatibility, or use an actively maintained package that supports modern asynchronous patterns.
gotcha The package is no longer maintained, with the last publish over 10 years ago. This means it may have compatibility issues with newer Node.js versions, unpatched bugs, and lacks new features or security updates.
fix For new projects or production environments, it is highly recommended to use a currently maintained and more robust DOCX parsing library. Examples include `@thasmorato/docx-parser` or `officeparser`.
gotcha The package extracts only plain text. It does not support parsing rich text formatting, images, tables, headers/footers, or other complex document structures found in DOCX files.
fix If your application requires extraction of formatting, images, or a structured representation of the DOCX content (e.g., as an AST), this package is unsuitable. Look for libraries that explicitly advertise 'rich text parsing', 'AST output', or 'HTML conversion'.
npm install docx-parser
yarn add docx-parser
pnpm add docx-parser

Demonstrates how to import the `docx-parser` module and use its `parseDocx` function to extract plain text from a DOCX file using a callback. Users must provide a valid DOCX file.

const docxParser = require('docx-parser');
const path = require('path');
const fs = require('fs');

// Create a dummy docx file for demonstration, as a real one is needed for parsing.
// In a real application, you would replace 'example.docx' with your actual file.
// Note: This package expects a valid .docx file, which is a zipped XML structure.
// Creating one programmatically for a quickstart is complex, so this is illustrative.
// For a runnable example, ensure 'example.docx' exists in your project root.

const dummyDocxPath = path.join(__dirname, 'example.docx');
// A real .docx is a zip file containing XML. A simple text file renamed to .docx will fail.
// For testing, place a real, simple .docx file at dummyDocxPath.

if (!fs.existsSync(dummyDocxPath)) {
  console.warn(`Warning: '${dummyDocxPath}' not found. Please create a valid .docx file for parsing.`);
  console.warn('Skipping parsing example.');
} else {
  docxParser.parseDocx(dummyDocxPath, function(data){
    console.log('Extracted Text:', data);
  });
}