{"id":13079,"library":"docx4js","title":"docx4js - JavaScript DOCX and PPTX Parser","description":"docx4js is a JavaScript library designed for parsing and manipulating Microsoft Word (.docx) and PowerPoint (.pptx) files. The current stable version is 3.3.0; releases are infrequent, and the latest stable version was published two years ago. It supports both Node.js and browser environments. A key differentiator is its performance-oriented parsing strategy: it traverses document content and identifies OpenXML models using a visitor pattern, rather than building and retaining a full in-memory parsed structure. This approach aims for lower memory consumption, making it suitable for environments where memory is a concern. Users can define custom handlers to extract specific content, styles, or attributes from the document, allowing for flexible data extraction tailored to application needs. While initially focused on DOCX, it gained PPTX support in version 3.1.30. It primarily serves use cases requiring content extraction, transformation, or minor modification of Office OpenXML documents.","status":"maintenance","version":"3.3.0","language":"javascript","source_language":"en","source_url":"https://github.com/lalalic/docx4js","tags":["javascript","docx","pptx","zip docx","parser"],"install":[{"cmd":"npm install docx4js","lang":"bash","label":"npm"},{"cmd":"yarn add docx4js","lang":"bash","label":"yarn"},{"cmd":"pnpm add docx4js","lang":"bash","label":"pnpm"}],"dependencies":[{"reason":"Handles the underlying ZIP archive structure of DOCX/PPTX files, essential for reading and writing document components.","package":"jszip"},{"reason":"Provides DOM parsing capabilities for the XML content within the DOCX/PPTX archives in Node.js environments.","package":"xmldom"}],"imports":[{"note":"The library primarily uses a default export for its main API, optimized for ESM. 
While a `require` call for the main entry point might work in some CJS setups, the default import is preferred in newer Node.js versions.","wrong":"const docx4js = require('docx4js');","symbol":"docx4js","correct":"import docx4js from 'docx4js';"},{"note":"Specific internal modules often export a default class or function. Using `require('...').default` is necessary for CJS when the module exports a default.","wrong":"import { ModelHandler } from 'docx4js/lib/openxml/docx/model-handler';","symbol":"ModelHandler","correct":"import ModelHandler from 'docx4js/lib/openxml/docx/model-handler';"},{"note":"`load` is a method directly on the default exported `docx4js` object, not a named export from the root.","wrong":"import { load } from 'docx4js';","symbol":"load","correct":"docx4js.load(fileOrBlob);"}],"quickstart":{"code":"import docx4js from 'docx4js';\nimport { promises as fs } from 'fs';\nimport path from 'path';\n\nasync function processDocxFile(filePath) {\n  try {\n    // For Node.js, read the file buffer\n    const fileBuffer = await fs.readFile(filePath);\n\n    const docx = await docx4js.load(fileBuffer);\n\n    console.log(`Successfully loaded DOCX file: ${filePath}`);\n\n    // Example 1: Render the document to a simple JSON-like structure\n    const renderedContent = docx.render(function createElement(type, props, children) {\n      return { type, props, children };\n    });\n    console.log('Rendered Content (first 5 children):', JSON.stringify(renderedContent.children.slice(0, 5), null, 2));\n\n    // Example 2: Parse document using a custom event handler to extract text\n    let extractedText = '';\n    class MyModelHandler {\n      onp({ children }) { // on paragraph\n        extractedText += (children || []).map(child => child.getText()).join('');\n        extractedText += '\\n'; // Add newline for paragraphs\n      }\n      onr({ children }) { // on run\n        extractedText += (children || []).map(child => child.getText()).join('');\n      }\n   
   ontext({ content }) {\n        extractedText += content;\n      }\n      // Catch-all for other models if needed\n      on(type, handler) {\n        // A simple way to register handlers dynamically or catch all\n        if (type === '*') console.log(`Found model: ${type}`);\n      }\n    }\n    \n    const handler = new MyModelHandler();\n    docx.parse(handler);\n    console.log('\\nExtracted Text (first 500 chars):\\n', extractedText.substring(0, 500));\n\n    // Example 3: Create a blank document and save (Node.js only)\n    // const newDocx = await docx4js.create();\n    // const newFilePath = path.join(process.cwd(), 'new_document.docx'); // __dirname is not defined in ES modules\n    // await newDocx.save(newFilePath);\n    // console.log(`Created a new blank DOCX file: ${newFilePath}`);\n\n  } catch (error) {\n    console.error('Error processing DOCX file:', error);\n  }\n}\n\n// To run this, you would need a sample.docx file in the current working directory\n// For a real application, replace 'sample.docx' with your actual file path\nconst sampleDocxPath = path.join(process.cwd(), 'sample.docx');\nprocessDocxFile(sampleDocxPath);\n","lang":"javascript","description":"This quickstart demonstrates loading a DOCX file from disk (Node.js), rendering its basic structure, and extracting plain text content using a custom model handler. It illustrates the core `load`, `render`, and `parse` APIs."},"warnings":[{"fix":"Thoroughly review the release notes and migration guides for each major version jump. Expect a complete rewrite of integration code when upgrading between v1, v2, and v3.","message":"Major versions (v1, v2, v3) of docx4js are completely different from each other. The API is not backward compatible, so upgrading between major versions requires significant code changes.","severity":"breaking","affected_versions":">=1.0.0"},{"fix":"Design your application to utilize the visitor pattern (`docx.parse(handler)`) or a rendering function (`docx.render(createElement)`) to process content. 
Do not expect to modify a traditional document object model after initial parsing. Modifications typically involve creating a new document or using specific save features.","message":"docx4js's parsing approach focuses on traversal without keeping a full, mutable in-memory parsed structure. This means direct manipulation of a DOM-like object is not the primary pattern; instead, users interact via visitor-like handlers.","severity":"gotcha","affected_versions":">=3.0.0"},{"fix":"Verify that your specific Office OpenXML file type (.docx or .pptx) is supported by your docx4js version. Do not assume support for Excel (.xlsx) files.","message":"The library's original goal included DOCX, PPTX, and XLSX support, but it was limited to DOCX for a long time. PPTX support was added in version 3.1.30, but XLSX is not currently supported.","severity":"gotcha","affected_versions":">=3.0.0"},{"fix":"For browser usage, ensure that file content is provided as a `Blob` or `ArrayBuffer` obtained through client-side file APIs (e.g., `FileReader`). Avoid Node.js-specific `fs` imports.","message":"When using docx4js in a browser environment, direct file system access (e.g., `fs` module) is not available. The `load` method expects a `Blob` or `ArrayBuffer` from user input (e.g., file input element).","severity":"gotcha","affected_versions":">=3.0.0"}],"env_vars":null,"last_verified":"2026-04-19T00:00:00.000Z","next_check":"2026-07-18T00:00:00.000Z","problems":[{"fix":"Ensure your build process (Webpack, Rollup, etc.) correctly shims or excludes Node.js modules for browser builds. 
When `docx4js.load()` is called, pass a `Blob` or `ArrayBuffer` in the browser instead of a file path.","cause":"Attempting to use docx4js in a browser environment without correctly handling Node.js-specific module imports, particularly the `fs` module which is used for file system operations.","error":"Module not found: Error: Can't resolve 'fs' in './node_modules/docx4js/lib'"},{"fix":"Validate the input file (ensure it exists, is a valid .docx/.pptx, and is not empty) before passing it to `docx4js.load()`. Check the file buffer or blob content for integrity.","cause":"This error often occurs when `docx4js.load()` receives an invalid or empty input, such as a corrupt, non-existent, or incorrectly formatted DOCX file, leading to an attempt to access properties of `undefined`.","error":"TypeError: Cannot convert undefined or null to object"},{"fix":"Ensure your runtime environment provides the `URL` global object. In Node.js, this usually means using a recent version. For browser environments, ensure your build setup includes necessary polyfills if targeting older browsers.","cause":"This error can occur in some JavaScript environments (e.g., older Node.js versions or specific bundler configurations for the browser) where the global `URL` constructor is not available or polyfilled, which `docx4js` might use internally for resource handling.","error":"ReferenceError: URL is not defined"}],"ecosystem":"npm","meta_description":null,"install_score":null,"install_tag":null,"quickstart_score":null,"quickstart_tag":null,"pypi_latest":null,"cli_name":""}