Fast Regex-Based DOM Parser

1.1.5 · active · verified Tue Apr 21

dom-parser is a lightweight, zero-dependency library that parses HTML and XML documents into a DOM-like structure using regular expressions. It implements a subset of the standard DOM API, including `getElementById`, `getElementsByClassName`, and `getElementsByTagName`, along with common node properties such as `innerHTML` and `textContent`. The current stable version is 1.1.5. Because parsing is regex-based, the library is fast and compact, which suits environments where a full specification-compliant DOM implementation (such as `jsdom` or the browser's `DOMParser`) is overkill or too resource-intensive. The trade-off is robustness: compared with state-machine parsers, a regex-based parser may mishandle highly malformed or complex HTML.
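To make the regex trade-off concrete, here is a minimal sketch of regex-based tag matching in general. It is not dom-parser's internal code; `findOpeningTags` and `TagMatch` are hypothetical names invented for illustration, and the patterns are deliberately naive.

```typescript
// Hypothetical illustration only: NOT how dom-parser is implemented.
interface TagMatch {
  name: string;
  attributes: Record<string, string>;
}

// Collects opening tags and their double-quoted attributes with two regexes.
// Assumption: no '<' or '>' ever appears inside an attribute value.
function findOpeningTags(html: string): TagMatch[] {
  const openTag = /<([a-zA-Z][a-zA-Z0-9-]*)([^<>]*)>/g;
  const result: TagMatch[] = [];
  let tag: RegExpExecArray | null;
  while ((tag = openTag.exec(html)) !== null) {
    // A fresh regex per tag, so lastIndex always starts at 0.
    const attribute = /([a-zA-Z_:][\w:.-]*)\s*=\s*"([^"]*)"/g;
    const attributes: Record<string, string> = {};
    let attr: RegExpExecArray | null;
    while ((attr = attribute.exec(tag[2])) !== null) {
      attributes[attr[1]] = attr[2];
    }
    result.push({ name: tag[1].toLowerCase(), attributes });
  }
  return result;
}

// Well-formed input: both opening tags and their attributes are recovered.
const wellFormed = findOpeningTags(
  '<div id="rootNode"><p class="childNodeClass">Hi</p></div>'
);
console.log(wellFormed);

// A '>' inside an attribute value ends the match early,
// so the tag is found but its attributes are lost.
const tricky = findOpeningTags('<a title="x > y" href="#">Link</a>');
console.log(tricky);
```

The second call shows the kind of input where a pattern-based approach can go wrong: the tag name is still found, but the match ends at the `>` inside the `title` value, so neither attribute is captured. A state-machine parser tracks quoting explicitly and avoids this class of error at the cost of more code.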

Quickstart

This example parses an HTML string, finds an element by ID, searches within that element by class name and by `name` attribute, and reads an attribute value.

import { parseFromString } from 'dom-parser';

// Simulate reading an HTML file asynchronously
async function simulateReadFile(filePath: string): Promise<string> {
  if (filePath === 'htmlToParse.html') {
    return `
      <div id="rootNode">
        <p class="childNodeClass">Hello from child 1</p>
        <span class="childNodeClass">Hello from child 2</span>
        <a href="#" name="mylink">Link</a>
      </div>
      <div class="childNodeClass">Another root child</div>
    `;
  }
  return '';
}

async function main() {
  const html = await simulateReadFile('htmlToParse.html');

  // Getting DOM model
  const dom = parseFromString(html);

  // Searching Nodes
  const rootNode = dom.getElementById('rootNode');
  if (rootNode) {
    console.log('Found rootNode, tag name:', rootNode.nodeName);
    const childNodes = rootNode.getElementsByClassName('childNodeClass');
    console.log('Children with class "childNodeClass":', childNodes.length);
    childNodes.forEach(node => console.log(' - Child text:', node.textContent));

    const myLink = rootNode.getElementsByName('mylink')[0];
    if (myLink) {
      console.log('Found link href:', myLink.getAttribute('href'));
    }
  }
}

// Report any failure instead of leaving the promise rejection unhandled
main().catch(console.error);
