Fast Regex-Based DOM Parser
dom-parser is a lightweight, zero-dependency library that parses HTML and XML documents into a DOM-like tree using regular expressions. It implements a subset of the standard DOM API, including `getElementById`, `getElementsByClassName`, `getElementsByTagName`, and `getElementsByName`, along with common node properties such as `innerHTML` and `textContent`. The current stable version is 1.1.5. Because it parses with regular expressions instead of an HTML5 state-machine tokenizer, it is fast and compact. That makes it a good fit where a specification-compliant parser (such as `jsdom` or the browser's `DOMParser`) would be overkill or too resource-intensive. The trade-off is robustness: on heavily malformed or unusual HTML it can build a different tree than a state-machine parser would.
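To illustrate the general idea behind regex-based parsing, here is a minimal TypeScript sketch. It is not dom-parser's actual code: a single global regular expression finds every tag, and a stack turns the flat sequence of matches into a tree. The `sketchParse` function and `SketchNode` type are hypothetical names used only for this example.

```typescript
// Illustrative sketch only: NOT dom-parser's implementation.
// One global RegExp finds every tag; a stack of open elements turns the
// flat stream of matches into a tree.

interface SketchNode {
  tag: string;
  attrs: string; // raw attribute text, left unparsed
  children: (SketchNode | string)[];
}

function sketchParse(html: string): SketchNode {
  // Matches opening, closing and self-closing tags: <div id="a">, </div>, <br/>.
  const tagRe = /<(\/?)([a-zA-Z][\w-]*)([^>]*?)(\/?)>/g;
  const root: SketchNode = { tag: '#root', attrs: '', children: [] };
  const stack: SketchNode[] = [root];
  let cursor = 0;
  let m: RegExpExecArray | null;

  while ((m = tagRe.exec(html)) !== null) {
    const parent = stack[stack.length - 1];
    // Text between the previous tag and this one becomes a text child.
    const text = html.slice(cursor, m.index);
    if (text.trim()) parent.children.push(text);
    cursor = tagRe.lastIndex;

    const [, closing, tag, attrs, selfClosing] = m;
    if (closing) {
      // Pop back to the nearest open element with this tag name;
      // a closing tag with no matching opener is ignored.
      const idx = stack.map(n => n.tag).lastIndexOf(tag);
      if (idx > 0) stack.length = idx;
    } else {
      const node: SketchNode = { tag, attrs: attrs.trim(), children: [] };
      parent.children.push(node);
      if (!selfClosing) stack.push(node);
    }
  }

  // Trailing text after the last tag.
  const tail = html.slice(cursor);
  if (tail.trim()) stack[stack.length - 1].children.push(tail);
  return root;
}

const tree = sketchParse('<div id="rootNode"><p class="c">Hello</p><br/></div>');
console.log(JSON.stringify(tree, null, 2));
```

The sketch does not handle comments, doctypes, case-insensitive tag names, or void elements written without a trailing slash (a bare `<br>` would swallow its following siblings). Covering cases like these is where a production regex parser does most of its work, and where its output can drift from a browser's.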
Common errors
- TypeError: dom.getElementById is not a function
  cause: Calling a DOM method on `dom` before it holds the parse result, or on the wrong object (for example the raw HTML string).
  fix: Call `parseFromString(htmlContent)` first and call the methods on the `Dom` object it returns. Example: `const dom = parseFromString(html); const root = dom.getElementById('myId');`
- Property 'textContent' does not exist on type 'Node'
  cause: A TypeScript error meaning the `Node` type in scope does not declare `textContent`. The library's own `Node` interface does declare it, so this usually means the type declarations are misaligned or a different `Node` type is being used implicitly.
  fix: Verify that `dom-parser`'s type declarations are installed and picked up by your TypeScript configuration, and import `Node` from `dom-parser` when you annotate variables explicitly. If the error persists, fall back to a type assertion: `(myNode as any).textContent`, or the narrower `(myNode as HtmlNode).textContent` if an `HtmlNode` type is exported and applicable.
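Both errors can be surfaced earlier, with clearer messages, through small runtime checks. The sketch below assumes nothing about dom-parser beyond the `getElementById` method and `textContent` property named above; `assertDomLike`, `hasTextContent` and `readText` are hypothetical helpers, and the type guard is offered as a type-checked alternative to `as any`.

```typescript
// Hypothetical helpers, not part of dom-parser. They rely only on the
// `getElementById` method and `textContent` property mentioned above.

interface HasGetElementById {
  getElementById(id: string): unknown;
}

// Throws a descriptive TypeError instead of the generic
// "dom.getElementById is not a function" when handed the wrong value.
function assertDomLike(value: unknown): asserts value is HasGetElementById {
  const ok =
    typeof value === 'object' &&
    value !== null &&
    typeof (value as { getElementById?: unknown }).getElementById === 'function';
  if (!ok) {
    throw new TypeError(
      `Expected the object returned by parseFromString(html), got ${
        value === null ? 'null' : typeof value
      }`,
    );
  }
}

// Type guard: narrows an unknown value to one with a string `textContent`.
function hasTextContent(value: unknown): value is { textContent: string } {
  return (
    typeof value === 'object' &&
    value !== null &&
    typeof (value as { textContent?: unknown }).textContent === 'string'
  );
}

// Reads textContent when present, otherwise returns an empty string.
function readText(node: unknown): string {
  return hasTextContent(node) ? node.textContent : '';
}

// Usage, assuming `dom` came from parseFromString:
//   assertDomLike(dom);
//   const text = readText(dom.getElementById('myId'));
```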
Warnings
- gotcha Due to its RegExp-based parsing approach, `dom-parser` might not always produce a DOM structure identical to what a browser's native DOMParser or a compliant library like `jsdom` would for highly malformed or edge-case HTML. It prioritizes speed and simplicity over full HTML5 specification compliance.
- gotcha The `Node` API provided by `dom-parser` is a subset of the standard browser DOM API. While common methods like `getElementById`, `getElementsByClassName`, and properties like `innerHTML` are present, more advanced features or less common properties/methods of the native DOM (e.g., `querySelector`, event handling, style manipulation) are not implemented.
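Both gotchas can be made concrete in a few lines. The first part uses a deliberately naive tag regex (not dom-parser's own) to show how a `>` inside a quoted attribute value ends the match early, where a browser would keep reading, plus a quote-aware variant that fixes this one case. The second part is a rough stand-in for `querySelectorAll('tag.className')` built only from `getElementsByTagName` and `getAttribute`. The `QueryableNode` interface and `findByTagAndClass` helper are assumptions for illustration, not types exported by the library.

```typescript
// Part 1: why regex parsing can diverge from a spec-compliant parser.
// Illustrative regexes only; dom-parser's real patterns may differ.
const html = '<a title="x > y" href="#">Link</a>';

// `[^>]*` stops at the first '>', even inside a quoted attribute value.
const naiveTag = /<([a-zA-Z]+)([^>]*)>/;
const naiveAttrs = naiveTag.exec(html)?.[2] ?? '';
console.log(JSON.stringify(naiveAttrs)); // prints: " title=\"x "

// Consuming quoted strings as whole units keeps the attribute list intact.
const quoteAwareTag = /<([a-zA-Z]+)((?:[^>"']|"[^"]*"|'[^']*')*)>/;
const quoteAwareAttrs = quoteAwareTag.exec(html)?.[2] ?? '';
console.log(JSON.stringify(quoteAwareAttrs)); // prints: " title=\"x > y\" href=\"#\""

// Part 2: a querySelectorAll-like fallback from methods that do exist.
// Hypothetical structural type; method names follow those used on this page.
interface QueryableNode {
  getElementsByTagName(tagName: string): QueryableNode[];
  getAttribute(name: string): string | null | undefined;
}

// Collect elements by tag name, then keep those whose space-separated
// class attribute contains `className`.
function findByTagAndClass(
  root: QueryableNode,
  tagName: string,
  className: string,
): QueryableNode[] {
  return root
    .getElementsByTagName(tagName)
    .filter(node =>
      (node.getAttribute('class') ?? '').split(/\s+/).includes(className),
    );
}
```

With a parsed document, a call such as `findByTagAndClass(rootNode, 'p', 'childNodeClass')` would approximate `rootNode.querySelectorAll('p.childNodeClass')`, provided the library's node objects match the assumed interface.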
Install
- npm install dom-parser
- yarn add dom-parser
- pnpm add dom-parser
Imports
- `parseFromString`
  CommonJS: `const { parseFromString } = require('dom-parser');`
  ES modules: `import { parseFromString } from 'dom-parser';`
- `Dom` (type only): `import type { Dom } from 'dom-parser';`
- `Node` (type only): `import type { Node } from 'dom-parser';`
Quickstart
```typescript
import { parseFromString } from 'dom-parser';

// Simulate reading an HTML file asynchronously (stands in for a real file read)
async function simulateReadFile(filePath: string): Promise<string> {
  if (filePath === 'htmlToParse.html') {
    return `
      <div id="rootNode">
        <p class="childNodeClass">Hello from child 1</p>
        <span class="childNodeClass">Hello from child 2</span>
        <a href="#" name="mylink">Link</a>
      </div>
      <div class="childNodeClass">Another root child</div>
    `;
  }
  return '';
}

async function main() {
  const html = await simulateReadFile('htmlToParse.html');

  // Getting DOM model
  const dom = parseFromString(html);

  // Searching Nodes
  const rootNode = dom.getElementById('rootNode');
  if (rootNode) {
    console.log('Found rootNode, node name:', rootNode.nodeName);

    const childNodes = rootNode.getElementsByClassName('childNodeClass');
    console.log('Children with class "childNodeClass":', childNodes.length);
    childNodes.forEach(node => console.log('  - Child text:', node.textContent));

    const myLink = rootNode.getElementsByName('mylink')[0];
    if (myLink) {
      console.log('Found link href:', myLink.getAttribute('href'));
    }
  }
}

// Surface any rejection instead of leaving it unhandled.
main().catch(console.error);
```
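The quickstart fakes the file read with `simulateReadFile`. In a Node.js script you would read the file from disk instead; below is a minimal sketch using Node's built-in `fs` module. `readHtmlFile` is a hypothetical helper, and the temporary file exists only so the example runs on its own.

```typescript
import { mkdtempSync, readFileSync, writeFileSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { join } from 'node:path';

// Hypothetical helper (not part of dom-parser): read an HTML file as text.
// Passing 'utf8' makes readFileSync return a string; without an encoding,
// Node returns a Buffer instead.
function readHtmlFile(filePath: string): string {
  return readFileSync(filePath, 'utf8');
}

// Demo only: create a throwaway file in the OS temp directory.
const dir = mkdtempSync(join(tmpdir(), 'dom-parser-demo-'));
const filePath = join(dir, 'htmlToParse.html');
writeFileSync(filePath, '<div id="rootNode"><p class="childNodeClass">Hi</p></div>');

const html = readHtmlFile(filePath);
console.log(typeof html); // prints: string
// From here, continue as in the quickstart: const dom = parseFromString(html);
```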