Fast HTML Parser
Fast HTML Parser (version 1.0.1) is an HTML parsing library built to process large HTML files at the lowest possible cost. It generates a simplified DOM tree with basic element query support, and raw parsing speed is its top priority; its own benchmarks report it outperforming alternatives such as older versions of `htmlparser2`.

However, the package is effectively abandoned: its last release was over a decade ago, and it receives no maintenance, new features, or security fixes. For current projects that need a fast HTML parser, `node-html-parser` is the recommended alternative. It is a separate, actively maintained package that appears to be a successor or re-implementation of this one, with similar performance goals, ongoing development, and broader feature support.
Common errors
- TypeError: HTMLParser.parse is not a function
  Cause: Loading the package through an ESM import form that does not map to its CommonJS export object, for example a named import (`import { parse } from 'fast-html-parser';`). The package is CommonJS-only and exposes `parse` as a method of its main export (`module.exports`), so such imports may leave `parse` undefined.
  Fix: Use CommonJS `require` syntax: `const HTMLParser = require('fast-html-parser'); const root = HTMLParser.parse(html);`. In an ES module under Node, a default import (`import HTMLParser from 'fast-html-parser';`) receives the same export object.
- ReferenceError: require is not defined
  Cause: Calling CommonJS `require()` inside an ES module (a `.mjs` file, or a `.js` file in a package whose `package.json` sets `"type": "module"`), where `require` is not defined.
  Fix: This package is CommonJS-only. In ESM, use a default import, create a `require` function with `createRequire(import.meta.url)` from `node:module`, or import it dynamically with `import('fast-html-parser').then(mod => mod.default.parse(html))`; none of these interop paths is officially supported or tested for this old package. For new ESM projects, consider `node-html-parser`, which supports ESM.
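Both errors come down to how the CommonJS export object reaches your code. As a minimal sketch, the hypothetical helper `resolveParse` below (not part of fast-html-parser) accepts either the object returned by `require()` or the namespace produced by a dynamic `import()`, where Node exposes `module.exports` as `default`, and returns a callable `parse`. The stand-in module objects exist only so the sketch runs without the package installed.

```javascript
// Hypothetical helper, not part of fast-html-parser: turn whatever module
// object you received into a callable `parse` function.
function resolveParse(mod) {
  if (mod == null) {
    throw new TypeError('module object is missing');
  }
  // require('fast-html-parser') returns module.exports itself.
  if (typeof mod.parse === 'function') {
    return mod.parse.bind(mod);
  }
  // A dynamic import() of a CommonJS package yields a namespace whose
  // `default` property is module.exports.
  const exportsObject = mod.default;
  if (exportsObject != null && typeof exportsObject.parse === 'function') {
    return exportsObject.parse.bind(exportsObject);
  }
  throw new TypeError('parse is not a function: unexpected module shape');
}

// Stand-in module objects so the sketch runs without the package installed.
const fakeRequireResult = { parse: (html) => `parsed ${html} via require` };
const fakeImportResult = { default: { parse: (html) => `parsed ${html} via import` } };

console.log(resolveParse(fakeRequireResult)('<p>')); // parsed <p> via require
console.log(resolveParse(fakeImportResult)('<p>')); // parsed <p> via import
```

With the package installed, the same helper would be called as `resolveParse(require('fast-html-parser'))` in CommonJS, or `resolveParse(await import('fast-html-parser'))` inside an ES module.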
Warnings
- breaking The `fast-html-parser` package (v1.0.1) is abandoned and has not been updated in over a decade. It is not recommended for new projects due to potential unpatched bugs, security vulnerabilities, and lack of modern feature support.
- gotcha CSS selector support is limited to `tagName`, `#id`, and `.class` selectors. Advanced CSS selectors, such as attribute selectors, pseudo-classes, and direct child combinators (`>`), are not supported, which differs significantly from browser-native `querySelector` implementations.
- gotcha `HTMLElement#querySelectorAll()` does not behave like the browser's `querySelectorAll()`. It stops searching a sub-tree after its *first* match there, so the returned list can be incomplete, for example when matching elements are nested inside one another.
- gotcha The parsing options `lowerCaseTagName`, `script`, `style`, and `pre` are documented as hurting performance: heavily in the case of `lowerCaseTagName`, slightly for the others. Enabling them gives up some of the speed that is the library's main advantage.
- gotcha This parser is designed for speed and may not correctly parse all malformed HTML. While it handles common errors like missing closing `<li>` or `<td>` tags, highly irregular or broken HTML might lead to incorrect DOM structures.
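Given the selector and `querySelectorAll()` limitations above, one workaround is to walk the tree yourself. The sketch below assumes that parsed elements expose `nodeType` (1 for elements), `tagName`, `id`, `classNames` (an array), and `childNodes`, as fast-html-parser's `HTMLElement` appears to; check this against the version you install. The names `parseSimpleSelector`, `matches`, `queryAllDeep`, and `el` are hypothetical. The sketch accepts a single compound selector built from a tag, `#id`, and `.class` parts (such as `p.intro`), rejects anything else, and keeps descending after a match so nested matches are returned in document order.

```javascript
// Sketch of a full-depth query over the documented selector subset.
// ASSUMPTION: elements have nodeType (1 = element), tagName, id,
// classNames (array) and childNodes, like fast-html-parser's HTMLElement.
const ELEMENT_NODE = 1;

// Parse one compound selector made of an optional tag, optional #id and any
// number of .class parts ("li", "#container", ".intro", "p.intro").
// Returns null for attribute selectors, pseudo-classes and combinators.
function parseSimpleSelector(selector) {
  const text = selector.trim();
  if (text === '') return null;
  const match = /^([A-Za-z][\w-]*)?(?:#([\w-]+))?((?:\.[\w-]+)*)$/.exec(text);
  if (!match) return null;
  const [, tag, id, classPart] = match;
  return {
    tag: tag || null,
    id: id || null,
    classes: classPart.split('.').filter(Boolean),
  };
}

function matches(node, sel) {
  if (!node || node.nodeType !== ELEMENT_NODE) return false;
  // Tag names are compared case-insensitively in this sketch.
  if (sel.tag && String(node.tagName).toLowerCase() !== sel.tag.toLowerCase()) return false;
  if (sel.id && node.id !== sel.id) return false;
  const classNames = node.classNames || [];
  return sel.classes.every((cls) => classNames.includes(cls));
}

// Iterative pre-order walk: unlike the library's querySelectorAll(), it keeps
// descending after a match, so nested matches are returned too.
function queryAllDeep(root, selector) {
  const sel = parseSimpleSelector(selector);
  if (!sel) throw new Error(`Unsupported selector: ${selector}`);
  const results = [];
  // Push children in reverse so they are popped in document order.
  const stack = [...(root.childNodes || [])].reverse();
  while (stack.length > 0) {
    const node = stack.pop();
    if (matches(node, sel)) results.push(node);
    const children = node.childNodes || [];
    for (let i = children.length - 1; i >= 0; i--) stack.push(children[i]);
  }
  return results;
}

// Hypothetical factory for objects with the assumed element shape.
function el(tagName, { id = '', classNames = [] } = {}, childNodes = []) {
  return { nodeType: ELEMENT_NODE, tagName, id, classNames, childNodes };
}

// Equivalent of: <div id="outer" class="box"><div class="box"><span>hi</span></div></div>
const tree = {
  childNodes: [
    el('div', { id: 'outer', classNames: ['box'] }, [
      el('div', { classNames: ['box'] }, [
        el('span', {}, [{ nodeType: 3, rawText: 'hi' }]),
      ]),
    ]),
  ],
};

console.log(queryAllDeep(tree, '.box').length); // 2
console.log(queryAllDeep(tree, 'span').length); // 1
```

Against a real parse result, the call would look like `queryAllDeep(HTMLParser.parse(html), 'div')`, provided the assumed element shape holds. Tag names are compared case-insensitively here, which may differ from the library's own matching.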
Install
- npm install fast-html-parser
- yarn add fast-html-parser
- pnpm add fast-html-parser
Imports
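- HTMLParser
  const HTMLParser = require('fast-html-parser');
  import HTMLParser from 'fast-html-parser'; // ESM default import; Node's CommonJS interop supplies module.exports as the default export
- parse
  const { parse } = require('fast-html-parser'); // Destructures the 'parse' method from the main export
  // A named ESM import (import { parse } from 'fast-html-parser';) may not resolve because the package is CommonJS-only; in ESM, use the default import and call HTMLParser.parse.
- HTMLElement
  // HTMLElement instances are returned by parse and the query methods; they are not imported directly.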
Quickstart
const HTMLParser = require('fast-html-parser');
// Sample markup to parse.
const htmlContent = `
<div id="container">
  <h1>Welcome</h1>
  <p class="intro">Hello, <span class="name">World</span>!</p>
  <ul>
    <li>Item 1</li>
    <li>Item 2</li>
  </ul>
</div>`;
// Parse with default options; the slower parsing options stay off.
const root = HTMLParser.parse(htmlContent);
// Print the DOM structure of the root's first child node.
console.log('Root structure:\n', root.firstChild.structure);
// querySelector supports only tag, #id, and .class selectors.
const container = root.querySelector('#container');
if (container) {
  const welcomeHeading = container.querySelector('h1');
  console.log('\nWelcome Heading Text:', welcomeHeading ? welcomeHeading.text : 'Not found');
  const introParagraph = container.querySelector('.intro');
  // rawText returns the paragraph's raw text content, not its HTML markup.
  console.log('Intro Paragraph Raw Text:', introParagraph ? introParagraph.rawText : 'Not found');
  const spanName = introParagraph ? introParagraph.querySelector('.name') : null;
  console.log('Span Name Text:', spanName ? spanName.text : 'Not found');
}
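To see what the parse options listed under Warnings cost on your own documents, a small timing harness is enough. In this sketch, `timeParse`, `compareOptions`, and `fakeParse` are hypothetical names, and `parseFn(html, options)` is assumed to follow the `HTMLParser.parse(html, options)` call shape; the option names are the ones given in the Warnings. The stand-in `fakeParse` ignores its options, so it only exercises the harness, and the grouping of options and the run count are arbitrary choices.

```javascript
// Sketch: measure the cost of fast-html-parser's slower parse options.
// ASSUMPTION: parseFn(html, options) has the same call shape as
// HTMLParser.parse(html, options).
const { performance: perf } = require('node:perf_hooks');

// Mean milliseconds per call over `runs` timed calls.
function timeParse(parseFn, html, options, runs = 20) {
  parseFn(html, options); // untimed warm-up call
  const start = perf.now();
  for (let i = 0; i < runs; i++) {
    parseFn(html, options);
  }
  return (perf.now() - start) / runs;
}

// Option names come from the package's documentation; the grouping is arbitrary.
function compareOptions(parseFn, html, runs) {
  const configs = {
    defaults: {},
    lowerCaseTagName: { lowerCaseTagName: true },
    scriptStylePre: { script: true, style: true, pre: true },
  };
  const report = {};
  for (const [name, options] of Object.entries(configs)) {
    report[name] = timeParse(parseFn, html, options, runs);
  }
  return report;
}

// Hypothetical stand-in parser that ignores options and counts tags,
// so the harness runs without the package installed.
const fakeParse = (html) => (html.match(/<[^>]+>/g) || []).length;

const sample = '<ul>' + '<li>Item</li>'.repeat(1000) + '</ul>';
console.table(compareOptions(fakeParse, sample, 10));
```

With the package installed, `console.table(compareOptions((h, o) => HTMLParser.parse(h, o), html, 20));` would report mean milliseconds per parse for each configuration. Timings from a handful of runs are noisy, so small differences deserve caution.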