{"id":13307,"library":"htmlparser2","title":"Fast & Forgiving HTML/XML Parser","description":"htmlparser2 is a high-performance, event-driven HTML/XML parser for JavaScript and TypeScript environments. It is currently at stable version 12.0.0 and maintains an active release cadence with frequent updates, often aligning with WHATWG specifications. The library prioritizes speed and efficiency, making it suitable for tasks like web scraping, content transformation, and processing RSS/Atom feeds. While fast and forgiving, it takes some shortcuts compared to strictly spec-compliant parsers like `parse5`, which might lead to different parsing results for highly malformed HTML. It integrates with an ecosystem of related packages like `domhandler` for DOM construction and `css-select` for querying.","status":"active","version":"12.0.0","language":"javascript","source_language":"en","source_url":"git://github.com/fb55/htmlparser2","tags":["javascript","html","parser","streams","xml","dom","rss","feed","atom","typescript"],"install":[{"cmd":"npm install htmlparser2","lang":"bash","label":"npm"},{"cmd":"yarn add htmlparser2","lang":"bash","label":"yarn"},{"cmd":"pnpm add htmlparser2","lang":"bash","label":"pnpm"}],"dependencies":[{"reason":"Required for building a DOM tree from parsed HTML/XML, commonly used with htmlparser2.","package":"domhandler","optional":false},{"reason":"Provides utility functions for manipulating the DOM created by domhandler.","package":"domutils","optional":false},{"reason":"Defines the types of DOM elements, used internally by htmlparser2 and domhandler.","package":"domelementtype","optional":false},{"reason":"Used for decoding and encoding HTML entities, integral to the parsing process.","package":"entities","optional":false}],"imports":[{"note":"Since v11.0.0, htmlparser2 is an ESM-only module. 
CommonJS `require()` is no longer supported.","wrong":"const Parser = require('htmlparser2').Parser;","symbol":"Parser","correct":"import { Parser } from 'htmlparser2';"},{"note":"A convenience function for parsing a document and returning a DOM structure using `domhandler`.","wrong":"const parseDocument = require('htmlparser2').parseDocument;","symbol":"parseDocument","correct":"import { parseDocument } from 'htmlparser2';"},{"note":"Introduced in v11.0.0, this enables direct piping from Web Streams API responses.","wrong":"const WebWritableStream = require('htmlparser2').WebWritableStream;","symbol":"WebWritableStream","correct":"import { WebWritableStream } from 'htmlparser2';"}],"quickstart":{"code":"import { Parser } from 'htmlparser2';\n\nconst htmlContent = \"Xyz <script type='text/javascript'>const foo = '<<bar>>';</script><p>Hello, World!</p>\";\n\nconst parser = new Parser({\n    onopentag(name, attributes) {\n        console.log(`Opened tag: ${name}`);\n        if (name === \"script\" && attributes.type === \"text/javascript\") {\n            console.log(\"JavaScript block detected!\");\n        }\n    },\n    ontext(text) {\n        // Note: This can fire at any point within text and you might\n        // have to stitch together multiple pieces if not using a DOM handler.\n        const trimmedText = text.trim();\n        if (trimmedText.length > 0) {\n          console.log(`--> Text content: '${trimmedText}'`);\n        }\n    },\n    onclosetag(tagname) {\n        console.log(`Closed tag: ${tagname}`);\n        if (tagname === \"script\") {\n            console.log(\"Script block finished.\");\n        }\n    },\n    onerror(error) {\n      console.error(\"Parsing error:\", error);\n    }\n});\n\nparser.write(htmlContent);\nparser.end();\n\n// To get a DOM, you'd typically use parseDocument with domhandler:\n// import { parseDocument } from 'htmlparser2';\n// const dom = parseDocument('<div id=\"root\"><span>Hi</span></div>');\n// 
console.log(dom.children[0].children[0].children[0].data); // 'Hi'","lang":"typescript","description":"This quickstart demonstrates the event-driven parsing capabilities of htmlparser2 by creating a `Parser` instance and feeding it HTML content. It logs opening tags, text content, and closing tags, illustrating the callback interface."},"warnings":[{"fix":"Update your project to use ES Modules (e.g., `\"type\": \"module\"` in `package.json`) and replace all `require('htmlparser2')` with `import { ... } from 'htmlparser2';`.","message":"As of v11.0.0, htmlparser2 is an ESM-only module. All CommonJS `require()` statements will fail, and you must migrate to ES Modules `import` syntax.","severity":"breaking","affected_versions":">=11.0.0"},{"fix":"Ensure your project's Node.js environment is updated to version 20.19.0 or newer.","message":"Version 11.0.0 raises the minimum Node.js version requirement to 20.19.0. Older Node.js versions are not supported.","severity":"breaking","affected_versions":">=11.0.0"},{"fix":"Review existing code that processes content within these tags, as the parsing behavior for their children has changed. Content previously parsed as HTML is now treated as raw text.","message":"Version 12.0.0 aligns HTML parsing with the WHATWG specification, particularly for raw-text and RCDATA tags such as `<iframe>`, `<noembed>`, `<noframes>`, `<plaintext>`, and `<textarea>`. Their content is no longer parsed as HTML, and entities in `<textarea>` are now decoded.","severity":"breaking","affected_versions":">=12.0.0"},{"fix":"Test parsing of HTML with complex or malformed entities in attributes to ensure the new behavior does not negatively impact your application's logic. Adjust expectations for attribute values as necessary.","message":"In v9.0.0, the tokenizer's entity parsing behavior changed to align with the HTML spec, specifically for entities within attributes. 
This can lead to different attribute values for certain malformed inputs (e.g., `<a href='&amp=boo'>`).","severity":"breaking","affected_versions":">=9.0.0"},{"fix":"Migrate your feed parsing logic to the `parseFeed` helper exported by htmlparser2, or parse the feed with `parseDocument` in XML mode and extract it with `getFeed` from `domutils`.","message":"The `FeedHandler` class, previously deprecated, has been completely removed in v8.0.0. Code relying on this class will break.","severity":"breaking","affected_versions":">=8.0.0"},{"fix":"Upgrade your project's TypeScript dependency to version 4.5 or newer.","message":"Version 8.0.0 requires TypeScript >= 4.5. Projects using older TypeScript versions will encounter compilation errors.","severity":"breaking","affected_versions":">=8.0.0"},{"fix":"If strict HTML compliance is critical, evaluate whether htmlparser2's parsing behavior meets your requirements, especially with highly malformed or unusual HTML inputs. Consider using `parse5` if strictness is paramount.","message":"htmlparser2 is optimized for speed and may take shortcuts, meaning it is not strictly HTML spec compliant in all edge cases. For applications requiring strict spec adherence, `parse5` might be a more suitable alternative.","severity":"gotcha","affected_versions":"All versions"}],"env_vars":null,"last_verified":"2026-04-19T00:00:00.000Z","next_check":"2026-07-18T00:00:00.000Z","problems":[{"fix":"Change `const htmlparser2 = require('htmlparser2');` to `import * as htmlparser2 from 'htmlparser2';` or `import { Parser } from 'htmlparser2';`. Ensure your `package.json` has `\"type\": \"module\"` if running in Node.js.","cause":"Attempting to use CommonJS `require()` syntax with htmlparser2 v11.0.0 or later in an ES Modules environment.","error":"ReferenceError: require is not defined in ES module scope"},{"fix":"Refactor your code to no longer use `FeedHandler`. 
Instead, use the `parseFeed` helper, the `Parser` class with custom handlers, or the `parseDocument` function together with `domutils` to process feeds.","cause":"Attempting to instantiate the `FeedHandler` class, which was removed in htmlparser2 v8.0.0.","error":"TypeError: htmlparser2.FeedHandler is not a constructor"},{"fix":"Ensure `htmlparser2` is installed. If you are on a TypeScript version older than 4.5, upgrade it, since v8.0.0 and later require TypeScript >= 4.5. Verify that `moduleResolution` in your `tsconfig.json` is set appropriately for ESM (e.g., `\"node16\"` or `\"bundler\"`).","cause":"The TypeScript compiler cannot locate the module or its types, possibly because of an incorrect import path, an incompatible `moduleResolution` setting, or an outdated TypeScript version. htmlparser2 ships its own type declarations, so a separate `@types/htmlparser2` package is not needed.","error":"TS2307: Cannot find module 'htmlparser2' or its corresponding type declarations."},{"fix":"For very large documents, consider using the event-driven `Parser` directly with custom handlers to process chunks incrementally, rather than building a complete DOM tree with `parseDocument`. Raising the Node.js stack size (`--stack-size=N`) can serve as a temporary workaround if acceptable.","cause":"This can occur with extremely large or deeply nested HTML structures due to recursive parsing, especially when building a DOM directly without streaming.","error":"RangeError: Maximum call stack size exceeded"}],"ecosystem":"npm","meta_description":null,"install_score":null,"install_tag":null,"quickstart_score":null,"quickstart_tag":null,"pypi_latest":null,"cli_name":"","cli_version":null}