{"id":16169,"library":"parse5-sax-parser","title":"parse5-sax-parser","description":"parse5-sax-parser is a streaming SAX-style HTML parser for event-driven processing of HTML documents without building a full Document Object Model (DOM). It is part of the `parse5` toolset, which is known for its high conformance to the WHATWG HTML Living Standard. The current stable version is 8.0.0. Major versions (such as v7.0.0 and v8.0.0) introduce architectural changes and new features, while patch and minor releases mostly carry dependency updates and bug fixes. Its key differentiators are its streaming design, its SAX (Simple API for XML) event model, and its HTML5 spec compliance, which make it suitable where memory efficiency and raw content inspection matter more than DOM manipulation. It is often used alongside other `parse5` modules or as a standalone component for tasks such as data extraction or scanning markup before it is sanitized.","status":"active","version":"8.0.0","language":"javascript","source_language":"en","source_url":"git://github.com/inikulin/parse5","tags":["javascript","parse5","parser","stream","streaming","SAX","typescript"],"install":[{"cmd":"npm install parse5-sax-parser","lang":"bash","label":"npm"},{"cmd":"yarn add parse5-sax-parser","lang":"bash","label":"yarn"},{"cmd":"pnpm add parse5-sax-parser","lang":"bash","label":"pnpm"}],"dependencies":[{"reason":"Core HTML parsing logic and utilities are provided by the main parse5 package.","package":"parse5","optional":false},{"reason":"Used for HTML entity decoding; pulled in transitively and updated frequently.","package":"entities","optional":false}],"imports":[{"note":"Since v7.0.0, parse5 and its sub-packages are ECMAScript Modules (ESM) first. The 7.x releases also ship a CommonJS build, so whether `require()` works depends on the installed version and on your Node.js or bundler configuration. 
Always prefer named ESM imports.","wrong":"const SAXParser = require('parse5-sax-parser').SAXParser;","symbol":"SAXParser","correct":"import { SAXParser } from 'parse5-sax-parser';"},{"note":"Import types with `import type` so the import is erased at compile time and its type-only intent is explicit; this form is required when TypeScript's `verbatimModuleSyntax` option is enabled.","wrong":"import { SAXParserOptions } from 'parse5-sax-parser';","symbol":"SAXParserOptions","correct":"import type { SAXParserOptions } from 'parse5-sax-parser';"},{"note":"Token types such as `StartTag`, `EndTag`, `Text`, `Comment`, and `Doctype` are exported as types from the main package entry point since v7.0.0, so deep imports into `lib/` are unnecessary.","wrong":"import { StartTag } from 'parse5-sax-parser/lib/tokens';","symbol":"StartTag","correct":"import type { StartTag } from 'parse5-sax-parser';"}],"quickstart":{"code":"import { SAXParser } from 'parse5-sax-parser';\nimport { Readable } from 'stream';\n\n// Simulate an HTML input stream\nconst htmlStream = new Readable({\n  read() {\n    this.push('<!DOCTYPE html><html><head><title>Test</title></head><body>');\n    this.push('<h1>Hello, <b>world</b>!</h1><p>This is a <a href=\"#\">link</a>.</p>');\n    this.push('<!-- a comment --><br>');\n    this.push('</body></html>');\n    this.push(null); // No more data\n  }\n});\n\nconst parser = new SAXParser();\n\nparser.on('doctype', (doctype) => {\n  console.log('DOCTYPE:', doctype.name);\n});\n\n// Start and end tag tokens expose the element name as `tagName`.\nparser.on('startTag', (tag) => {\n  console.log(`Start Tag: <${tag.tagName}> Attributes:`, tag.attrs.map(attr => `${attr.name}=\"${attr.value}\"`).join(' '));\n});\n\nparser.on('endTag', (tag) => {\n  console.log(`End Tag: </${tag.tagName}>`);\n});\n\n// Skip whitespace-only text nodes.\nparser.on('text', (text) => {\n  if (text.text.trim().length > 0) {\n    console.log('Text:', JSON.stringify(text.text));\n  }\n});\n\nparser.on('comment', (comment) => {\n  console.log('Comment:', comment.text);\n});\n\nparser.on('error', (err) => {\n  console.error('Parsing error:', 
err);\n});\n\nparser.on('finish', () => {\n  console.log('Parsing finished!');\n});\n\n// Pipe the HTML stream through the parser. SAXParser is a pass-through stream:\n// it emits events and passes the original data along unchanged.\nhtmlStream.pipe(parser);\n// To forward the data, pipe it on, e.g. parser.pipe(anotherWritableStream);\n// If nothing reads the parser's output, calling parser.resume() can keep large inputs from stalling on backpressure.\n","lang":"typescript","description":"Demonstrates how to use parse5-sax-parser as a streaming event emitter: HTML data is piped through it while listeners handle SAX-style events such as doctype, startTag, endTag, text, and comment."},"warnings":[{"fix":"Migrate your project to native ESM imports (`import ... from '...'`). For Node.js, set `\"type\": \"module\"` in your package.json or use `.mjs` file extensions. Older Node.js versions or specific bundler configurations might require additional setup.","message":"Starting with v7.0.0, all `parse5` packages, including `parse5-sax-parser`, are published as ECMAScript Modules (ESM) first. The 7.x releases still include a CommonJS build, but ESM is the primary format, so projects that rely on `require()` should confirm that the version they install supports it.","severity":"breaking","affected_versions":">=7.0.0"},{"fix":"Remove `@types/parse5-sax-parser` from your `devDependencies` in `package.json` and run `npm install` or `yarn install`.","message":"As of v7.0.0, `parse5` and its sub-packages ship their own TypeScript definitions. Remove any `@types/parse5-sax-parser` package from your project; it is no longer needed and can cause type conflicts.","severity":"breaking","affected_versions":">=7.0.0"},{"fix":"Use it for event-driven analysis only. For transformation, use `parse5-html-rewriting-stream` or a full DOM parser/serializer from `parse5`.","message":"parse5-sax-parser is a pass-through transform stream. It emits events but does *not* modify the HTML content itself. If you pipe data through it, the output will be identical to the input. 
This means it cannot sanitize or rewrite HTML on its own; for that, use `parse5-html-rewriting-stream`, or build a DOM with `parse5` and serialize it.","severity":"gotcha","affected_versions":">=1.0.0"},{"fix":"After upgrading to v7.0.0 or later, test your application's HTML parsing behavior for changes in token streams or parsing outcomes, especially with malformed or complex HTML.","message":"The `parse5` core package, on which `parse5-sax-parser` relies, was updated in v7.0.0 to catch up with the latest HTML Living Standard specification. Parsing results for some edge cases may differ subtly from previous versions.","severity":"breaking","affected_versions":">=7.0.0"},{"fix":"If you use custom tree adapters with `parse5`, ensure they implement the `updateNodeSourceCodeLocation` method. If you only use `parse5-sax-parser` for events, no change is needed.","message":"In `parse5` v6.0.0, the `TreeAdapter` interface gained a new mandatory method, `updateNodeSourceCodeLocation`. `parse5-sax-parser` does not build a DOM tree, so it is not affected directly, but projects that also use custom `TreeAdapter` implementations with the core `parse5` package need to update those adapters.","severity":"breaking","affected_versions":">=6.0.0 <7.0.0"}],"env_vars":null,"last_verified":"2026-04-21T00:00:00.000Z","next_check":"2026-07-20T00:00:00.000Z","problems":[{"fix":"Change `const { SAXParser } = require('parse5-sax-parser');` to `import { SAXParser } from 'parse5-sax-parser';`. 
Ensure your `package.json` has `\"type\": \"module\"` or use `.mjs` file extensions for ESM files.","cause":"Calling `require()` to load `parse5-sax-parser` from an ECMAScript Module (ESM) file or in a Node.js environment configured for ESM, where `require` is not defined.","error":"ReferenceError: require is not defined"},{"fix":"Verify your import statement. For ESM, use `import { SAXParser } from 'parse5-sax-parser';`. For older CommonJS projects (pre-v7), it would have been `const { SAXParser } = require('parse5-sax-parser');`.","cause":"Importing `SAXParser` as a default import instead of a named one, or using a CommonJS `require()` pattern in a project that expects ESM named exports, or vice versa.","error":"TypeError: SAXParser is not a constructor"},{"fix":"If you write data manually with `parser.write(chunk)`, call `parser.end()` once all data has been written. If you use `pipe()`, ensure the source stream signals its end (e.g., by pushing `null` for `Readable` streams).","cause":"The end of the input was never signaled to the SAXParser, which typically happens when chunks are written manually with `write()` instead of being piped from another stream.","error":"My parser isn't emitting all expected events or seems to hang after processing some input."}],"ecosystem":"npm"}