{"id":13304,"library":"html5parser","title":"HTML5 Parser","description":"html5parser is a fast, compact JavaScript library for parsing HTML5 documents. Currently at version 2.0.2, it tokenizes HTML strings and parses them into an Abstract Syntax Tree (AST). It stands out for its speed, small bundle size (under 5 KB), and cross-platform support, running in both modern browsers and Node.js. A key design principle is strict adherence to the HTML5 specification: anything not defined in the specification is ignored. The library offers both low-level tokenization (`tokenize`) and higher-level AST generation (`parse`), along with utilities such as `walk` for traversing the AST and `safeHtml` for sanitization. It ships with TypeScript type definitions. Releases appear to be driven mainly by dependency updates and bug fixes; the `2.0.2` release notes mention dependency updates, which suggests the project is in a stability and maintenance phase.","status":"active","version":"2.0.2","language":"javascript","source_language":"en","source_url":"https://github.com/acrazing/html5parser","tags":["javascript","html5","parser","ast","attributes","typescript"],"install":[{"cmd":"npm install html5parser","lang":"bash","label":"npm"},{"cmd":"yarn add html5parser","lang":"bash","label":"yarn"},{"cmd":"pnpm add html5parser","lang":"bash","label":"pnpm"}],"dependencies":[],"imports":[{"note":"Primary function for parsing HTML strings into an AST. Incorrect CommonJS import for named exports.","wrong":"const { parse } = require('html5parser');","symbol":"parse","correct":"import { parse } from 'html5parser';"},{"note":"Utility function for traversing the generated AST. Incorrect CommonJS import for named exports.","wrong":"const walk = require('html5parser').walk;","symbol":"walk","correct":"import { walk } from 'html5parser';"},{"note":"An enum of AST node type constants, used to check a node's type during traversal. Import it as a value (not with `import type`) when comparing against it at runtime.","wrong":"import type { SyntaxKind } from 'html5parser'; // if used as a value","symbol":"SyntaxKind","correct":"import { SyntaxKind } from 'html5parser';"},{"note":"Low-level API that converts an HTML string into a flat token stream. Incorrect CommonJS import for named exports.","wrong":"const tokenize = require('html5parser').tokenize;","symbol":"tokenize","correct":"import { tokenize } from 'html5parser';"},{"note":"Utility for sanitizing HTML input to help prevent XSS attacks. Incorrect CommonJS import for named exports.","wrong":"const safeHtml = require('html5parser').safeHtml;","symbol":"safeHtml","correct":"import { safeHtml } from 'html5parser';"}],"quickstart":{"code":"import { parse, walk, SyntaxKind } from 'html5parser';\n\nconst htmlInput = '<!DOCTYPE html><html><head><title>Hello html5parser!</title></head><body><h1>Welcome</h1><p>This is a test.</p></body></html>';\nconst ast = parse(htmlInput);\n\n// parse() returns an array of top-level nodes\nconsole.log('Parsed AST length:', ast.length);\n\nwalk(ast, {\n  enter: (node) => {\n    if (node.type === SyntaxKind.Tag && node.name === 'title' && Array.isArray(node.body)) {\n      const textNode = node.body[0];\n      if (textNode && textNode.type === SyntaxKind.Text) {\n        if (typeof document !== 'undefined') {\n          // Browser: render the title into the page\n          const div = document.createElement('div');\n          div.innerHTML = `The title of the input is <strong>${textNode.value}</strong>`;\n          document.body.appendChild(div);\n        } else {\n          // Node.js: no DOM available, so log instead\n          console.log(`[Node.js] The title of the input is \"${textNode.value}\"`);\n        }\n      }\n    }\n    if (node.type === SyntaxKind.Tag && node.name === 'p' && Array.isArray(node.body)) {\n      // Narrow to a text node before reading `value`; a child may also be a tag\n      const first = node.body[0];\n      if (first && first.type === SyntaxKind.Text) {\n        console.log('Found a paragraph:', first.value);\n      }\n    }\n  },\n});","lang":"typescript","description":"Parses an HTML string into an Abstract Syntax Tree (AST) and traverses it with `walk` to extract the text of the `<title>` and `<p>` elements. In a browser it appends the title to the page; in Node.js, where `document` is unavailable, it logs the title to the console instead."},"warnings":[{"fix":"Ensure that input HTML conforms to the HTML5 specification. For non-standard, malformed, or heavily customized markup, consider a parser with more lenient parsing options.","message":"The parser strictly adheres to the HTML5 specification. Tags, attributes, or structures not defined in the specification are ignored during parsing, which can produce unexpected AST structures or missing content for non-standard markup.","severity":"gotcha","affected_versions":">=2.0.0"}],"env_vars":null,"last_verified":"2026-04-19T00:00:00.000Z","next_check":"2026-07-18T00:00:00.000Z","problems":[{"fix":"Use ES Module import syntax: `import { parse } from 'html5parser';`","cause":"Attempting to import `parse` using CommonJS `require` syntax when the package is primarily designed for ES Modules, or destructuring the wrong export.","error":"TypeError: parse is not a function"},{"fix":"Run code that touches the `document` object in a browser, guard DOM access with `typeof document !== 'undefined'`, or use a library such as JSDOM to simulate a browser DOM in Node.js for testing or server-side rendering.","cause":"Attempting to run browser-specific DOM manipulation code (e.g., `document.createElement`) in a Node.js environment without a DOM shim.","error":"ReferenceError: document is not defined"}],"ecosystem":"npm","meta_description":null,"install_score":null,"install_tag":null,"quickstart_score":null,"quickstart_tag":null,"pypi_latest":null,"cli_name":"","cli_version":null}