HAST to NLCST Transformer
hast-util-to-nlcst is a utility in the unified/syntax-tree ecosystem that transforms a HAST (HTML Abstract Syntax Tree) into an NLCST (Natural Language Concrete Syntax Tree). The transformation extracts the natural language content from an HTML structure so that it can be processed by natural language tooling such as retext, for tasks like linting, sentiment analysis, or spell checking. The package is currently stable at version 4.0.0 and follows semantic versioning; major versions have typically introduced breaking changes in environment support (e.g., minimum Node.js version, ESM-only) or in the parser API. Its role is narrow: it bridges HTML content to natural language processing within the unist AST family, and it currently has no mechanism for applying changes from the NLCST back to the HAST. It is often used with parsers such as `parse-english` and wrappers such as `rehype-retext`.
Common errors
- ERR_REQUIRE_ESM
  cause: Attempting to use `require()` to import `hast-util-to-nlcst`, which is an ESM-only package.
  fix: Change `const { toNlcst } = require('hast-util-to-nlcst')` to `import { toNlcst } from 'hast-util-to-nlcst'`. Ensure your project runs in an ESM context (e.g., `"type": "module"` in `package.json` or the `.mjs` file extension).
- SyntaxError: Cannot use import statement outside a module
  cause: An `import` statement is being used in a Node.js environment that is not configured for ES modules, or on a Node.js version that is too old.
  fix: Ensure your Node.js version is 16+ (required by v4.0.0), and add `"type": "module"` to your `package.json` file. Alternatively, rename your file to use the `.mjs` extension.
- TypeError: Parser is not a constructor
  cause: The NLCST parser passed to `toNlcst` is either an outdated version or incorrectly imported or instantiated, especially after the v3.0.0 breaking change.
  fix: Update your NLCST parser package (e.g., `npm install parse-english@latest`). Ensure you are passing a valid parser, such as the parser class itself (e.g., `ParseEnglish`), and not an unrelated or incorrect export.
- TypeError: Cannot read properties of undefined (reading 'position'), or incorrect NLCST output with missing text
  cause: The input HAST `tree` lacks positional information, which `hast-util-to-nlcst` relies on for accurate conversion.
  fix: Ensure the utility or parser you use to create the HAST tree preserves positional data. For instance, when using `hast-util-from-html`, pass a `VFile` created from the actual content, which typically retains this information.
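To make the positional-data problem above easier to diagnose, the following is a minimal sketch of a check that lists the nodes in a HAST-like tree that have no start line and column. `findUnpositioned` and `sampleTree` are hypothetical names introduced here for illustration and are not part of `hast-util-to-nlcst`; the sample tree is hand-written rather than parser output. Depending on how a tree was produced, some nodes may legitimately lack positions, so treat the result as a diagnostic rather than a hard gate.

```javascript
// Hypothetical helper (not part of hast-util-to-nlcst): collects a readable
// path for every node that has no `position.start.line` / `position.start.column`.
function findUnpositioned(node, path = node.type, missing = []) {
  const start = node.position && node.position.start;
  if (!start || typeof start.line !== 'number' || typeof start.column !== 'number') {
    missing.push(path);
  }
  for (const child of node.children || []) {
    // Use the tag name for elements, the node type for everything else.
    findUnpositioned(child, `${path} > ${child.tagName || child.type}`, missing);
  }
  return missing;
}

// Hand-written stand-in for a parsed `<p>Hello</p>` (an assumption, not real parser output).
const point = (line, column, offset) => ({line, column, offset});
const sampleTree = {
  type: 'root',
  position: {start: point(1, 1, 0), end: point(1, 13, 12)},
  children: [
    {
      type: 'element',
      tagName: 'p',
      position: {start: point(1, 1, 0), end: point(1, 13, 12)},
      children: [{type: 'text', value: 'Hello'}] // position left out on purpose
    }
  ]
};

console.log(findUnpositioned(sampleTree)); // [ 'root > p > text' ]
```

An empty array means every node in the tree carries a start position.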
Warnings
- breaking Version 4.0.0 introduces a requirement for Node.js version 16 or higher. Running on older Node.js environments will lead to runtime errors or module resolution failures.
- breaking Version 4.0.0 changed to use the `exports` field in `package.json`, which affects module resolution behavior, especially in some bundlers or older Node.js versions. This may require adjustments to build configurations.
- breaking Version 3.0.0 introduced breaking changes related to the NLCST parsers. The `Parser` argument passed to `toNlcst` must be updated to the latest compatible version (e.g., `parse-latin`, `parse-english`, `parse-dutch`).
- breaking Version 2.0.0 switched the package to be ESM-only (ECMAScript Modules). CommonJS `require()` statements are no longer supported and will result in module loading errors.
- gotcha The `toNlcst` function requires the input HAST `tree` to have positional information (line, column, and offset data) for accurate NLCST conversion. The `VFile` passed must also be the file the `tree` was parsed from, so that the positions refer to that file's content.
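The Node.js 16 requirement listed in the warnings above can be checked at startup so that an unsupported runtime fails with a clear message instead of a module resolution error. This is a minimal sketch; `meetsNodeRequirement` is a hypothetical helper introduced here, and the default minimum of 16 is taken from the v4.0.0 warning.

```javascript
// Hypothetical helper: returns true when the given Node.js version string
// has a major version at or above the required minimum.
function meetsNodeRequirement(version = process.versions.node, minimumMajor = 16) {
  const major = Number.parseInt(version.split('.')[0], 10);
  return Number.isInteger(major) && major >= minimumMajor;
}

if (!meetsNodeRequirement()) {
  console.error(
    `hast-util-to-nlcst 4.x needs Node.js 16 or higher, found ${process.versions.node}`
  );
}
```

Setting the `engines` field in `package.json` is a declarative alternative; the runtime check is shown only because it produces a message at the point of failure.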
Install
- npm install hast-util-to-nlcst
- yarn add hast-util-to-nlcst
- pnpm add hast-util-to-nlcst
Imports
- toNlcst
  wrong: const toNlcst = require('hast-util-to-nlcst')
  correct: import { toNlcst } from 'hast-util-to-nlcst'
- ParserConstructor
  import type { ParserConstructor } from 'hast-util-to-nlcst'
- ParserInstance
  import type { ParserInstance } from 'hast-util-to-nlcst'
Quickstart
import { fromHtml } from 'hast-util-from-html';
import { toNlcst } from 'hast-util-to-nlcst';
import { ParseEnglish } from 'parse-english';
import { readSync } from 'to-vfile';
import { inspect } from 'unist-util-inspect';
import * as fs from 'node:fs';
import * as path from 'node:path';
// Create a dummy HTML file for the example
const exampleHtmlContent = `
<article>
Implicit.
<h1>Explicit: <strong>foo</strong>s-ball</h1>
<pre><code class="language-foo">bar()</code></pre>
</article>
`;
const exampleHtmlPath = path.join(process.cwd(), 'example.html');
fs.writeFileSync(exampleHtmlPath, exampleHtmlContent);
// Read the file from disk into a VFile
const file = readSync(exampleHtmlPath);
// Parse the file's HTML content to HAST
const tree = fromHtml(file);
// Transform HAST to NLCST using ParseEnglish
const nlcstTree = toNlcst(tree, file, ParseEnglish);
// Log the inspected NLCST tree (the output includes positional info)
console.log(inspect(nlcstTree));
// Clean up the dummy file
fs.unlinkSync(exampleHtmlPath);
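Once an NLCST tree is available, downstream code usually walks it to read the text back out. The following is a minimal sketch of such a walk over a hand-written tree that mimics the node types an English parser produces (`RootNode`, `ParagraphNode`, `SentenceNode`, `WordNode`, `TextNode`, `PunctuationNode`). `collectText` and `sampleNlcst` are hypothetical names for illustration; the sample is not actual output of the quickstart above, and in real projects the `nlcst-to-string` package serves this purpose.

```javascript
// Hypothetical helper: concatenates the `value` of every literal node
// found while walking an NLCST-like tree depth-first.
function collectText(node) {
  if (typeof node.value === 'string') return node.value;
  if (Array.isArray(node.children)) return node.children.map(collectText).join('');
  return '';
}

// Hand-written stand-in for an NLCST tree (positions omitted for brevity).
const sampleNlcst = {
  type: 'RootNode',
  children: [
    {
      type: 'ParagraphNode',
      children: [
        {
          type: 'SentenceNode',
          children: [
            {type: 'WordNode', children: [{type: 'TextNode', value: 'Implicit'}]},
            {type: 'PunctuationNode', value: '.'}
          ]
        }
      ]
    }
  ]
};

console.log(collectText(sampleNlcst)); // prints "Implicit."
```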