HAST to NLCST Transformer

4.0.0 · active · verified Sun Apr 19

hast-util-to-nlcst is a utility in the unified/syntax-tree ecosystem that transforms a HAST (HTML Abstract Syntax Tree) into an NLCST (Natural Language Concrete Syntax Tree). The transformation extracts the natural language content from an HTML structure so that it can be processed by natural language tooling such as retext, for tasks like prose linting, sentiment analysis, or spell checking. The package is currently stable at version 4.0.0 and follows semantic versioning; major versions often introduce breaking changes related to environment support (e.g., Node.js versions, ESM-only) or parser API updates. Its role is deliberately narrow: it bridges HTML content to natural language processing within the unist AST family, and it currently has no mechanism for applying changes made to the NLCST back to the HAST. It is often used together with parsers such as `parse-english` and through wrappers such as `rehype-retext`.

Quickstart

This example reads an HTML file, parses it into a HAST tree with `hast-util-from-html`, and converts that tree into an NLCST tree by calling `toNlcst` from `hast-util-to-nlcst` with the `ParseEnglish` parser. It then logs the resulting natural language tree with `unist-util-inspect`, showing how the text content of the HTML elements is represented.

import { fromHtml } from 'hast-util-from-html';
import { toNlcst } from 'hast-util-to-nlcst';
import { ParseEnglish } from 'parse-english';
import { readSync } from 'to-vfile';
import { inspect } from 'unist-util-inspect';
import * as fs from 'node:fs';
import * as path from 'node:path';

// Create a dummy HTML file for the example
const exampleHtmlContent = `
<article>
  Implicit.
  <h1>Explicit: <strong>foo</strong>s-ball</h1>
  <pre><code class="language-foo">bar()</code></pre>
</article>
`;
const exampleHtmlPath = path.join(process.cwd(), 'example.html');
fs.writeFileSync(exampleHtmlPath, exampleHtmlContent);

// Read the file from disk into a virtual file (VFile)
const file = readSync(exampleHtmlPath);
// Parse the HTML held by the virtual file into a HAST tree
const tree = fromHtml(file);

// Transform HAST to NLCST using ParseEnglish
const nlcstTree = toNlcst(tree, file, ParseEnglish);

// Log the inspected NLCST tree (node types, values, and positions)
console.log(inspect(nlcstTree));

// Clean up the dummy file
fs.unlinkSync(exampleHtmlPath);
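
Once the quickstart has produced an NLCST tree, a common next step for tasks such as spell checking is to walk the tree and collect its words. The following is a minimal, dependency-free sketch of that idea. It uses a hand-written `sampleTree` as a stand-in for the value returned by `toNlcst`; the tree contents and the helper names `textOf` and `collectWords` are assumptions made for illustration and are not part of the package. Only the node type names (`RootNode`, `ParagraphNode`, `SentenceNode`, `WordNode`, `TextNode`, `PunctuationNode`, `WhiteSpaceNode`) come from the NLCST specification.

```javascript
// Hand-written stand-in for a tree returned by toNlcst (assumed shape, for illustration only).
const sampleTree = {
  type: 'RootNode',
  children: [
    {
      type: 'ParagraphNode',
      children: [
        {
          type: 'SentenceNode',
          children: [
            {type: 'WordNode', children: [{type: 'TextNode', value: 'Implicit'}]},
            {type: 'PunctuationNode', value: '.'}
          ]
        }
      ]
    },
    {
      type: 'ParagraphNode',
      children: [
        {
          type: 'SentenceNode',
          children: [
            {type: 'WordNode', children: [{type: 'TextNode', value: 'Explicit'}]},
            {type: 'PunctuationNode', value: ':'},
            {type: 'WhiteSpaceNode', value: ' '},
            {
              type: 'WordNode',
              children: [
                {type: 'TextNode', value: 'foo'},
                {type: 'TextNode', value: 's'},
                {type: 'PunctuationNode', value: '-'},
                {type: 'TextNode', value: 'ball'}
              ]
            }
          ]
        }
      ]
    }
  ]
};

// Hypothetical helper: concatenate every literal value beneath a node.
function textOf(node) {
  if (typeof node.value === 'string') return node.value;
  return (node.children || []).map(textOf).join('');
}

// Hypothetical helper: gather the text of each WordNode in document order.
function collectWords(node, words = []) {
  if (node.type === 'WordNode') {
    words.push(textOf(node));
    return words;
  }
  for (const child of node.children || []) collectWords(child, words);
  return words;
}

const words = collectWords(sampleTree);
console.log(words); // [ 'Implicit', 'Explicit', 'foos-ball' ]
```

In a real pipeline, `collectWords(nlcstTree)` would be called on the tree from the quickstart instead of `sampleTree`. The syntax-tree ecosystem also provides ready-made utilities for this kind of traversal and serialization (for example `unist-util-visit` and `nlcst-to-string`), which are likely a better fit than hand-rolled helpers in production code.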
