HAST Plain-Text Extraction

4.0.2 · active · verified Sun Apr 19

hast-util-to-text is a utility from the unified ecosystem that extracts the plain-text value of a hast (HTML abstract syntax tree) node. It approximates the DOM's `Node#innerText` algorithm rather than `Node#textContent`, which is the behavior `hast-util-to-string` mirrors. The result is easier to read: `<br>` elements become line breaks and table cells are separated by tabs (`\t`).

The package is at version 4.0.2 and actively maintained. Patch releases carry fixes, minor releases add features, and major releases are reserved for breaking changes. What sets it apart is this `innerText`-like behavior: the output approximates how the content reads when rendered. It works on a static tree without computing styles, so it cannot account for CSS such as `display: none` or `text-transform`. It does skip elements that carry the `hidden` attribute.
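The following dependency-free sketch makes the `textContent` versus `innerText` distinction concrete. It runs over hand-built hast-shaped objects. `textContentLike` and `innerTextLike` are hypothetical helpers written for this illustration. They are not exports of hast-util-to-text or hast-util-to-string. The innerText side covers only a few rules: hidden elements, `<br>`, table cells and rows, and whitespace collapsing. The real package implements much more of the algorithm.

```javascript
// Hand-built hast-shaped nodes, so the sketch needs no dependencies.
const el = (tagName, properties, children) => ({type: 'element', tagName, properties, children})
const text = (value) => ({type: 'text', value})

// textContent-like: concatenate every text node, ignoring what elements mean.
function textContentLike(node) {
  if (node.type === 'text') return node.value
  return node.children.map(textContentLike).join('')
}

// Simplified innerText-like extraction (an illustration, NOT the library's algorithm):
// - elements with the `hidden` property are skipped
// - <br> becomes a line feed
// - children of a <tr> are joined by a tab, children of a <table> by a line feed
// - runs of whitespace inside a text node collapse to one space
// Block-level spacing between paragraphs is deliberately left out.
function innerTextLike(node) {
  if (node.type === 'text') return node.value.replace(/[ \t\n\r\f]+/g, ' ')
  if (node.properties.hidden) return ''
  if (node.tagName === 'br') return '\n'
  const separator = node.tagName === 'tr' ? '\t' : node.tagName === 'table' ? '\n' : ''
  return node.children.map(innerTextLike).join(separator)
}

const para = el('div', {}, [
  el('span', {hidden: true}, [text('Alpha. ')]),
  el('p', {}, [text('Bravo'), el('br', {}, []), text('charlie.')])
])

const table = el('table', {}, [
  el('tr', {}, [el('td', {}, [text('Cell 1')]), el('td', {}, [text('Cell 2')])]),
  el('tr', {}, [el('td', {}, [text('Cell 3')]), el('td', {}, [text('Cell 4')])])
])

const spaced = el('p', {}, [text('Delta echo \t foxtrot.')])

for (const node of [para, table, spaced]) {
  console.log(JSON.stringify(textContentLike(node)), '->', JSON.stringify(innerTextLike(node)))
}
// "Alpha. Bravocharlie." -> "Bravo\ncharlie."
// "Cell 1Cell 2Cell 3Cell 4" -> "Cell 1\tCell 2\nCell 3\tCell 4"
// "Delta echo \t foxtrot." -> "Delta echo foxtrot."
```

The textContent-style output leaks hidden text and runs words and cells together. The innerText-style output keeps the visual separation.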

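Install

`npm install hast-util-to-text`

The package is ESM only. The quickstart also uses hastscript (`npm install hastscript`).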

Imports

`import {toText} from 'hast-util-to-text'`

Quickstart

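This quickstart uses `toText` to convert a hast tree into a plain-text string. It shows that `<br>` becomes a line break, table cells are separated by a tab, runs of spaces and tabs in text collapse to a single space, and elements with the `hidden` attribute are left out.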

```javascript
import {h} from 'hastscript'
import {toText} from 'hast-util-to-text'

const tree = h('div', [
  h('h1', {hidden: true}, 'Alpha.'), // skipped: `hidden` elements are not rendered
  h('article', [
    h('p', ['Bravo', h('br'), 'charlie.']), // <br> becomes a line feed
    h('p', 'Delta echo \t foxtrot.') // spaces and tabs collapse to one space
  ]),
  h('table', [
    h('tr', [
      h('td', 'Cell 1'), // cells in a row are separated by a tab
      h('td', 'Cell 2')
    ])
  ])
])

console.log(toText(tree))
// Prints (\t stands for a tab character):
// Bravo
// charlie.
//
// Delta echo foxtrot.
//
// Cell 1\tCell 2
```
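Paragraph spacing in the quickstart output comes from the `innerText` algorithm's "required line break count" rule. Block-level elements request one line break around themselves and `<p>` requests two. Each run of requests between two pieces of text is replaced by as many line feeds as the largest request in the run. Runs at the start and end are dropped.

The sketch below implements only that joining step. `joinItems` is a hypothetical helper. The item list is a hand-written approximation of what the quickstart tree would produce, assuming `<table>` counts as block-level. It is not output captured from hast-util-to-text.

```javascript
// Items are strings (rendered text) or numbers ("required line break counts").
function joinItems(items) {
  const out = []
  let pending = 0 // largest count in the current run of numbers
  let seenText = false
  for (const item of items) {
    if (typeof item === 'number') {
      pending = Math.max(pending, item)
    } else if (item !== '') {
      // A run between two strings becomes max(count) line feeds;
      // a run before the first string is dropped.
      if (seenText && pending > 0) out.push('\n'.repeat(pending))
      out.push(item)
      seenText = true
      pending = 0
    }
  }
  // A trailing run is never flushed, so it is dropped as well.
  return out.join('')
}

// Hand-written approximation of the items for the quickstart tree
// (the hidden <h1> contributes nothing).
const items = [
  1, // <article> start
  2, 'Bravo', '\n', 'charlie.', 2, // first <p>; <br> is already a '\n' string
  2, 'Delta echo foxtrot.', 2, // second <p>
  1, // <article> end
  1, 'Cell 1', '\t', 'Cell 2', 1 // <table>; cells separated by a tab string
]

console.log(JSON.stringify(joinItems(items)))
// "Bravo\ncharlie.\n\nDelta echo foxtrot.\n\nCell 1\tCell 2"
```

Taking the maximum of a run, rather than the sum, is why nested blocks such as a `<p>` inside an `<article>` still produce a single blank line instead of a growing stack of them.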
