NLCST Normalize Utility

4.0.0 · active · verified Sun Apr 19

nlcst-normalize is a utility in the unified ecosystem for working with Natural Language Concrete Syntax Tree (NLCST) nodes. It serializes a word, given either as a string or as a node, and cleans it so that words can be compared consistently. The current stable version is 4.0.0. Releases roughly follow the Node.js LTS schedule: major releases drop support for Node.js versions that are no longer maintained. The package integrates with the unified collective's other natural language processing tools. It lowercases words, treats smart and straight apostrophes as equivalent, and strips or keeps hyphens and apostrophes depending on configurable options. This makes it useful for tasks such as building keyword matchers or indexes, where slight variations in word formatting need to be reconciled.
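To make the keyword-matching use case concrete, here is a minimal sketch. It assumes the cleaning rules described above (lowercase, smart apostrophes treated as straight, apostrophes and dashes stripped by default). `cleanWord` is a hypothetical local stand-in so the block runs without dependencies, and `createKeywordMatcher` is an illustrative name; neither is part of the package.

```javascript
// Hypothetical stand-in for `normalize`, written from the behavior described
// in this section. It is NOT the package's source; swap in the real function
// from 'nlcst-normalize' in practice.
function cleanWord(value, options = {}) {
  let result = value.toLowerCase().replace(/’/g, "'");
  if (!options.allowDashes) result = result.replace(/-/g, '');
  if (!options.allowApostrophes) result = result.replace(/'/g, '');
  return result;
}

// Illustrative helper: build a matcher that compares words by normalized form.
function createKeywordMatcher(keywords, normalizeFn = cleanWord) {
  // Store each keyword in its normalized form for constant-time lookup.
  const index = new Set(keywords.map((keyword) => normalizeFn(keyword)));
  return (word) => index.has(normalizeFn(word));
}

const isKeyword = createKeywordMatcher(["don't", 'block-level']);

console.log(isKeyword('Don’t')); // true
console.log(isKeyword('BlockLevel')); // true
console.log(isKeyword('inline')); // false
```

Passing the normalizer as a parameter keeps the matcher independent of any one cleaning strategy, so the real `normalize` can be dropped in as the second argument.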

Common errors

Warnings

Install

Install from npm with `npm install nlcst-normalize`.

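Imports

The package exposes `normalize` as a named export: `import { normalize } from 'nlcst-normalize'`. There is no default export.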

Quickstart

This quickstart demonstrates how to use the `normalize` function with various string inputs, including options to control punctuation stripping, and how to pass an NLCST `WordNode` object for normalization.

import { normalize } from 'nlcst-normalize';

// Normalize simple strings
console.log(normalize("Don't")); // => 'dont'
console.log(normalize('Don’t')); // => 'dont'

// Apostrophes and dashes are stripped by default; options retain them
console.log(normalize('Don’t', { allowApostrophes: true })); // => "don't" (smart apostrophe becomes straight)
console.log(normalize('Block-level')); // => 'blocklevel'
console.log(normalize('Block-level', { allowDashes: true })); // => 'block-level'

// Normalize an NLCST WordNode object
const wordNode = {
  type: 'WordNode',
  children: [
    { type: 'TextNode', value: 'Example' },
    { type: 'PunctuationNode', value: '-' },
    { type: 'TextNode', value: 'word' }
  ]
};
console.log(normalize(wordNode)); // => 'exampleword'
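
To see why the node above yields 'exampleword', the following sketch assumes the node is first serialized by concatenating the `value` of its descendants and then cleaned like any other string. `serializeNode` is a hypothetical helper for illustration, not the package's internals.

```javascript
// Hypothetical helper: concatenate the text of a node and its descendants.
function serializeNode(node) {
  // Leaf nodes (TextNode, PunctuationNode) carry their text in `value`.
  if (typeof node.value === 'string') return node.value;
  // Parent nodes (WordNode) are the concatenation of their children.
  return (node.children || []).map((child) => serializeNode(child)).join('');
}

const hyphenated = {
  type: 'WordNode',
  children: [
    { type: 'TextNode', value: 'Example' },
    { type: 'PunctuationNode', value: '-' },
    { type: 'TextNode', value: 'word' }
  ]
};

const serialized = serializeNode(hyphenated);
console.log(serialized); // 'Example-word'

// Default cleaning as described in this section: lowercase, treat smart
// apostrophes as straight, then drop dashes and apostrophes.
const cleaned = serialized
  .toLowerCase()
  .replace(/’/g, "'")
  .replace(/[-']/g, '');
console.log(cleaned); // 'exampleword'
```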
