NLCST Normalize Utility
nlcst-normalize is a utility in the unified ecosystem for working with Natural Language Concrete Syntax Tree (NLCST) nodes. It serializes a word, given either as a string or as an NLCST node, and cleans it so that variants of the same word compare equal. The current stable version is 4.0.0. Major releases drop support for unmaintained Node.js versions, so the release cadence loosely follows Node.js LTS. Beyond its integration with the wider unified collective for natural language processing, it lowercases words, reconciles smart and straight apostrophes, and strips hyphens and apostrophes unless options say otherwise. This makes it particularly useful for tasks such as building keyword matchers or indexes, where slight variations in word formatting need to be reconciled.
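As a rough illustration of the indexing use case mentioned above, the sketch below groups raw word forms under a normalized key. `buildIndex` and `defaultLike` are hypothetical names introduced for this example; `defaultLike` is only a stand-in that approximates the default behavior so the snippet runs without the package installed. In real code you would pass `normalize` from 'nlcst-normalize' instead.

```javascript
// Hypothetical helper: groups raw word forms under their normalized key.
// `normalizeFn` is assumed to behave like `normalize` from 'nlcst-normalize'.
function buildIndex(words, normalizeFn) {
  const index = new Map();
  for (const word of words) {
    const key = normalizeFn(word);
    const forms = index.get(key) ?? [];
    forms.push(word);
    index.set(key, forms);
  }
  return index;
}

// Stand-in for the library's default behavior (an assumption, not its source):
// lowercase, then drop smart apostrophes, straight apostrophes, and hyphens.
const defaultLike = (word) => word.toLowerCase().replace(/[’'-]/g, '');

const index = buildIndex(
  ['Don’t', "don't", 'Block-level', 'blocklevel'],
  defaultLike
);

console.log(index.get('dont')); // the two "don't" variants
console.log(index.get('blocklevel')); // the two "block-level" variants
```

Injecting the normalizer as a parameter keeps the index logic independent of which options (for example `allowDashes`) the caller chooses.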
Common errors
- TypeError: require is not a function
  cause: Attempting to use `require()` to import nlcst-normalize in a module context where only ESM is supported.
  fix: Change `const { normalize } = require('nlcst-normalize')` to `import { normalize } from 'nlcst-normalize'`.
- Error [ERR_PACKAGE_PATH_NOT_EXPORTED]: Package subpath './lib/index.js' is not defined by "exports" in ...
  cause: Attempting to import from a non-public, internal path of the package, which is disallowed due to the `exports` field in package.json.
  fix: Always import symbols directly from the main package entry point: `import { normalize } from 'nlcst-normalize'`.
- Property 'NormalizeOptions' does not exist on type 'typeof import("nlcst-normalize")'
  cause: Using the deprecated `NormalizeOptions` type after upgrading to version 4.0.0 or later.
  fix: Replace `NormalizeOptions` with `Options` in your TypeScript type imports and declarations.
Warnings
- breaking Version 4.0.0 of nlcst-normalize requires Node.js version 16 or higher. Older Node.js versions are no longer supported.
- breaking nlcst-normalize switched to being an ESM-only package starting from version 3.0.0. CommonJS `require()` statements will fail.
- breaking Version 4.0.0 removed the `NormalizeOptions` type. The correct type for configuration is now `Options`.
- gotcha Smart apostrophes (`’`) are always converted to straight apostrophes (`'`). By default, apostrophes and hyphens (`-`) are then removed; set `allowApostrophes: true` to keep apostrophes and `allowDashes: true` to keep hyphens.
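The order of operations in the gotcha above can be sketched as follows. This is a minimal approximation written for illustration, not the package's actual source: `normalizeSketch` is a hypothetical name, it handles strings only, and the lowercasing step is inferred from the documented outputs.

```javascript
// Hypothetical stand-in mirroring the documented behavior for strings.
// In real code, use: import { normalize } from 'nlcst-normalize'
function normalizeSketch(value, options = {}) {
  // Lowercase, then fold smart apostrophes into straight ones (always).
  let result = value.toLowerCase().replace(/’/g, "'");

  // Hyphens are dropped unless explicitly allowed.
  if (!options.allowDashes) {
    result = result.replace(/-/g, '');
  }

  // Apostrophes are dropped unless explicitly allowed.
  if (!options.allowApostrophes) {
    result = result.replace(/'/g, '');
  }

  return result;
}

console.log(normalizeSketch('Don’t')); // => 'dont'
console.log(normalizeSketch('Don’t', { allowApostrophes: true })); // => "don't"
console.log(normalizeSketch('Block-level')); // => 'blocklevel'
console.log(normalizeSketch('Block-level', { allowDashes: true })); // => 'block-level'
```

Note that with `allowApostrophes: true` the result still contains a straight apostrophe, never a smart one, because the conversion happens before the optional removal.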
Install
- npm install nlcst-normalize
- yarn add nlcst-normalize
- pnpm add nlcst-normalize
Imports
- normalize
  wrong: const normalize = require('nlcst-normalize').normalize
  correct: import { normalize } from 'nlcst-normalize'
- Options
  wrong: import { Options } from 'nlcst-normalize'
  correct: import type { Options } from 'nlcst-normalize'
- normalize (Deno/Browser)
  wrong: import {normalize} from 'nlcst-normalize'
  correct: import {normalize} from 'https://esm.sh/nlcst-normalize@4'
Quickstart
import { normalize } from 'nlcst-normalize';
// Normalize simple strings
console.log(normalize("Don't")); // => 'dont'
console.log(normalize('Don’t')); // => 'dont'
// Normalize with options to retain specific punctuation
console.log(normalize('Don’t', { allowApostrophes: true })); // => 'don\'t'
console.log(normalize('Block-level')); // => 'blocklevel'
console.log(normalize('Block-level', { allowDashes: true })); // => 'block-level'
// Normalize an NLCST WordNode object
const wordNode = {
  type: 'WordNode',
  children: [
    { type: 'TextNode', value: 'Example' },
    { type: 'PunctuationNode', value: '-' },
    { type: 'TextNode', value: 'word' }
  ]
};
console.log(normalize(wordNode)); // => 'exampleword'
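To show why the node example yields 'exampleword', here is a hedged sketch of the serialization step that happens before cleaning. `serializeSketch` is a hypothetical function written for this example; it assumes a node is serialized by concatenating the `value` of every literal node in document order, and it is not the package's own code.

```javascript
// Hypothetical serializer: concatenates the `value` of every literal node,
// descending into `children` for parent nodes such as WordNode.
function serializeSketch(node) {
  if (typeof node.value === 'string') return node.value;
  return (node.children ?? []).map(serializeSketch).join('');
}

const exampleNode = {
  type: 'WordNode',
  children: [
    { type: 'TextNode', value: 'Example' },
    { type: 'PunctuationNode', value: '-' },
    { type: 'TextNode', value: 'word' }
  ]
};

const text = serializeSketch(exampleNode);
console.log(text); // => 'Example-word'

// Default cleanup (assumed): lowercase, then drop apostrophes and hyphens.
const cleaned = text.toLowerCase().replace(/[’'-]/g, '');
console.log(cleaned); // => 'exampleword'
```

The hyphen survives serialization because the PunctuationNode contributes its value like any other literal node; it disappears only in the cleanup step, which is why `allowDashes: true` can preserve it.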