micromark Character Utilities
micromark-util-character is a utility package in the micromark ecosystem. It provides pure functions that check whether a character code belongs to a predefined group, such as ASCII alphanumerics, punctuation, or Markdown-specific line endings and spaces. It is currently at version 2.1.1, and as part of the larger micromark project its release cadence is tied to the main project's development. The package is aimed at developers building custom micromark extensions or parsers who need fast, fine-grained character classification: it offers a specialized, low-level API for fundamental parsing operations rather than general-purpose string manipulation.
Common errors
- ERR_REQUIRE_ESM
  cause: Attempting to load an ESM-only package using CommonJS `require()` syntax.
  fix: Change `const { asciiAlpha } = require('micromark-util-character')` to `import { asciiAlpha } from 'micromark-util-character'`.
- TypeError: Cannot read properties of undefined (reading 'charCodeAt')
  cause: `.charCodeAt()` was called on an `undefined` value. This typically happens while deriving the character code from a missing string, before the code ever reaches a utility function.
  fix: Ensure the input to character utility functions is always a valid number representing a character code. Check for `null` or `undefined` before deriving the code if its source is untrustworthy.
- Property 'asciiAlpha' does not exist on type 'typeof import("micromark-util-character")'.
  cause: Attempting to use a non-existent export or an incorrect named import for a utility function.
  fix: Verify the exact name of the imported function and ensure it is correctly destructured from the named exports, e.g., `import { asciiAlpha } from 'micromark-util-character'`.
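The `null` and `undefined` checks recommended for the `charCodeAt` error can be sketched as a small guard. `codeAt` is a hypothetical helper name, not an export of micromark-util-character, and returning `null` for "no character" is an assumption chosen to match how micromark represents end of file.

```javascript
// Hypothetical helper (not part of micromark-util-character): derive a
// character code without throwing when the input string is missing.
function codeAt(value, index = 0) {
  // Anything that is not a string, or an index outside the string,
  // yields null instead of a TypeError or NaN.
  if (typeof value !== 'string' || index < 0 || index >= value.length) {
    return null;
  }
  return value.charCodeAt(index);
}

console.log(codeAt('H')); // 72
console.log(codeAt(undefined)); // null
console.log(codeAt('H', 5)); // null
```

The result can then be passed to a check such as `asciiAlpha` only after the caller has decided how to treat `null`.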
Warnings
- breaking The package transitioned to being ESM-only. CommonJS `require()` statements will no longer work and will result in errors.
- breaking Version 2.1.0 included updates for 'CM 0.31' and added Unicode symbols for attention. While not explicitly detailed as breaking, changes related to character attention and CommonMark versions can subtly alter parsing behavior if custom extensions relied on previous interpretations.
- gotcha The utility functions expect character *codes* (numbers), not character *strings*. Passing a string will result in incorrect behavior or runtime errors; convert first, for example with `'A'.charCodeAt(0)`.
- gotcha Be mindful of the distinction between 'ASCII' and 'Unicode' prefixed functions. Functions like `asciiAlpha` only check within the ASCII range, while `unicodePunctuation` and `unicodeWhitespace` handle a broader set of characters. An ASCII-only check returns `false` for any non-ASCII input, even when the character belongs to the same broad category.
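The two gotchas above can be illustrated without installing the package. The two predicates below are local stand-ins written for this sketch, assumed to approximate the package's `asciiPunctuation` and `unicodePunctuation`; they are not the library's implementation. With the package installed, the real exports would be used instead.

```javascript
// Stand-in for an ASCII-only punctuation check: four ASCII ranges only.
function isAsciiPunctuationSketch(code) {
  return (
    (code >= 33 && code <= 47) || // ! through /
    (code >= 58 && code <= 64) || // : through @
    (code >= 91 && code <= 96) || // [ through `
    (code >= 123 && code <= 126) // { through ~
  );
}

// Stand-in for a Unicode-aware check: punctuation or symbol categories.
function isUnicodePunctuationSketch(code) {
  return (
    typeof code === 'number' &&
    code > -1 &&
    /\p{P}|\p{S}/u.test(String.fromCharCode(code))
  );
}

const ampersand = '&'.charCodeAt(0); // 38, inside ASCII
const emDash = 8212; // U+2014, outside ASCII

console.log(isAsciiPunctuationSketch(ampersand)); // true
console.log(isAsciiPunctuationSketch(emDash)); // false: outside the ASCII ranges
console.log(isUnicodePunctuationSketch(emDash)); // true
// Passing the string instead of its code silently gives the wrong answer.
console.log(isAsciiPunctuationSketch('&')); // false
```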
Install
- npm install micromark-util-character
- yarn add micromark-util-character
- pnpm add micromark-util-character
Imports
- asciiAlpha
  wrong: `const { asciiAlpha } = require('micromark-util-character')`
  correct: `import { asciiAlpha } from 'micromark-util-character'`
- markdownLineEnding
  wrong: `import markdownLineEnding from 'micromark-util-character'`
  correct: `import { markdownLineEnding } from 'micromark-util-character'`
- unicodeWhitespace
  wrong: `const unicodeWhitespace = require('micromark-util-character').unicodeWhitespace`
  correct: `import { unicodeWhitespace } from 'micromark-util-character'`
Quickstart
import { asciiAlpha, asciiDigit, markdownLineEnding, markdownSpace, unicodePunctuation } from 'micromark-util-character';
console.log('--- Character Checks ---');
const charA = 65; // 'A'
// micromark's tokenizer represents line endings as negative codes
// (-5 CR, -4 LF, -3 CRLF), so the raw code 10 ('\n') is not reported
// as a Markdown line ending.
const codeLineFeed = -4; // micromark's code for a line feed
const charAmpersand = 38; // '&'
const charDigit = 50; // '2'
const charSpace = 32; // ' '
const charUnicodePunctuation = 8212; // '—' (em dash)
console.log(`Is '${String.fromCharCode(charA)}' an ASCII alpha character? ${asciiAlpha(charA)}`);
console.log(`Is micromark's line feed code (${codeLineFeed}) a Markdown line ending? ${markdownLineEnding(codeLineFeed)}`);
console.log(`Is '${String.fromCharCode(charAmpersand)}' a Unicode punctuation character? ${unicodePunctuation(charAmpersand)}`);
console.log(`Is '${String.fromCharCode(charDigit)}' an ASCII digit character? ${asciiDigit(charDigit)}`);
// Use markdownSpace (not markdownLineEnding) to test for a space.
console.log(`Is '${String.fromCharCode(charSpace)}' a Markdown space? ${markdownSpace(charSpace)}`);
console.log(`Is '${String.fromCharCode(charUnicodePunctuation)}' a Unicode punctuation character? ${unicodePunctuation(charUnicodePunctuation)}`);
// Demonstrating a common pattern for custom parsers:
function isStartOfWord(code) {
  return asciiAlpha(code) || asciiDigit(code);
}
const testCode = 'H'.charCodeAt(0);
console.log(`Is '${String.fromCharCode(testCode)}' a start of word? ${isStartOfWord(testCode)}`);
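Building on the `isStartOfWord` pattern, a custom scanner might walk a string code by code and group consecutive matches. The sketch below is illustrative only: `collectWords` is a hypothetical helper, and `isAsciiAlphanumericSketch` is a local stand-in (assumed to behave like the package's `asciiAlphanumeric`) so the snippet runs without any dependency.

```javascript
// Stand-in for the package's `asciiAlphanumeric`, included only so this
// sketch runs without installing anything.
function isAsciiAlphanumericSketch(code) {
  return (
    (code >= 48 && code <= 57) || // 0-9
    (code >= 65 && code <= 90) || // A-Z
    (code >= 97 && code <= 122) // a-z
  );
}

// Hypothetical helper: split `text` into runs of codes accepted by `isWordCode`.
function collectWords(text, isWordCode = isAsciiAlphanumericSketch) {
  const words = [];
  let start = -1; // index where the current word began, or -1 outside a word
  for (let index = 0; index <= text.length; index++) {
    // Past the end there is no character; null marks that, as micromark does for EOF.
    const code = index < text.length ? text.charCodeAt(index) : null;
    if (code !== null && isWordCode(code)) {
      if (start === -1) start = index;
    } else if (start !== -1) {
      words.push(text.slice(start, index));
      start = -1;
    }
  }
  return words;
}

console.log(collectWords('Hi, micromark 2!'));
```

For the input shown, the returned array is `['Hi', 'micromark', '2']`. With the package installed, the imported `asciiAlphanumeric` could be passed as the second argument in place of the stand-in.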