micromark Character Classifier Utility
The `micromark-util-classify-character` package, currently at version 2.0.1, is a small utility in the `micromark` ecosystem that classifies a single character code as whitespace, punctuation, or neither. `micromark` and its extensions use this classification mainly to decide whether an 'attention' sequence, such as the markers for emphasis and strong formatting, can open or close, which depends on the classes of the characters on either side of the sequence. The package is maintained as part of the `micromark` project, so its release cadence follows the broader library. It is deliberately low-level and performance-oriented, directly supporting `micromark`'s parsing algorithms. The package is ESM-only, requires Node.js 16 or newer, and ships with complete TypeScript type definitions.
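How the surrounding character classes can drive opening and closing is easiest to see in a small flanking check, written here in the spirit of CommonMark's left- and right-flanking rules. This is a minimal sketch under stated assumptions, not `micromark`'s actual implementation: `classifyStandIn` is a hypothetical, ASCII-only stand-in for `classifyCharacter` (so the block runs without installing anything), `flanking` and `codeOf` are made-up names, and the numeric group values are assumed.

```typescript
// Group values as named in this document; the numeric values are an
// assumption of this sketch.
const characterGroupWhitespace = 1;
const characterGroupPunctuation = 2;

type Group =
  | typeof characterGroupWhitespace
  | typeof characterGroupPunctuation
  | undefined;

// Hypothetical stand-in for `classifyCharacter`. It only knows ASCII
// punctuation; the real utility classifies Unicode characters.
function classifyStandIn(code: number | null): Group {
  if (code === null) return characterGroupWhitespace; // EOF counts as whitespace
  const char = String.fromCharCode(code);
  if (/\s/.test(char)) return characterGroupWhitespace;
  if (/[!-\/:-@\[-\x60{-~]/.test(char)) return characterGroupPunctuation;
  return undefined;
}

// Flanking check for a marker run, given the codes just before and after it.
function flanking(
  before: number | null,
  after: number | null,
): { left: boolean; right: boolean } {
  const b = classifyStandIn(before);
  const a = classifyStandIn(after);
  return {
    // Left-flanking: followed by "neither", or by punctuation while preceded
    // by whitespace or punctuation.
    left: a === undefined || (a === characterGroupPunctuation && b !== undefined),
    // Right-flanking: the mirror image.
    right: b === undefined || (b === characterGroupPunctuation && a !== undefined),
  };
}

const codeOf = (char: string): number => char.charCodeAt(0);

console.log(flanking(null, codeOf('f'))); // opening run in `*foo*`
console.log(flanking(codeOf('o'), null)); // closing run in `*foo*`
console.log(flanking(codeOf(' '), codeOf(' '))); // the `*` in `a * b`
```

With the real package, `classifyStandIn` would be replaced by the imported `classifyCharacter`; the flanking logic itself stays the same.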
Common errors
- ReferenceError: require is not defined
  cause: Calling CommonJS `require` in an ES module context (e.g., in a `.mjs` file or when `"type": "module"` is set in `package.json`), where `require` does not exist.
  fix: Change `const { classifyCharacter } = require('micromark-util-classify-character');` to `import { classifyCharacter } from 'micromark-util-classify-character';`
- ERR_REQUIRE_ESM
  cause: Using CommonJS `require` in a CommonJS context to load `micromark-util-classify-character`, which is an ES module.
  fix: Either convert the consuming module to ESM by adding `"type": "module"` to your `package.json` or renaming the file to `.mjs`, then use `import` syntax. Alternatively, if the project must remain CommonJS, use a dynamic import: `import('micromark-util-classify-character').then(mod => mod.classifyCharacter(...))`.
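The dynamic-import route for CommonJS projects can be wrapped in a small async helper. The following is a minimal sketch, not an API of the package: `classifyLazily`, `Importer`, `ClassifierModule`, and `dynamicImport` are hypothetical names, and the module shape is reduced to the one export used here. It also assumes the TypeScript compiler preserves `import()` in the emitted code; if a CommonJS module setting rewrites it to `require`, the same error comes back.

```typescript
// Shape of the single export this sketch relies on (reduced for illustration).
type ClassifierModule = {
  classifyCharacter: (code: number | null) => number | undefined;
};

// Hypothetical helper type: anything that resolves a specifier to the module.
// Making it injectable keeps the helper testable without the package installed.
type Importer = (specifier: string) => Promise<ClassifierModule>;

// Default importer: a real dynamic `import()`, which can load ESM from CommonJS.
const dynamicImport: Importer = (specifier) => import(specifier);

// Hypothetical helper: classify one character code from code that cannot use
// a static `import` statement.
async function classifyLazily(
  code: number | null,
  importer: Importer = dynamicImport,
): Promise<number | undefined> {
  const mod = await importer('micromark-util-classify-character');
  return mod.classifyCharacter(code);
}
```

In a CommonJS file with the package installed, `classifyLazily(32).then((group) => console.log(group))` would log the group reported for a space. Node.js caches the loaded module, so repeated calls do not reload it.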
Warnings
- breaking This package is ESM-only. Attempting to import it using CommonJS `require()` will result in an error.
- breaking `micromark-util-classify-character@2` requires Node.js version 16 or higher.
- gotcha This utility is part of the `micromark` ecosystem and is specifically designed to work with `micromark@3`. While it might function with other versions, compatibility is only guaranteed for `micromark@3`.
Install
- npm install micromark-util-classify-character
- yarn add micromark-util-classify-character
- pnpm add micromark-util-classify-character
Imports
- classifyCharacter
  wrong (CommonJS): const { classifyCharacter } = require('micromark-util-classify-character');
  right (ESM): import { classifyCharacter } from 'micromark-util-classify-character';
- characterGroupWhitespace (accessed as `constants.characterGroupWhitespace`; the group constants are exported from `micromark-util-symbol`)
  wrong (CommonJS): const { constants } = require('micromark-util-symbol');
  right (ESM): import { constants } from 'micromark-util-symbol';
- Code
  import type { Code } from 'micromark-util-types';
Quickstart
import { classifyCharacter } from 'micromark-util-classify-character';
// The group constants are exported on the `constants` object of `micromark-util-symbol`.
import { constants } from 'micromark-util-symbol';

/**
 * Classify a given character and return its category as a string.
 * @param {string | null} char The character to classify, or null for EOF.
 * @returns {string} The classification result.
 */
function getCharacterCategory(char: string | null): string {
  // micromark works on character codes; null represents EOF.
  const code = char === null ? null : char.charCodeAt(0);
  const classification = classifyCharacter(code);

  switch (classification) {
    case constants.characterGroupWhitespace:
      return `'${char}' (code: ${code}) is classified as WHITESPACE.`;
    case constants.characterGroupPunctuation:
      return `'${char}' (code: ${code}) is classified as PUNCTUATION.`;
    default:
      // `undefined` means neither whitespace nor punctuation.
      return `'${char}' (code: ${code}) is classified as NEITHER.`;
  }
}

console.log(getCharacterCategory(' ')); // Space character
console.log(getCharacterCategory('\t')); // Tab character
console.log(getCharacterCategory('.')); // Period character
console.log(getCharacterCategory('!')); // Exclamation mark
console.log(getCharacterCategory('a')); // Letter 'a'
console.log(getCharacterCategory('7')); // Digit '7'
console.log(getCharacterCategory(null)); // End of file (EOF) is treated as whitespace