micromark Character Classifier Utility

2.0.1 · active · verified Sun Apr 19

The `micromark-util-classify-character` package (version 2.0.1) is a small utility in the `micromark` ecosystem that classifies a single Unicode character code into one of three groups: whitespace, punctuation, or neither. `micromark` and its extensions use this classification for 'attention' sequences, the runs of markers behind emphasis and strong emphasis. Whether such a sequence can open or close depends on the groups of the characters immediately before and after it. The package is maintained as part of the `micromark` project and is released along with it. It works directly on the numeric character codes that `micromark`'s tokenizer processes, not on strings. It is ESM-only, requires Node.js 16 or later, and ships complete TypeScript type definitions.
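The following is a minimal, self-contained sketch of how character groups can drive the open/close decision. It expresses the CommonMark left- and right-flanking rules in terms of the groups on either side of a marker run. It does not import the package. Instead, the groups are passed in directly, with `1` for whitespace, `2` for punctuation, and `undefined` for neither. These values are assumed to mirror the package's constants. The function `attentionFlanking` and the `Flanking` type are hypothetical names, and the sketch ignores extension hooks such as extra attention markers.

```typescript
// Hypothetical sketch: CommonMark flanking rules expressed with character groups.
// Assumption: 1 = whitespace, 2 = punctuation, undefined = neither,
// mirroring what classifyCharacter returns for the characters around a run.
type CharacterGroup = 1 | 2 | undefined;

const PUNCTUATION = 2;

interface Flanking {
  open: boolean;
  close: boolean;
}

function attentionFlanking(
  marker: '*' | '_',
  before: CharacterGroup, // group of the character before the run (start of input counts as whitespace)
  after: CharacterGroup, // group of the character after the run
): Flanking {
  // Left-flanking: followed by "neither", or followed by punctuation
  // while preceded by whitespace or punctuation.
  const open = after === undefined || (after === PUNCTUATION && before !== undefined);
  // Right-flanking: the mirror image of the rule above.
  const close = before === undefined || (before === PUNCTUATION && after !== undefined);

  if (marker === '*') {
    return { open, close };
  }

  // `_` is stricter, so intraword underscores (as in snake_case)
  // neither open nor close emphasis.
  return {
    open: open && (before !== undefined || !close),
    close: close && (after !== undefined || !open),
  };
}

// `snake_case`: letters on both sides of `_`, so it neither opens nor closes.
console.log(attentionFlanking('_', undefined, undefined)); // { open: false, close: false }
// `foo*bar*`: letters on both sides of `*`, so it may both open and close.
console.log(attentionFlanking('*', undefined, undefined)); // { open: true, close: true }
```

The design splits classification from the flanking decision. The rules only ever need to know which of the three groups a neighbouring character falls into, not the character itself.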

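Install

npm install micromark-util-classify-character

Imports

import { classifyCharacter } from 'micromark-util-classify-character';
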
Quickstart

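This quickstart imports the `classifyCharacter` function and uses it to report whether a character is whitespace, punctuation, or neither. `classifyCharacter` takes a character code (a number, or `null` for the end of the input). It returns the whitespace or punctuation character-group constant, or `undefined` when the character belongs to neither group. The example compares the result against those constants.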

import { classifyCharacter } from 'micromark-util-classify-character';
// The character-group constants live in micromark-util-symbol
// (there is no micromark-util-constants package).
import { constants } from 'micromark-util-symbol';

/**
 * Classify a single character and describe its group.
 * Pass `null` to represent the end of the input (EOF).
 */
function getCharacterCategory(char: string | null): string {
  // micromark works on character codes (UTF-16 code units), with `null` for EOF.
  const code = char === null ? null : char.charCodeAt(0);
  // JSON.stringify shows "\t" instead of a raw tab, and null as null.
  const label = JSON.stringify(char);
  const classification = classifyCharacter(code);

  switch (classification) {
    case constants.characterGroupWhitespace:
      return `${label} (code: ${code}) is classified as WHITESPACE.`;
    case constants.characterGroupPunctuation:
      return `${label} (code: ${code}) is classified as PUNCTUATION.`;
    default:
      // `undefined`: neither whitespace nor punctuation.
      return `${label} (code: ${code}) is classified as NEITHER.`;
  }
}

console.log(getCharacterCategory(' ')); // Space: WHITESPACE
console.log(getCharacterCategory('\t')); // Tab: WHITESPACE
console.log(getCharacterCategory('.')); // Period: PUNCTUATION
console.log(getCharacterCategory('!')); // Exclamation mark: PUNCTUATION
console.log(getCharacterCategory('a')); // Letter 'a': NEITHER
console.log(getCharacterCategory('7')); // Digit '7': NEITHER
console.log(getCharacterCategory(null)); // End of file (EOF) is treated as WHITESPACE
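To see roughly what the classifier checks without installing anything, the sketch below approximates the rules in plain TypeScript. It is a hypothetical re-implementation: `classifyCharacterSketch` is not part of the package. It assumes the CommonMark 0.31.2 definitions. Under those, Unicode whitespace is the `Zs` category plus tab, line feed, form feed, and carriage return, and Unicode punctuation is anything in the `P` or `S` categories. `micromark`'s own checks may differ in edge cases and between versions. As in the quickstart, codes are treated as UTF-16 code units and `null` marks EOF. Negative codes are also treated as whitespace, on the assumption that they are `micromark`'s virtual codes for line endings, tabs, and virtual spaces.

```typescript
// Hypothetical re-implementation for illustration; not the package's source.
// Assumed to mirror the package's constants: 1 = whitespace, 2 = punctuation.
const characterGroupWhitespace = 1;
const characterGroupPunctuation = 2;

type Code = number | null;
type CharacterGroup =
  | typeof characterGroupWhitespace
  | typeof characterGroupPunctuation
  | undefined;

// CommonMark 0.31.2 definitions (assumption: micromark's checks may differ slightly).
const unicodeWhitespace = new RegExp('\\p{Zs}|[\\t\\n\\f\\r]', 'u');
const unicodePunctuation = new RegExp('\\p{P}|\\p{S}', 'u');

function classifyCharacterSketch(code: Code): CharacterGroup {
  // EOF (null) and negative codes (assumed: micromark's virtual codes for
  // line endings, tabs, and virtual spaces) count as whitespace.
  if (code === null || code < 0) {
    return characterGroupWhitespace;
  }

  // Treat the code as a UTF-16 code unit, matching charCodeAt in the quickstart.
  const char = String.fromCharCode(code);

  if (unicodeWhitespace.test(char)) {
    return characterGroupWhitespace;
  }

  if (unicodePunctuation.test(char)) {
    return characterGroupPunctuation;
  }

  // Letters, digits, and everything else belong to neither group.
  return undefined;
}

for (const char of [' ', '\t', '.', '!', '$', 'a', '7']) {
  console.log(JSON.stringify(char), classifyCharacterSketch(char.charCodeAt(0)));
}
console.log('EOF', classifyCharacterSketch(null));
```

Running the same characters through the real `classifyCharacter` is a quick way to spot where the package's checks diverge from this approximation.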
