Unicode Codepoint Database Parser

1.3.0 · maintenance · verified Wed Apr 22

The `codepoints` package parses the Unicode Character Database (UCD) files into a large array of JavaScript objects, one per Unicode codepoint, each carrying properties such as name, category, block, script, bidi class, and casing and decomposition mappings. The current stable version is 1.3.0. The package is intended primarily for build scripts rather than production applications, because the parsed data has a significant memory footprint and parsing speed is not optimized. It bundles a default UCD and also accepts a path to a custom UCD. For applications that need Unicode data at runtime, the project maintainers recommend modules that ship precompiled, compressed data, such as `unicode-properties`. There is no fixed release cadence, and updates are infrequent, reflecting the package's stable but specialized role.
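
To make the build-script use case concrete, here is a minimal sketch of one way a build step might shrink per-codepoint records into a compact table before shipping it. It does not call the package: the `sample` array is a stand-in shaped like the parser's output, using only the `code` and `category` properties, and `toCategoryRanges` is a hypothetical helper name, not part of `codepoints`.

```javascript
// Hypothetical build-step helper (not part of the codepoints package):
// collapse per-codepoint records into [start, end, category] runs.
function toCategoryRanges(codepoints) {
  const ranges = [];
  let current = null;
  for (const cp of codepoints) {
    if (!cp) continue; // skip holes in the sparse array
    const extendsRun =
      current !== null &&
      current[2] === cp.category &&
      cp.code === current[1] + 1;
    if (extendsRun) {
      current[1] = cp.code; // grow the current run by one codepoint
    } else {
      current = [cp.code, cp.code, cp.category];
      ranges.push(current);
    }
  }
  return ranges;
}

// Stand-in records; a real build script would use the parsed array instead.
const sample = [];
sample[0x41] = { code: 0x41, category: 'Lu' };
sample[0x42] = { code: 0x42, category: 'Lu' };
sample[0x61] = { code: 0x61, category: 'Ll' };

const ranges = toCategoryRanges(sample);
console.log(JSON.stringify(ranges));
```

Adjacent codepoints that share a category merge into a single run, which is the kind of reduction that makes precompiled data much smaller than the full object array.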

Common errors

Warnings

Install

Imports

Quickstart

This quickstart uses the `codepoints/parser` submodule to process a Unicode Character Database (UCD) from a custom directory, which is the package's primary use case: generating structured Unicode data in build scripts. For demonstration purposes it writes a minimal mock UCD to disk instead of downloading a real one.

const parser = require('codepoints/parser');
const path = require('path');
const fs = require('fs');

// In a real build script, you would download and extract the UCD yourself.
// For this example, we'll simulate a UCD directory.
const mockUCDPath = path.join(__dirname, 'mock-ucd');
if (!fs.existsSync(mockUCDPath)) {
  fs.mkdirSync(mockUCDPath);
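  // Simulate a minimal UnicodeData.txt (15 semicolon-separated fields per line).
  // Assumption: the parser may expect further UCD files (for example Blocks.txt
  // or Scripts.txt); if parsing fails, add them to this directory.
  fs.writeFileSync(
    path.join(mockUCDPath, 'UnicodeData.txt'),
    '0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;\n' +
    '0062;LATIN SMALL LETTER B;Ll;0;L;;;;;N;;;0042;;0042\n'
  );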
}

// Parse a custom version of the UCD from the specified directory
try {
  const codepointData = parser(mockUCDPath);

  console.log(`Total codepoints parsed: ${codepointData.length}`);
  console.log('Data for LATIN CAPITAL LETTER A (U+0041):', {
    code: codepointData[0x41].code,
    name: codepointData[0x41].name,
    category: codepointData[0x41].category,
    lowercase: codepointData[0x41].lowercase
  });

  console.log('Data for LATIN SMALL LETTER B (U+0062):', {
    code: codepointData[0x62].code,
    name: codepointData[0x62].name,
    category: codepointData[0x62].category,
    uppercase: codepointData[0x62].uppercase
  });
} catch (error) {
  console.error('Error parsing UCD:', error.message);
  console.log('Ensure that the mock-ucd directory contains necessary UCD files, e.g., UnicodeData.txt');
}

// Clean up mock UCD directory
fs.rmSync(mockUCDPath, { recursive: true, force: true });
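
The mock file in the quickstart relies on the semicolon-separated layout of `UnicodeData.txt`, where each record has 15 fields. The sketch below splits one record by hand to show which positions hold the fields the quickstart reads. It is not the package's parser: `parseUnicodeDataLine` is a hypothetical helper, and the value types it returns (numbers or `null` for the case mappings) are an assumption that may differ from what `codepoints` produces.

```javascript
// Illustrative only: a hand-rolled split of one UnicodeData.txt record.
// Field positions follow the file's 15-field layout:
//   0 code, 1 name, 2 general category, 12 uppercase, 13 lowercase, 14 titlecase.
function parseUnicodeDataLine(line) {
  const fields = line.split(';');
  // Empty mapping fields mean "no mapping"; represent them as null here.
  const hexOrNull = (s) => (s ? parseInt(s, 16) : null);
  return {
    code: parseInt(fields[0], 16),
    name: fields[1],
    category: fields[2],
    uppercase: hexOrNull(fields[12]),
    lowercase: hexOrNull(fields[13]),
    titlecase: hexOrNull(fields[14]),
  };
}

const record = parseUnicodeDataLine(
  '0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;'
);
console.log(record);
```

Counting fields this way is a quick check that a hand-written mock line is well formed before handing the directory to the parser.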
