Grapheme Splitter

1.0.4 · active · verified Sun Apr 19

grapheme-splitter is a JavaScript library that splits strings into user-perceived characters, called extended grapheme clusters, following the default grapheme cluster boundary rules of Unicode Standard Annex #29 (UAX #29). It addresses a basic limitation of JavaScript's native string handling: the `length` property counts UTF-16 code units, and index-based iteration walks those same units, so both misreport the visible character count for multi-code-point emoji (e.g., `🏳️‍🌈`), for letters built from combining marks (such as decomposed forms of German 'ü' or Spanish 'ñ', and Hindi text), and for 'Zalgo' text. `String.prototype.normalize()` and libraries like `punycode.js` each cover only part of this problem, whereas `grapheme-splitter` handles all of these cases. The current stable version is 1.0.4; releases are infrequent and focused on maintenance rather than new features. Its key differentiator is close adherence to the Unicode grapheme cluster rules, which makes it useful for text processing, input field validation, and display logic in internationalized applications.
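To make the mismatch concrete, here is a minimal sketch that uses only built-in string methods and does not call grapheme-splitter at all. The variable names are illustrative. It shows that neither code-unit counting, code-point counting, nor normalization reliably yields the number of characters a reader sees.

```javascript
// Built-ins only; nothing here uses grapheme-splitter.

// Rainbow flag: white flag + variation selector + zero-width joiner + rainbow.
// A reader sees one character.
const flag = "\u{1F3F3}\uFE0F\u200D\u{1F308}";
const codeUnits = flag.length;              // counts UTF-16 code units
const codePoints = Array.from(flag).length; // counts Unicode code points

// Decomposed "ñ": NFC folds it into the single precomposed code point U+00F1.
const composedLength = "n\u0303".normalize("NFC").length;

// "m" + combining overline has no precomposed form, so NFC cannot shorten it.
const uncomposedLength = "m\u0305".normalize("NFC").length;

console.log({ codeUnits, codePoints, composedLength, uncomposedLength });
// codeUnits is 6, codePoints is 4, composedLength is 1, uncomposedLength is 2
```

Normalization helps only when a precomposed code point exists, and code-point iteration still splits joined emoji sequences, which is the gap a grapheme cluster segmenter fills.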

Quickstart

This example creates a `GraphemeSplitter` instance and calls its `splitGraphemes` and `countGraphemes` methods on several complex Unicode strings: multi-code-point emoji, Latin letters with diacritics, Hindi text, and Zalgo text. For each string it prints the native `length` next to the resulting segments and the user-perceived character count.

```javascript
const GraphemeSplitter = require('grapheme-splitter');

const splitter = new GraphemeSplitter();

const textWithEmojis = "🌷🎁💩😜👍🏳️‍🌈";
const textWithDiacritics = "Ĺo͂řȩm̅"; // 10 JavaScript chars
const hindiText = "अनुच्छेद"; // Hindi word, 8 JavaScript chars
const zalgoText = "Z͑ͫ̓ͪ̂ͫ̽͏̴̙̤̞͉͚̯̞̠͍A̴̵̜̰͔ͫ͗͢L̠ͨͧͩ͘G̴̻͈͍͔̹̑͗̎̅͛́Ǫ̵̹̻̝̳͂̌̌͘"; // 58 JavaScript chars

console.log('--- Splitting Emojis ---');
console.log(`Original: "${textWithEmojis}" (length: ${textWithEmojis.length})`);
console.log(`Split: ${JSON.stringify(splitter.splitGraphemes(textWithEmojis))} (count: ${splitter.countGraphemes(textWithEmojis)})`);
// Expected: ["🌷","🎁","💩","😜","👍","🏳️‍🌈"] (count: 6)

console.log('\n--- Splitting Diacritics ---');
console.log(`Original: "${textWithDiacritics}" (length: ${textWithDiacritics.length})`);
console.log(`Split: ${JSON.stringify(splitter.splitGraphemes(textWithDiacritics))} (count: ${splitter.countGraphemes(textWithDiacritics)})`);
// Expected: ["Ĺ","o͂","ř","ȩ","m̅"] (count: 5)

console.log('\n--- Splitting Hindi ---');
console.log(`Original: "${hindiText}" (length: ${hindiText.length})`);
console.log(`Split: ${JSON.stringify(splitter.splitGraphemes(hindiText))} (count: ${splitter.countGraphemes(hindiText)})`);
// Expected: ["अ","नु","च्","छे","द"] (count: 5)

console.log('\n--- Splitting Zalgo ---');
console.log(`Original: "${zalgoText}" (length: ${zalgoText.length})`);
console.log(`Split: ${JSON.stringify(splitter.splitGraphemes(zalgoText))} (count: ${splitter.countGraphemes(zalgoText)})`);
// Expected: ["Z͑ͫ̓ͪ̂ͫ̽͏̴̙̤̞͉͚̯̞̠͍","A̴̵̜̰͔ͫ͗͢","L̠ͨͧͩ͘","G̴̻͈͍͔̹̑͗̎̅͛́","Ǫ̵̹̻̝̳͂̌̌͘"] (count: 5)
```
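For the input-validation use case mentioned in the summary, a length limit can be enforced on grapheme clusters instead of code units. The sketch below is a hypothetical helper, not part of grapheme-splitter: `truncateGraphemes` and its `split` parameter are names invented for illustration, and `split` is assumed to be any function that returns an array of grapheme clusters.

```javascript
// Hypothetical helper, not part of the grapheme-splitter API.
// `split` is assumed to return an array of grapheme clusters for a string,
// for example (s) => splitter.splitGraphemes(s) with the Quickstart instance.
function truncateGraphemes(text, maxGraphemes, split) {
  const clusters = split(text);
  if (clusters.length <= maxGraphemes) {
    return text; // already within the limit, return unchanged
  }
  // Joining whole clusters means no cluster is ever cut in half,
  // unlike text.slice(0, n), which cuts at UTF-16 code unit offsets.
  return clusters.slice(0, maxGraphemes).join("");
}
```

With the `splitter` from the Quickstart, a call would look like `truncateGraphemes(input, 20, (s) => splitter.splitGraphemes(s))`. Passing the split function as an argument keeps the helper independent of any particular segmenter.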
