Unicode Confusables Utility

0.1.1 · active · verified Sun Apr 19

The `unicode-confusables` utility detects and resolves visually confusable Unicode characters in strings, following the security guidelines in Unicode Technical Standard #39 (UTS39). It uses the `confusables.txt` data file to identify characters that are easily mistaken for others, including homoglyphs and zero-width characters. The library is currently at version 0.1.1, and its release cadence is tied to updates to UTS39 and the `confusables.txt` data. Its main differentiators are direct adherence to the official Unicode standard, the ability to rectify confusable characters as well as detect them, and support for a wide range of scripts, including non-Latin ones. It also provides a mechanism for updating its underlying data set. Together these make it well suited to applications that need robust input validation and protection against homograph attacks and similar visual spoofing.
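To make the role of the data file concrete, here is a minimal sketch of how lines in the `confusables.txt` layout (`source ; target ; type # comment`, with code points in hexadecimal) could be turned into a lookup table. The helper names `parseConfusables` and `hexToString` are hypothetical and the two sample lines are only illustrative; this is not the library's own loader.

```javascript
// Hypothetical helper: convert a field such as "0072 006E" into the string "rn".
function hexToString(field) {
  return field
    .split(/\s+/)
    .map((hex) => String.fromCodePoint(parseInt(hex, 16)))
    .join('');
}

// Hypothetical helper: build a Map from each source sequence to its prototype.
function parseConfusables(text) {
  const table = new Map();
  for (const rawLine of text.split('\n')) {
    // Drop the trailing comment, then skip blank and comment-only lines.
    const line = rawLine.split('#')[0].trim();
    if (line === '') continue;
    const fields = line.split(';').map((f) => f.trim());
    if (fields.length < 2) continue;
    const [source, target] = fields;
    table.set(hexToString(source), hexToString(target));
  }
  return table;
}

// Sample input written in the confusables.txt layout (illustrative only).
const sample = [
  '# source ; target ; type # comment',
  '043E ;\t006F ;\tMA\t# CYRILLIC SMALL LETTER O -> LATIN SMALL LETTER O',
  '006D ;\t0072 006E ;\tMA\t# LATIN SMALL LETTER M -> LATIN SMALL LETTER R, N',
].join('\n');

const table = parseConfusables(sample);
console.log(table.get('\u043E')); // o
console.log(table.get('m')); // rn
```

Keying the map by the decoded source string keeps lookups simple, and a target may expand to more than one character, as the `m` entry shows.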

Common errors

Warnings

Install

Imports

Quickstart

This example checks whether a string contains confusable Unicode characters, lists the specific confusables, and rectifies them. It also covers zero-width characters and ASCII homoglyphs.

import { isConfusing, confusables, rectifyConfusion } from 'unicode-confusables';

function demonstrateConfusables() {
  const confusingString = 'fоо'; // contains Cyrillic 'о' (U+043E) in place of Latin 'o'
  const regularString = 'foo';

  console.log(`Is '${confusingString}' confusing? ${isConfusing(confusingString)}`);
  console.log(`Is '${regularString}' confusing? ${isConfusing(regularString)}`);

  console.log(`Confusables for '${confusingString}':`, confusables(confusingString));
  console.log(`Rectified '${confusingString}': '${rectifyConfusion(confusingString)}'`);

  const zeroWidthString = 'vitalik\u200b'; // vitalik with a zero-width space (U+200B)
  console.log(`Is '${zeroWidthString}' confusing (with zero-width char)? ${isConfusing(zeroWidthString)}`);
  console.log(`Confusables for '${zeroWidthString}':`, confusables(zeroWidthString));

  const mixedCaseHomoglyph = 'mI01'; // plain ASCII characters that confusables.txt lists as confusable (for example 'I' and '1' with 'l', '0' with 'O')
  console.log(`Confusables for '${mixedCaseHomoglyph}':`, confusables(mixedCaseHomoglyph));
}

demonstrateConfusables();
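For readers who want to see the idea without installing anything, the following is a library-free sketch of what detection and rectification amount to conceptually. The names `PROTOTYPES`, `ZERO_WIDTH`, `findSuspicious`, and `normalize` are hypothetical, and the table is a three-entry illustrative subset rather than the full `confusables.txt` data. It is not the package's implementation and omits steps such as the NFD normalization that the UTS39 skeleton procedure also involves.

```javascript
// Illustrative subset only: three Cyrillic letters and their Latin prototypes.
const PROTOTYPES = new Map([
  ['\u0430', 'a'], // CYRILLIC SMALL LETTER A
  ['\u0435', 'e'], // CYRILLIC SMALL LETTER IE
  ['\u043E', 'o'], // CYRILLIC SMALL LETTER O
]);

// Zero-width characters that render invisibly (assumed set for this sketch).
const ZERO_WIDTH = new Set(['\u200B', '\u200C', '\u200D']);

// Hypothetical helper: report each suspicious character with its UTF-16 offset.
function findSuspicious(input) {
  const hits = [];
  let index = 0;
  for (const ch of input) { // for...of iterates by code point
    if (ZERO_WIDTH.has(ch)) {
      hits.push({ index, char: ch, replacement: '' });
    } else if (PROTOTYPES.has(ch)) {
      hits.push({ index, char: ch, replacement: PROTOTYPES.get(ch) });
    }
    index += ch.length;
  }
  return hits;
}

// Hypothetical helper: drop zero-width characters and map the rest to prototypes.
function normalize(input) {
  let out = '';
  for (const ch of input) {
    if (ZERO_WIDTH.has(ch)) continue;
    out += PROTOTYPES.get(ch) ?? ch;
  }
  return out;
}

const spoofed = 'f\u043E\u043E'; // 'f' followed by two Cyrillic 'о'
console.log(findSuspicious(spoofed).length); // 2
console.log(normalize(spoofed) === 'foo'); // true
console.log(normalize('vitalik\u200B') === 'vitalik'); // true
```

Writing the suspicious characters as `\u` escapes keeps the source unambiguous, since the literal characters are indistinguishable from their Latin lookalikes in most editors.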
