Unicode Confusables Utility
The `unicode-confusables` utility provides functions to detect and resolve visually confusable Unicode characters in strings, following the security guidelines in Unicode Technical Standard #39 (UTS39). It uses the `confusables.txt` data file to identify characters that are easily mistaken for others, including homoglyphs and zero-width characters. The library is currently at version 0.1.1, and its release cadence is tied to updates to the UTS39 standard and the `confusables.txt` data. Its main differentiators are direct adherence to the official Unicode standard, the ability to rectify confusable characters as well as detect them, and support for a wide range of scripts, including non-Latin ones. It also provides a mechanism to update its underlying data set, which makes it a fit for applications that need input validation and protection against homograph attacks or similar visual spoofing.
Common errors
- TypeError: (0, unicode_confusables_1.isConfusing) is not a function
  cause: This typically occurs when trying to use named exports with a default import syntax in an ESM environment after TypeScript transpilation.
  fix: Ensure you are using named imports: `import { isConfusing } from 'unicode-confusables';`
- TypeError: isConfusing is not a function
  cause: This happens when attempting to call `isConfusing` on the entire module object, rather than destructuring the named export, particularly in CommonJS.
  fix: Use object destructuring for CommonJS `require`: `const { isConfusing } = require('unicode-confusables');`
- Module not found: Error: Can't resolve 'unicode-confusables'
  cause: The package has not been installed, or there's a typo in the import path.
  fix: Run `npm install unicode-confusables` or `yarn add unicode-confusables` to install the package. Verify the import path is exactly `'unicode-confusables'`.
Warnings
- breaking As a 0.x.x version, the API is not yet stable, and breaking changes may be introduced in minor or patch releases without a major version increment. Pin an exact version (for example with `npm install --save-exact unicode-confusables@0.1.1`) or review the release notes before upgrading.
- gotcha The underlying `confusables.txt` data, sourced from unicode.org, can be updated. If your application relies on the latest data for security, you must periodically run `npm run update` to fetch and parse a fresh copy.
- gotcha This library's definition of 'confusing' is strictly based on Unicode UTS39. It does not cover all possible visual spoofing methods (e.g., domain squatting, visual similarities not listed in UTS39, or culturally specific visual attacks).
- gotcha Processing very long strings or making frequent calls to `confusables` or `isConfusing` in a performance-critical loop can be computationally intensive, as it involves character-by-character analysis and lookups.
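One way to soften the cost of repeated checks on the same input is to cache results. The following is a minimal sketch, not part of the library: `memoizeBounded` is a hypothetical helper, and `standInCheck` is a stand-in used only so the example runs on its own. It assumes the wrapped function is pure for a given string, which would stop being true if the underlying data were refreshed while the cache is alive.

```javascript
// Hypothetical helper (not part of unicode-confusables): wraps any
// string -> result function in a small bounded cache.
function memoizeBounded(fn, maxEntries = 1000) {
  // A Map iterates in insertion order, so its first key is the oldest entry.
  const cache = new Map();
  return function memoized(input) {
    if (cache.has(input)) {
      return cache.get(input);
    }
    const result = fn(input);
    if (cache.size >= maxEntries) {
      // Evict the oldest inserted entry (simple FIFO, not a true LRU).
      cache.delete(cache.keys().next().value);
    }
    cache.set(input, result);
    return result;
  };
}

// Stand-in check so the sketch is self-contained. It only flags non-ASCII
// characters and is NOT the library's UTS39 logic.
let calls = 0;
function standInCheck(text) {
  calls += 1;
  return /[^\x00-\x7F]/.test(text);
}

const cachedCheck = memoizeBounded(standInCheck, 2);
cachedCheck('foo');
cachedCheck('foo'); // second call is served from the cache
console.log(calls); // 1
```

In an application, the library's `isConfusing` would presumably be passed in place of `standInCheck`, for example `memoizeBounded(isConfusing, 5000)`. The cache size is a trade-off between memory and hit rate and should be tuned to the workload.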
Install
- npm install unicode-confusables
- yarn add unicode-confusables
- pnpm add unicode-confusables
Imports
- isConfusing
  CommonJS: const isConfusing = require('unicode-confusables').isConfusing;
  ESM: import { isConfusing } from 'unicode-confusables';
- confusables
  Incorrect (default import): import confusables from 'unicode-confusables';
  Correct (named import): import { confusables } from 'unicode-confusables';
- rectifyConfusion
  CommonJS: const { rectifyConfusion } = require('unicode-confusables');
  ESM: import { rectifyConfusion } from 'unicode-confusables';
Quickstart
import { isConfusing, confusables, rectifyConfusion } from 'unicode-confusables';

// No awaits are needed here, so a plain synchronous function is enough.
function demonstrateConfusables() {
  const confusingString = 'fоо'; // contains Cyrillic 'о' (U+043E) in place of Latin 'o'
  const regularString = 'foo';

  // Detect whether each string contains confusable characters.
  console.log(`Is '${confusingString}' confusing? ${isConfusing(confusingString)}`);
  console.log(`Is '${regularString}' confusing? ${isConfusing(regularString)}`);

  // List the confusable characters, then produce a rectified string.
  console.log(`Confusables for '${confusingString}':`, confusables(confusingString));
  console.log(`Rectified '${confusingString}': '${rectifyConfusion(confusingString)}'`);

  const zeroWidthString = 'vitalik\u200b'; // 'vitalik' followed by a zero-width space (U+200B)
  console.log(`Is '${zeroWidthString}' confusing (with zero-width char)? ${isConfusing(zeroWidthString)}`);
  console.log(`Confusables for '${zeroWidthString}':`, confusables(zeroWidthString));

  const mixedCaseHomoglyph = 'mI01'; // ASCII characters that are common homoglyph targets: m, I, 0, 1
  console.log(`Confusables for '${mixedCaseHomoglyph}':`, confusables(mixedCaseHomoglyph));
}

demonstrateConfusables();
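For input validation, the same calls can be wrapped in a small guard. The sketch below is illustrative only: `createNameValidator` and the `standIns` object are hypothetical names, and the stand-in functions exist solely so the example runs without the package installed. It assumes, as the quickstart suggests, that `isConfusing` returns a boolean and `rectifyConfusion` returns a string.

```javascript
// Hypothetical factory: `checks` is expected to supply isConfusing and
// rectifyConfusion with the call shapes used in the quickstart.
function createNameValidator(checks) {
  return function validateName(name) {
    if (!checks.isConfusing(name)) {
      return { ok: true, value: name };
    }
    // Reject the input but offer a rectified suggestion to the caller.
    return { ok: false, value: name, suggestion: checks.rectifyConfusion(name) };
  };
}

// Stand-ins that only handle a zero-width space (U+200B).
// They are NOT the library's logic.
const standIns = {
  isConfusing: (text) => text.includes('\u200b'),
  rectifyConfusion: (text) => text.split('\u200b').join(''),
};

const validateName = createNameValidator(standIns);
const cleanResult = validateName('vitalik');
const spoofedResult = validateName('vitalik\u200b');
console.log(cleanResult.ok);           // true
console.log(spoofedResult.ok);         // false
console.log(spoofedResult.suggestion); // vitalik
```

With the package installed, the factory would presumably be called as `createNameValidator({ isConfusing, rectifyConfusion })`. Returning a suggestion rather than silently rewriting the input leaves the accept-or-reject decision to the application.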