UTF-8 Encoder/Decoder for JavaScript
`utf8.js`, installed from npm as `utf8`, is a well-tested JavaScript implementation of a UTF-8 encoder and decoder. The current stable version is 3.0.0, last updated in late 2017. It follows the Encoding Standard and handles every Unicode scalar value. Error handling is strict by design: `utf8.encode()` throws an `Error` when the input contains a non-scalar value such as a lone surrogate, and `utf8.decode()` throws an `Error` on malformed UTF-8 data. This favors data integrity over silent error correction. To encode or decode non-scalar values, use the related `WTF-8` library instead. The project appears to be in maintenance mode: expect fixes for critical issues rather than new features.
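To make "silent error correction" concrete, the sketch below uses the platform's built-in `TextEncoder` and `TextDecoder` in their default, lenient mode. It does not call utf8.js at all; it only shows the behavior that a strict encoder or decoder refuses to perform. The variable names are made up for illustration.

```javascript
// Lenient built-ins shown for contrast; nothing here calls utf8.js.
const loneSurrogate = '\uD800';

// TextEncoder silently turns the lone surrogate into U+FFFD (bytes EF BF BD).
const lenientBytes = Array.from(new TextEncoder().encode(loneSurrogate));
console.log(lenientBytes.map((b) => b.toString(16))); // [ 'ef', 'bf', 'bd' ]

// A default TextDecoder silently replaces the invalid byte 0xFF with U+FFFD.
const lenientText = new TextDecoder().decode(Uint8Array.of(0xff));
console.log(lenientText === '\uFFFD'); // true
```

In both cases the original data is lost without any signal to the caller, which is the outcome the strict design described above is meant to prevent.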
Common errors
- Error: A lone surrogate code point was found.
  Cause: The string passed to `utf8.encode()` contains an unpaired (lone) surrogate, for example `\uD800` without a trailing code unit in the range `\uDC00` to `\uDFFF`.
  Fix: Validate or sanitize input so that it is well-formed Unicode before calling `utf8.encode()`. You can replace lone surrogates yourself, call `String.prototype.toWellFormed()` in environments that support it, or use a related library.
- Error: Malformed UTF-8 data.
  Cause: The byte string passed to `utf8.decode()` contains sequences that are not valid UTF-8.
  Fix: Check the origin and integrity of the byte string and confirm that it was encoded as UTF-8. Wrap `utf8.decode()` calls in a `try...catch` block so that invalid input can be logged and replaced with a fallback value.
- ReferenceError: require is not defined in ES module scope (in an ES module) or SyntaxError: Cannot use import statement outside a module (in a CommonJS module)
  Cause: This package is a CommonJS module. Calling `require()` in an ES module, or writing an `import` statement in a CommonJS module, fails before the package is loaded.
  Fix: In CommonJS modules (`.cjs` files, or `.js` files where `"type": "module"` is not set), use `const utf8 = require('utf8');`. In ES modules (`.mjs` files, or `"type": "module"` in `package.json`), Node.js can load CommonJS packages through a default import, `import utf8 from 'utf8';`. Dynamic `import('utf8')` or a bundler such as Webpack or Rollup also handles the CommonJS dependency.
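The fix for the lone-surrogate error can be sketched as a small sanitizing step. The helper name `replaceLoneSurrogates` is hypothetical and not part of utf8.js; it uses `toWellFormed()` where the runtime provides it and otherwise falls back to a regular expression that needs lookbehind support.

```javascript
// Hypothetical helper: replace unpaired surrogates with U+FFFD before encoding.
function replaceLoneSurrogates(input) {
  // Prefer the built-in where the runtime provides it.
  if (typeof input.toWellFormed === 'function') {
    return input.toWellFormed();
  }
  // Fallback: match a high surrogate not followed by a low surrogate,
  // or a low surrogate not preceded by a high surrogate.
  return input.replace(
    /[\uD800-\uDBFF](?![\uDC00-\uDFFF])|(?<![\uD800-\uDBFF])[\uDC00-\uDFFF]/g,
    '\uFFFD'
  );
}

console.log(replaceLoneSurrogates('a\uD800b') === 'a\uFFFDb'); // true
console.log(replaceLoneSurrogates('\uD83D\uDE0A') === '\uD83D\uDE0A'); // true, valid pair kept
```

The sanitized string can then be passed to `utf8.encode()`. Replacement changes the data, so rejecting the input outright may be the better choice when the string must survive a round trip unchanged.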
Warnings
- gotcha The `utf8.encode()` method will throw an `Error` if the input JavaScript string contains a non-scalar value, specifically a lone surrogate (a `U+D800` to `U+DFFF` code point not part of a valid surrogate pair). This strict behavior is by design, adhering to the Encoding Standard for proper UTF-8.
- gotcha The `utf8.decode()` method will throw an `Error` when it detects malformed UTF-8 byte sequences in the input `byteString`. This strictness prevents silent data corruption that can occur with lenient decoders.
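Because both methods throw, callers that handle untrusted input usually want one place to catch the error. The sketch below shows one possible shape for that. `safeDecode` is a hypothetical wrapper that accepts any strict decoder, and `strictDecodeStandIn` is an assumption built on the platform's `TextDecoder` so that the example runs without the package installed; in real use you would pass `utf8.decode` instead.

```javascript
// Hypothetical wrapper: `decode` is any strict decoder that throws on bad input,
// for example utf8.decode from this package.
function safeDecode(decode, byteString, fallback = null) {
  try {
    return { ok: true, value: decode(byteString) };
  } catch (error) {
    // Keep the message so the caller can log what went wrong.
    return { ok: false, value: fallback, reason: error.message };
  }
}

// Stand-in strict decoder (NOT utf8.js). Assumes every code unit is <= 0xFF.
function strictDecodeStandIn(byteString) {
  const bytes = Uint8Array.from(byteString, (ch) => ch.charCodeAt(0));
  return new TextDecoder('utf-8', { fatal: true }).decode(bytes);
}

const good = safeDecode(strictDecodeStandIn, '\xC2\xA9');
console.log(good.ok, good.value === '\u00A9'); // true true

const bad = safeDecode(strictDecodeStandIn, '\xFF', '?');
console.log(bad.ok, bad.value); // false ?
```

Returning a result object rather than rethrowing keeps the failure visible to the caller while still letting it choose a fallback, which preserves the intent of the strict behavior.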
Install
- `npm install utf8`
- `yarn add utf8`
- `pnpm add utf8`
Imports
- utf8
  `import utf8 from 'utf8';`
  `const utf8 = require('utf8');`
- global utf8 object
  In the browser, load the script with `<script src="utf8.js"></script>`, then call the global from a later `<script>` element: `const encoded = utf8.encode('Hello');`
- utf8.encode
  Avoid: `const { encode } = require('utf8'); // Not a named export`
  Prefer: `const utf8 = require('utf8');` followed by `const encodedString = utf8.encode('Hello, world! 😊');`
Quickstart
const utf8 = require('utf8');
// Example 1: Round-trip a string that mixes ASCII text with an emoji
const originalString1 = 'Hello, world! 👋';
const encodedString1 = utf8.encode(originalString1);
console.log(`Original: '${originalString1}'`);
console.log(`Encoded: '${encodedString1}'`);
const decodedString1 = utf8.decode(encodedString1);
console.log(`Decoded: '${decodedString1}'`);
// Example 2: Encoding a supplementary-plane character written as a surrogate-pair escape
// \uD83D\uDE0A is U+1F60A SMILING FACE WITH SMILING EYES (four bytes in UTF-8)
const originalString2 = 'Smiling face: \uD83D\uDE0A';
const encodedString2 = utf8.encode(originalString2);
console.log(`\nOriginal: '${originalString2}'`);
console.log(`Encoded: '${encodedString2}'`);
const decodedString2 = utf8.decode(encodedString2);
console.log(`Decoded: '${decodedString2}'`);
// Example 3: Demonstrate version access
console.log(`\nutf8.js version: ${utf8.version}`);
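
The encoded values printed above are byte strings: each character stands for one byte, with a code unit in the range 0x00 to 0xFF. To write them to a file or socket you usually need real bytes. A minimal sketch of the conversion in both directions follows; the helper names are hypothetical and the code makes no library calls.

```javascript
// Hypothetical helper: turn a byte string into a Uint8Array.
function byteStringToBytes(byteString) {
  const bytes = new Uint8Array(byteString.length);
  for (let i = 0; i < byteString.length; i += 1) {
    const code = byteString.charCodeAt(i);
    if (code > 0xff) {
      // Anything above 0xFF cannot be a single byte, so this is not a byte string.
      throw new RangeError(`Not a byte string: code unit ${code} at index ${i}`);
    }
    bytes[i] = code;
  }
  return bytes;
}

// Hypothetical helper: turn bytes back into a byte string suitable for decoding.
function bytesToByteString(bytes) {
  let out = '';
  for (const byte of bytes) {
    out += String.fromCharCode(byte);
  }
  return out;
}

// '\xC2\xA9' is the UTF-8 byte string for U+00A9 (copyright sign).
const bytes = byteStringToBytes('\xC2\xA9');
console.log(Array.from(bytes)); // [ 194, 169 ]
console.log(bytesToByteString(bytes) === '\xC2\xA9'); // true
```

The range check is a deliberate choice: passing an ordinary (unencoded) string by mistake fails loudly instead of being truncated to its low eight bits.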