StringDecoder for Userland
The `string_decoder` package provides a userland implementation of the Node.js core `string_decoder` module. It decodes `Buffer` objects into strings and correctly handles multi-byte UTF-8 and UTF-16 characters that are split across buffer chunks, which prevents malformed characters when processing streamed or chunked data. Maintained by the Node.js Streams Working Group, it offers a stable solution for character decoding outside of the core Node.js environment. The current stable version is 1.3.0, released 7 years ago, indicating its maturity. Prior to version 1.0.0, its versions mirrored those of Node.js core; since 1.0.0, it adheres to Semantic Versioning. Its key differentiator is being a direct, semantically versioned mirror of the Node.js core implementation, making it suitable for environments where Node's built-in module is not directly available or where parity with a specific version is required, such as Browserify bundles.
Common errors
- TypeError: Cannot read property 'length' of undefined
  cause Attempting to decode an excessively large buffer whose decoded output exceeds V8's maximum string length. In older Node.js versions, `stringDecoder.write()` might return `undefined` instead of throwing an explicit error.
  fix Break very large input buffers into smaller chunks before passing them to `stringDecoder.write()`, and process each decoded piece as it arrives. V8 enforces a maximum string length (roughly 536 million characters in some versions; the exact limit varies by version and platform), and it applies to the output of `string_decoder`. Later Node.js versions throw `ERR_STRING_TOO_LONG` in this situation.
- Error: 'StringDecoder' is not exported by ../../../../../../../../node_modules/string_decoder/lib/string_decoder.js, imported by node_modules/@frida/readable-stream/lib/readable.js
  cause This error typically occurs in bundling environments (like Rollup or esbuild) where the bundler resolves `string_decoder` as an ES module and fails to find a named export `StringDecoder`, because the package is published as a CommonJS module.
  fix Ensure your bundler correctly handles CommonJS modules and Node.js built-ins. This might involve adding a CommonJS plugin (e.g., `@rollup/plugin-commonjs`), configuring aliases, or explicitly telling the bundler to treat `string_decoder` as an external dependency.
- character '�' (U+FFFD) appears in output unexpectedly
  cause You are likely calling `Buffer.prototype.toString()` directly on partial buffers that end or begin in the middle of a multi-byte character. `StringDecoder` prevents this by buffering incomplete sequences until the remaining bytes arrive.
  fix Replace direct `Buffer.prototype.toString()` calls on streamed or chunked data with `stringDecoder.write()` for each chunk and `stringDecoder.end()` at the end of input. This ensures that multi-byte characters are assembled before decoding.
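The chunking advice in the fixes above can be sketched as a small generator. This is only an illustration, not part of the package: `decodeInChunks` and its default `chunkSize` are hypothetical names and values chosen for the example. Note that inside Node.js the bare specifier `string_decoder` resolves to the built-in module, which exposes the same API.

```javascript
const { StringDecoder } = require('string_decoder');

// Hypothetical helper (not provided by string_decoder): decode `buffer` in
// slices of `chunkSize` bytes and yield each decoded piece, so the caller
// never has to build one enormous string.
function* decodeInChunks(buffer, chunkSize = 64 * 1024, encoding = 'utf8') {
  const decoder = new StringDecoder(encoding);
  for (let offset = 0; offset < buffer.length; offset += chunkSize) {
    // write() holds back any trailing incomplete multi-byte sequence.
    const piece = decoder.write(buffer.subarray(offset, offset + chunkSize));
    if (piece) yield piece;
  }
  // end() flushes whatever is still buffered.
  const tail = decoder.end();
  if (tail) yield tail;
}

// A 2-byte chunk size deliberately splits the 3-byte '€' across slices.
const input = Buffer.from('a€b€c', 'utf8');
const pieces = [...decodeInChunks(input, 2)];
console.log(pieces.join('')); // 'a€b€c'
```

For genuinely large inputs, each yielded piece would be consumed (written to a file, parsed, and so on) rather than joined back into a single string as the demo does.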
Warnings
- breaking Prior to version 1.0.0, `string_decoder` versions mirrored Node.js core versions, which did not follow semantic versioning. Starting with 1.0.0, the package adopted standard semantic versioning, meaning major version bumps now indicate breaking changes in this userland package, independent of Node.js core.
- gotcha The `string_decoder` module is specifically designed to handle multi-byte characters that are split across `Buffer` instances when streamed. Calling `Buffer.prototype.toString()` on each chunk separately can produce replacement characters (�) wherever a multi-byte sequence is split between chunks. Concatenating all chunks before decoding avoids this, but it requires holding the entire input in memory.
- deprecated In modern JavaScript environments, the WHATWG `TextDecoder` API (`new TextDecoder('utf-8')`) is the generally recommended and more broadly compatible alternative for decoding text from binary data, especially in browser and Web Worker contexts. `string_decoder` is considered a legacy utility module for Node.js compatibility.
- gotcha When bundling for the browser using tools like Webpack or Rollup, ensure that `string_decoder` is correctly aliased or handled. Because it mirrors a Node.js core module, bundlers might treat it as a Node.js built-in or misinterpret its import path, leading to errors like 'StringDecoder' is not exported by '.../lib/string_decoder.js'.
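For comparison, the `TextDecoder` alternative mentioned in the warnings can also decode chunked input when called in streaming mode. A minimal sketch, assuming an environment where `TextDecoder` is available as a global (browsers and recent Node.js versions); the byte values are the UTF-8 encoding of '€' followed by 'ab':

```javascript
// TextDecoder is assumed to be a global here.
const decoder = new TextDecoder('utf-8');

// '€' is E2 82 AC in UTF-8; it is split across the first three chunks.
const chunks = [
  new Uint8Array([0xe2]),
  new Uint8Array([0x82]),
  new Uint8Array([0xac, 0x61, 0x62]),
];

let text = '';
for (const chunk of chunks) {
  // { stream: true } keeps incomplete byte sequences buffered between calls.
  text += decoder.decode(chunk, { stream: true });
}
// A final call without `stream` flushes the decoder.
text += decoder.decode();

console.log(text); // '€ab'
```

Here `decode(chunk, { stream: true })` plays the role of `StringDecoder`'s `write()`, and the final argument-less `decode()` plays the role of `end()`.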
Install
- npm install string_decoder
- yarn add string_decoder
- pnpm add string_decoder
Imports
- StringDecoder
  wrong `const StringDecoder = require('string_decoder');` (this binds the module object, not the constructor)
  correct `import { StringDecoder } from 'string_decoder';`
- StringDecoder (CommonJS)
  `const { StringDecoder } = require('string_decoder');`
  equivalent without destructuring: `const StringDecoder = require('string_decoder').StringDecoder;`
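A quick way to see why the bare `require` form does not work is to inspect what the module exports. The sketch below uses only `require` and `typeof`; the variable name `mod` is illustrative. One assumption worth stating: when run inside Node.js, the bare specifier `string_decoder` resolves to the built-in module, and to my understanding a trailing slash (`require('string_decoder/')`) is the usual way to force resolution to the installed package. Both expose `StringDecoder` as a named property.

```javascript
const mod = require('string_decoder');

// The module exports an object with a StringDecoder property,
// so the constructor must be read off that object.
console.log(typeof mod);               // 'object'
console.log(typeof mod.StringDecoder); // 'function'

const { StringDecoder } = mod;
const decoder = new StringDecoder('utf8');
console.log(decoder.write(Buffer.from('ok'))); // 'ok'
```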
Quickstart
import { StringDecoder } from 'string_decoder';
import { Buffer } from 'buffer';
const decoder = new StringDecoder('utf8');
// Imagine receiving a multi-byte character (like '€') split across network packets.
// The Euro symbol (€) is U+20AC, which is E2 82 AC in UTF-8.
const chunk1 = Buffer.from([0xE2]); // First byte of '€'
const chunk2 = Buffer.from([0x82]); // Second byte of '€'
const chunk3 = Buffer.from([0xAC, 0x61, 0x62]); // Third byte of '€' plus 'ab'
let decodedString = '';
decodedString += decoder.write(chunk1); // Returns '' (incomplete character is buffered)
decodedString += decoder.write(chunk2); // Returns '' (still incomplete)
decodedString += decoder.write(chunk3); // Returns '€ab' (character completed, plus the following bytes)
decodedString += decoder.end(); // Flushes anything still buffered ('' here)
console.log(decodedString);
// Expected output: '€ab'
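// Without StringDecoder, decoding each chunk on its own breaks the split character.
const naive = [chunk1, chunk2, chunk3].map((chunk) => chunk.toString('utf8')).join('');
console.log(naive);
// Output: '���ab' (each stray byte of '€' becomes U+FFFD)
// Concatenating every chunk first decodes correctly, but requires buffering the whole input.
const simpleConcat = Buffer.concat([chunk1, chunk2, chunk3]).toString('utf8');
console.log(simpleConcat);
// Expected output: '€ab'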
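The quickstart ends with a complete character, so `end()` has nothing to flush. The following sketch shows the other case, where the input stops in the middle of a multi-byte sequence. It assumes a reasonably recent Node.js or `string_decoder` version, in which leftover bytes are flushed as the replacement character; the variable names are illustrative.

```javascript
const { StringDecoder } = require('string_decoder');

const decoder = new StringDecoder('utf8');

// Only the first byte of '€' (E2 82 AC) ever arrives.
const partial = decoder.write(Buffer.from([0xe2]));
console.log(JSON.stringify(partial)); // '""' (the byte is held back)

// end() flushes the dangling byte; since it cannot form a valid
// character, it comes out as the replacement character U+FFFD.
const flushed = decoder.end();
console.log(flushed.includes('\uFFFD')); // true
```

Calling `end()` once the input is exhausted is therefore what makes truncated data visible instead of silently dropping it.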