# tiktoken-rs for Node.js
`tiktoken-rs-node` provides performant N-API Rust bindings for the `tiktoken-rs` library, which implements OpenAI's `tiktoken` BPE tokenizer for encoding text to, and decoding text from, token integers. As of version 1.0.6 it offers a zero-copy encode path and avoids WebAssembly (WASM), which sets it apart from other Node.js `tiktoken` implementations that often rely on WASM. The package is actively maintained with frequent patch releases, indicating ongoing support for its current stable API. Its primary use case is efficient tokenization of text for large language models in Node.js environments where high performance and minimal overhead are critical. It supports a range of encodings including `cl100k_base`, `gpt2`, and `o200k_base`.
## Common errors
- **Error: Cannot find module 'tiktoken-rs-node' or its corresponding type declarations.**
  - Cause: the package failed to install correctly (e.g., prebuilt binaries missing, build from source failed) or the import path is incorrect.
  - Fix: run `npm install tiktoken-rs-node` again and check the installation logs for errors related to N-API or Rust compilation. If running in a specific environment, verify that a Rust toolchain is present or that prebuilt binaries exist for your platform. Verify the import statement: `import { getEncoding } from 'tiktoken-rs-node';`.
- **TypeError: encoding.encode is not a function**
  - Cause: the `encoding` object returned by `getEncoding` is not the expected `Tiktoken` instance, or `getEncoding` itself failed to return a valid object. This can happen if `getEncoding` was called with an invalid name.
  - Fix: ensure `getEncoding` was called with a valid string encoding name (e.g., `'cl100k_base'`). Add error handling around `getEncoding` calls, and verify the result is not `undefined` or `null` before calling methods on it.
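The second error above can be prevented by validating the encoding name and the returned object up front. A minimal sketch: `loadEncoding` and its injected `loader` parameter are illustrative, not part of the package's API; passing the loader in (e.g., `getEncoding` from `tiktoken-rs-node`) lets the sketch run even without the native binding installed.

```javascript
// Encoding names listed in this package's documentation; treat this set as
// an assumption that must be kept in sync with the library.
const KNOWN_ENCODINGS = new Set([
  'o200k_base', 'cl100k_base', 'p50k_edit', 'p50k_base', 'r50k_base', 'gpt2',
]);

// Hypothetical helper: validate the name, call the loader, and fail loudly
// if the result does not look like a usable Tiktoken instance.
function loadEncoding(name, loader) {
  if (!KNOWN_ENCODINGS.has(name)) {
    throw new Error(`Unsupported encoding name: ${name}`);
  }
  const encoding = loader(name);
  if (!encoding || typeof encoding.encode !== 'function') {
    throw new TypeError(`Loader did not return a Tiktoken instance for ${name}`);
  }
  return encoding;
}
```

In real code this would be called as `loadEncoding('cl100k_base', getEncoding)`, turning a confusing `TypeError` deep in your code into an immediate, descriptive failure at load time.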
## Warnings
- **Gotcha:** Calling `getEncoding` multiple times with the same encoding name incurs unnecessary overhead. For long-running processes, call `getEncoding` once per startup and reuse the returned `Tiktoken` instance.
- **Breaking:** As a Node.js N-API binding, `tiktoken-rs-node` ships prebuilt binaries for common platforms. If your deployment environment (e.g., a specific Docker image, serverless function, or CPU architecture) is not covered by the prebuilt binaries, installation may fall back to local compilation, which requires a Rust toolchain.
- **Gotcha:** Passing an unsupported encoding name to `getEncoding` results in a runtime error. Use one of the officially supported encoding names (e.g., `'cl100k_base'`, `'gpt2'`, `'o200k_base'`).
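The reuse advice above can be implemented as a small per-process cache. A sketch under the same assumption of an injected loader (so it runs without the native binding); `getCachedEncoding` is an illustrative name, not a package export:

```javascript
// Cache Tiktoken instances by encoding name so the expensive setup in the
// loader (e.g., getEncoding from tiktoken-rs-node) runs once per process.
const encodingCache = new Map();

function getCachedEncoding(name, loader) {
  let encoding = encodingCache.get(name);
  if (encoding === undefined) {
    encoding = loader(name); // expensive: only on first use of this name
    encodingCache.set(name, encoding);
  }
  return encoding;
}
```

Repeated calls with the same name return the same instance, so hot paths pay only a `Map` lookup.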
## Install
- `npm install tiktoken-rs-node`
- `yarn add tiktoken-rs-node`
- `pnpm add tiktoken-rs-node`
## Imports
- `getEncoding` (ESM): `import { getEncoding } from 'tiktoken-rs-node';`
- `Tiktoken`: `import { Tiktoken } from 'tiktoken-rs-node';` or, for type-only use, `import type { Tiktoken } from 'tiktoken-rs-node';`
- CommonJS: `const { getEncoding } = require('tiktoken-rs-node');`
## Quickstart
```ts
import { getEncoding } from 'tiktoken-rs-node';

// Load the encoding once per application startup for optimal performance.
// Supported encodings: 'o200k_base', 'cl100k_base', 'p50k_edit', 'p50k_base', 'r50k_base', 'gpt2'
const encodingName = 'cl100k_base';
const encoding = getEncoding(encodingName);

const textToEncode = "This is a sample sentence for tokenization using tiktoken-rs-node.";

// Encode the text into an array of token integers
const tokens = encoding.encode(textToEncode);
console.log(`Original text: "${textToEncode}"`);
console.log(`Encoded tokens (${encodingName}): [${tokens.join(', ')}]`);
console.log(`Number of tokens: ${tokens.length}`);

// Decode the tokens back into a string
const decodedString = encoding.decode(tokens);
console.log(`Decoded string: "${decodedString}"`);

// Example with a different encoding
const gpt2Encoding = getEncoding('gpt2');
const gpt2Tokens = gpt2Encoding.encode("Hello, world!");
console.log(`GPT-2 Encoded tokens: [${gpt2Tokens.join(', ')}]`);
```
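A common use of the encode/decode pair shown above is enforcing a token budget. A hedged sketch: `truncateToTokens` is an illustrative helper, not a package export, and it works with any object exposing `encode`/`decode` (such as the instance returned by `getEncoding`). Note that decoding a truncated token slice from a real BPE encoding can split a multi-byte character at the cut point.

```javascript
// Hypothetical helper: keep at most maxTokens tokens of the input text.
// `encoding` is anything with encode(text) -> number[] and
// decode(tokens) -> string, e.g. a tiktoken-rs-node Tiktoken instance.
function truncateToTokens(encoding, text, maxTokens) {
  const tokens = encoding.encode(text);
  if (tokens.length <= maxTokens) {
    return text; // already within budget; skip the decode round-trip
  }
  return encoding.decode(tokens.slice(0, maxTokens));
}
```

With a real encoding this would be called as `truncateToTokens(getEncoding('cl100k_base'), longPrompt, 4096)`.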