JS Tiktoken
js-tiktoken is a pure JavaScript port of OpenAI's tiktoken library, providing a BPE (Byte Pair Encoding) tokenizer primarily for use with OpenAI's models. Currently at version 1.0.21, the library sees frequent patch releases, mainly to incorporate new OpenAI models and their corresponding tokenizer configurations. Because it is pure JavaScript with no Python or native dependencies, it runs in web browsers, edge environments, and Node.js alike. It also offers a "lite" mode that lets developers load only the encoding ranks they need, or fetch encoding data dynamically from a CDN, significantly reducing bundle size compared to the full library.
Common errors
- `TypeError: Cannot find module 'js-tiktoken/ranks/o200k_base'`
  - Cause: You are likely using the `js-tiktoken/lite` import strategy, but your bundler (e.g., Webpack, Rollup, Vite) is not configured to resolve the rank data files under `js-tiktoken/ranks/`.
  - Fix: Ensure your build tool is configured to process and include the static rank data (in Webpack, this may mean adding a JSON loader). Alternatively, fetch the rank data dynamically from a CDN as outlined in the `js-tiktoken/lite` documentation.
- `TypeError: getEncoding is not a function`
  - Cause: This typically occurs when accessing a named export such as `getEncoding` from a CommonJS `require` call against a package that is primarily ESM, or when destructuring is applied incorrectly.
  - Fix: Use native ESM imports: `import { getEncoding } from 'js-tiktoken';`. If you must use CommonJS, access the named export correctly, e.g. `const { getEncoding } = require('js-tiktoken');`, though native ESM is recommended.
- `Error: Unknown encoding 'some-unrecognized-model-name'`
  - Cause: The model name passed to `getEncoding` or `encodingForModel` does not match any model or encoding scheme supported by the installed version of `js-tiktoken`.
  - Fix: Verify the exact spelling and casing of the model name, and consult the `js-tiktoken` documentation or the upstream `tiktoken` library's list of supported models, which is updated with each new OpenAI release. Consider wrapping the call in `try...catch` to handle model names that are deprecated or not yet supported.
Warnings
- Gotcha: Importing the main `js-tiktoken` package directly (e.g., `import { getEncoding } from 'js-tiktoken';`) bundles *all* OpenAI tokenizer data, which can significantly increase your application's bundle size, especially in web environments.
- Gotcha: `js-tiktoken` is built as an ESM-first package. CommonJS `require` may partially work for some exports through transpilation or Node.js interoperability, but it is not the primary pattern and can lead to unexpected behavior or larger bundle sizes in certain setups.
- Gotcha: Model names used with `encodingForModel` are frequently updated and can be case-sensitive. Using an unrecognized or incorrectly cased model name will result in an `Error: Unknown encoding`.
Install
- `npm install js-tiktoken`
- `yarn add js-tiktoken`
- `pnpm add js-tiktoken`
Imports
- `getEncoding`
  - ESM: `import { getEncoding } from 'js-tiktoken';`
  - CommonJS: `const { getEncoding } = require('js-tiktoken');`
- `encodingForModel`
  - ESM: `import { encodingForModel } from 'js-tiktoken';`
  - CommonJS: `const { encodingForModel } = require('js-tiktoken');`
- `Tiktoken`
  - Full: `import { Tiktoken } from 'js-tiktoken';`
  - Lite: `import { Tiktoken } from 'js-tiktoken/lite';`
Quickstart
```js
import assert from 'node:assert';
import { getEncoding, encodingForModel } from 'js-tiktoken';

// Basic usage: get an encoding directly by name
const enc = getEncoding('gpt2');
const encodedTokens = enc.encode('hello world');
console.log(`'gpt2' tokens for 'hello world': ${encodedTokens}`);
assert(enc.decode(encodedTokens) === 'hello world');

// Model-specific usage: get the encoding for a known model
const modelName = 'gpt-4'; // or 'gpt-3.5-turbo', 'text-embedding-ada-002', etc.
try {
  const modelEnc = encodingForModel(modelName);
  const text = 'This is an example sentence for GPT-4 tokenization.';
  const tokens = modelEnc.encode(text);
  console.log(`\n'${modelName}' tokens: ${tokens.length}, tokens array: [${tokens.slice(0, 5)}..., ${tokens.slice(-5)}]`);
  const decoded = modelEnc.decode(tokens);
  console.log(`'${modelName}' decoded: ${decoded}`);
} catch (error) {
  console.error(`\nError getting encoding for model '${modelName}':`, error.message);
}
```