tiktoken-rs for Node.js

1.0.6 · active · verified Sun Apr 19

`tiktoken-rs-node` provides NAPI bindings to the Rust `tiktoken-rs` library, an implementation of OpenAI's `tiktoken` BPE tokenizer that encodes text into token integers and decodes tokens back into text. As of version 1.0.6 it offers a zero-copy encode mechanism and does not use WebAssembly (WASM), which distinguishes it from other Node.js `tiktoken` implementations that often rely on WASM. The package is actively maintained, with frequent patch releases against its current stable API. Its primary use case is efficient tokenization of text for large language models in Node.js applications where throughput and low overhead matter. Supported encodings include `cl100k_base`, `gpt2`, and `o200k_base`.

Common errors

Warnings

Install

Imports

Quickstart

This quickstart demonstrates how to load a `tiktoken` encoding, tokenize a string, and then decode the tokens back into a string using the `tiktoken-rs-node` library.

import { getEncoding } from 'tiktoken-rs-node';

// Load the encoding once per application startup for optimal performance.
// Supported encodings: 'o200k_base', 'cl100k_base', 'p50k_edit', 'p50k_base', 'r50k_base', 'gpt2'
const encodingName = 'cl100k_base';
const encoding = getEncoding(encodingName);

const textToEncode = "This is a sample sentence for tokenization using tiktoken-rs-node.";

// Encode the text into an array of token integers
const tokens = encoding.encode(textToEncode);
console.log(`Original text: "${textToEncode}"`);
console.log(`Encoded tokens (${encodingName}): [${tokens.join(', ')}]`);
console.log(`Number of tokens: ${tokens.length}`);

// Decode the tokens back into a string
const decodedString = encoding.decode(tokens);
console.log(`Decoded string: "${decodedString}"`);

// Example with a different encoding
const gpt2Encoding = getEncoding('gpt2');
const gpt2Tokens = gpt2Encoding.encode("Hello, world!");
console.log(`GPT-2 Encoded tokens: [${gpt2Tokens.join(', ')}]`);
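
The quickstart advises loading each encoding once per application startup. One way to enforce that is a small memoizing wrapper around the loader; the following is a minimal sketch, not part of the package. `createEncodingCache`, `countTokens`, and the `EncodingLike` interface are hypothetical names introduced here, and the assumption that `encode` returns something array-like with a `length` is inferred from the quickstart's use of `tokens.length`.

```typescript
// Minimal shape of what this sketch needs from an encoding object.
// Assumption: encode() returns an array-like value (plain array or typed array).
interface EncodingLike {
  encode(text: string): ArrayLike<number>;
}

// Any function that builds an encoding from its name, e.g. getEncoding.
type EncodingLoader = (name: string) => EncodingLike;

// Hypothetical helper: wraps a loader so each encoding name is loaded only once.
function createEncodingCache(load: EncodingLoader): EncodingLoader {
  const cache = new Map<string, EncodingLike>();
  return (name: string): EncodingLike => {
    let encoding = cache.get(name);
    if (encoding === undefined) {
      encoding = load(name); // first request for this name: do the expensive load
      cache.set(name, encoding);
    }
    return encoding;
  };
}

// Hypothetical helper: number of tokens a text occupies under an encoding.
function countTokens(encoding: EncodingLike, text: string): number {
  return encoding.encode(text).length;
}
```

Assuming `getEncoding` behaves as in the quickstart (synchronous, taking an encoding name), it could be wrapped as `const getCachedEncoding = createEncodingCache(getEncoding);` and then `countTokens(getCachedEncoding('cl100k_base'), prompt)` would reuse the same encoding instance on every call.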
