Lindera WASM for Bundler Environments

3.0.5 · active · verified Tue Apr 21

lindera-wasm-bundler is a WebAssembly-based morphological analysis library designed for use in bundler environments like Webpack or Rollup. It provides Japanese text segmentation and part-of-speech tagging capabilities, leveraging WebAssembly for performance and portability. The current stable version is 3.0.5, with minor releases occurring frequently to address bugs and introduce minor enhancements, as seen in the recent v3.0.x series. Major version updates (like the v2 to v3 transition) introduce significant changes, often including refactoring and API adjustments. A key differentiator is its reliance on the OPFS (Origin Private File System) API for efficient runtime loading and management of large dictionary files, making it suitable for web applications requiring robust text processing without bundling dictionaries directly into the main application payload.

Common errors

Warnings

Install

Imports

Quickstart

This quickstart demonstrates how to initialize the Lindera WASM module, download and load a dictionary using the OPFS API, and then tokenize a Japanese sentence, logging each token's surface form and morphological details.

import __wbg_init, { TokenizerBuilder, loadDictionaryFromBytes } from 'lindera-wasm-bundler';
import { downloadDictionary, loadDictionaryFiles } from 'lindera-wasm-bundler/opfs';

async function initializeAndTokenize() {
    // Initialize the WebAssembly module
    await __wbg_init();

    // Check if dictionary is already downloaded or download it for the first time.
    // In a real application, you might use local storage or IndexedDB to track this.
    console.log("Attempting to download dictionary (if not already present)...");
    try {
        await downloadDictionary(
            "https://github.com/lindera/lindera/releases/download/v3.0.0/lindera-ipadic-3.0.0.zip", 
            "ipadic"
        );
        console.log("Dictionary 'ipadic' ready in OPFS.");
    } catch (error) {
        console.warn("Error downloading dictionary, might be already present or network issue:", error);
    }

    // Load dictionary from OPFS
    console.log("Loading dictionary files from OPFS...");
    const files = await loadDictionaryFiles("ipadic");
    const dict = loadDictionaryFromBytes(
        files.metadata, files.dictDa, files.dictVals,
        files.dictWordsIdx, files.dictWords, files.matrixMtx,
        files.charDef, files.unk
    );
    console.log("Dictionary loaded into memory.");

    // Create a tokenizer instance
    const builder = new TokenizerBuilder();
    builder.setDictionaryInstance(dict);
    builder.setMode("normal"); // "normal", "decompose", or "search"
    const tokenizer = builder.build();

    // Tokenize a sentence
    const text = "すもももももももものうち";
    console.log(`Tokenizing: "${text}"`);
    const tokens = tokenizer.tokenize(text);

    tokens.forEach(token => {
        console.log(`Surface: ${token.surface}, Details: ${token.details.join(", ")}`);
    });
    console.log("Tokenization complete.");
}

initializeAndTokenize().catch(console.error);

view raw JSON →