Lindera WASM (IPADIC) for Bundlers

2.3.4 · active · verified Sun Apr 19

This package, `lindera-wasm-ipadic-bundler`, provides a WebAssembly-based Japanese morphological analyzer tailored for JavaScript bundler environments. It embeds the widely used IPADIC dictionary directly, enabling efficient offline text processing. While the broader Lindera WASM ecosystem has moved to v3.x (e.g., `lindera-wasm-ipadic-web` and `lindera-wasm-ipadic-nodejs`), this 'bundler' target package is currently at v2.3.4. The Lindera project generally releases frequent minor updates and patches, with major versions introducing broader architectural changes such as revised package structures. Its key differentiators are its WebAssembly foundation for performance, the convenience of bundled dictionaries, and dedicated packages optimized for different JavaScript environments (browser, Node.js, bundler).

Install

Install the package from npm:

npm install lindera-wasm-ipadic-bundler

Imports

import init, { TokenizerBuilder } from 'lindera-wasm-ipadic-bundler';

Quickstart

This quickstart initializes the Lindera WASM module, configures a tokenizer with the embedded IPADIC dictionary, and demonstrates how to tokenize a Japanese sentence and access token details.

import init, { TokenizerBuilder } from 'lindera-wasm-ipadic-bundler';

async function main() {
    // Initialize the WebAssembly module. This is crucial and must be awaited
    // before any other functions from the module can be used.
    await init();

    // Create a new TokenizerBuilder instance to configure the tokenizer.
    const builder = new TokenizerBuilder();

    // Specify the dictionary to use. 'embedded://ipadic' uses the dictionary
    // bundled with this package.
    builder.setDictionary("embedded://ipadic");

    // Set the tokenization mode; 'normal' is suitable for general text.
    builder.setMode("normal");

    // Build the tokenizer with the specified settings.
    const tokenizer = builder.build();

    // Define the Japanese sentence to be tokenized.
    const sentence = "すもももももももものうち";
    const tokens = tokenizer.tokenize(sentence);

    console.log(`Tokenizing sentence: "${sentence}"`);
    console.log("--- Tokens ---");

    // Iterate over the resulting tokens and print their surface form and details.
    tokens.forEach(token => {
        console.log(`${token.surface} [${token.details.join(", ")}]`);
    });

    // Demonstrate accessing specific details of a token.
    if (tokens.length > 0) {
        const firstToken = tokens[0];
        console.log(`\nFirst token surface: ${firstToken.surface}`);
        // The details array contains information like part-of-speech, conjugation, etc.
        console.log(`First token part of speech: ${firstToken.details[0]}`);
    }
}

// Execute the main asynchronous function.
main().catch(console.error);
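Once you have the token list, it is easy to post-process. Below is a minimal sketch of a helper that groups token surfaces by part of speech, assuming each token has the `{ surface, details }` shape shown in the quickstart, with `details[0]` holding the part-of-speech tag. The `groupByPos` function and the mock tokens are illustrative assumptions, not part of the package's API.

```javascript
// Group token surfaces by part of speech.
// Assumes tokens shaped like { surface, details }, where details[0]
// is the part-of-speech tag (as in the quickstart above).
function groupByPos(tokens) {
    const groups = {};
    for (const token of tokens) {
        const pos = token.details[0] ?? "UNKNOWN";
        (groups[pos] ??= []).push(token.surface);
    }
    return groups;
}

// Example with mock tokens standing in for tokenizer.tokenize() output:
const mockTokens = [
    { surface: "すもも", details: ["名詞", "一般"] },
    { surface: "も", details: ["助詞", "係助詞"] },
    { surface: "もも", details: ["名詞", "一般"] },
];
console.log(groupByPos(mockTokens));
// { '名詞': [ 'すもも', 'もも' ], '助詞': [ 'も' ] }
```

The helper is deliberately independent of the WASM module, so it works on any array of objects with that shape.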
