Lindera WASM for Bundler Environments
lindera-wasm-bundler is a WebAssembly-based morphological analysis library for bundler environments such as Webpack or Rollup. It provides Japanese text segmentation and part-of-speech tagging, using WebAssembly for performance and portability. The current stable version is 3.0.5; patch releases in the v3.0.x series appear frequently with bug fixes and small enhancements. Major version updates (such as the v2 to v3 transition) introduce significant changes, often including refactoring and API adjustments. A key differentiator is its reliance on the OPFS (Origin Private File System) API to load and manage large dictionary files at runtime, which suits web applications that need robust text processing without bundling dictionaries into the main application payload.
Common errors
- Failed to load WASM module: CompileError: WebAssembly.instantiate(): expected magic word 00 61 73 6d, found 3c 21 44 4f
  cause: This error often indicates that the browser or bundler attempted to parse an HTML document (e.g., an error page) as a WebAssembly module, usually because the WASM file could not be found at the expected path.
  fix: Verify that your bundler is correctly configuring the output path and serving the WebAssembly `.wasm` file. For Vite, ensure the `optimizeDeps.exclude` configuration is correctly applied to prevent issues with pre-bundling that might interfere with WASM file resolution.
- ReferenceError: __wbg_init is not defined
  cause: The default export for initializing the WebAssembly module, `__wbg_init`, was either not imported or was imported incorrectly (e.g., as a named import).
  fix: Ensure `__wbg_init` is imported as the default export: `import __wbg_init from 'lindera-wasm-bundler';`
- TypeError: Cannot read properties of undefined (reading 'metadata')
  cause: This error typically occurs when attempting to access properties like `metadata` from the object returned by `loadDictionaryFiles` before the dictionary files have been successfully loaded or if `loadDictionaryFiles` failed to retrieve them.
  fix: Ensure `downloadDictionary` has completed successfully and that the dictionary name provided to `loadDictionaryFiles` matches the name used during download. Also, verify that the browser's Origin Private File System (OPFS) is accessible and not blocked by security policies.
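The first error in this list can be diagnosed before instantiation by inspecting the first four bytes of the file the server actually returned. The following is a minimal sketch, assuming you already have the response bytes in hand; `hasWasmMagic` and `describeHeader` are hypothetical helper names and are not part of lindera-wasm-bundler.

```typescript
// Hypothetical helpers (not part of lindera-wasm-bundler).
// A valid WebAssembly binary starts with the bytes 00 61 73 6d ("\0asm").
const WASM_MAGIC: readonly number[] = [0x00, 0x61, 0x73, 0x6d];

function hasWasmMagic(bytes: Uint8Array): boolean {
  if (bytes.length < WASM_MAGIC.length) return false;
  for (let i = 0; i < WASM_MAGIC.length; i++) {
    if (bytes[i] !== WASM_MAGIC[i]) return false;
  }
  return true;
}

// Render up to the first four bytes as hex, in the same form the CompileError uses.
function describeHeader(bytes: Uint8Array): string {
  const parts: string[] = [];
  for (let i = 0; i < Math.min(4, bytes.length); i++) {
    const hex = bytes[i].toString(16);
    parts.push(hex.length < 2 ? "0" + hex : hex);
  }
  return parts.join(" ");
}

// The first bytes of "<!DOCTYPE html>", as an HTML error page would start.
const htmlPage = new Uint8Array([0x3c, 0x21, 0x44, 0x4f, 0x43]);
console.log(hasWasmMagic(htmlPage)); // false
console.log(describeHeader(htmlPage)); // "3c 21 44 4f"
```

A header of `3c 21 44 4f` decodes to `<!DO`, which is why this error usually means the `.wasm` URL resolved to an HTML page rather than the binary.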
Warnings
- breaking Version 3.0.0 introduced significant changes, including the removal of the dedicated Node.js WASM target and renaming of npm packages. Projects targeting Node.js should now use `lindera-nodejs` instead of WASM packages. Projects using WASM in bundler environments should use `lindera-wasm-bundler`.
- gotcha When using `lindera-wasm-bundler` (or `lindera-wasm-web`) in a Vite project, you must explicitly exclude it from `optimizeDeps` in your `vite.config.js` to prevent issues with Vite's dependency pre-bundling.
- gotcha For browser extension development, a `content_security_policy` (CSP) must be configured in `manifest.json` to allow WebAssembly execution using `wasm-unsafe-eval`. Additionally, Vite's dev server might require CORS configuration for `chrome-extension://` origins.
- gotcha Module initialization and the OPFS helpers (`__wbg_init`, `downloadDictionary`, `loadDictionaryFiles`) are asynchronous and return Promises. Forgetting to `await` them can lead to unhandled promise rejections or to later steps running before the module or dictionary is ready. By contrast, the Quickstart calls `loadDictionaryFromBytes`, `builder.build()`, and `tokenizer.tokenize()` synchronously, without `await`.
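The Vite gotcha above can be addressed with a configuration along these lines. This is a minimal sketch of a `vite.config.ts`: `optimizeDeps.exclude` is a standard Vite option, while the rest of a real project's config (plugins, build settings, dev-server CORS) is assumed to live alongside it and is omitted here.

```typescript
// vite.config.ts (sketch). Vite accepts a plain object as the default export;
// wrapping it in defineConfig from 'vite' adds type hints but is optional.
const config = {
  optimizeDeps: {
    // Keep the WASM package out of Vite's dependency pre-bundling.
    exclude: ["lindera-wasm-bundler"],
  },
};

export default config;
```

If a project uses `lindera-wasm-web` instead, the same `exclude` entry would name that package.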
Install
- npm install lindera-wasm-bundler
- yarn add lindera-wasm-bundler
- pnpm add lindera-wasm-bundler
Imports
- __wbg_init
  wrong: import { __wbg_init } from 'lindera-wasm-bundler';
  correct: import __wbg_init from 'lindera-wasm-bundler';
- TokenizerBuilder
  wrong: const { TokenizerBuilder } = require('lindera-wasm-bundler');
  correct: import { TokenizerBuilder } from 'lindera-wasm-bundler';
- loadDictionaryFromBytes
  import { loadDictionaryFromBytes } from 'lindera-wasm-bundler';
- downloadDictionary
  wrong: import { downloadDictionary } from 'lindera-wasm-bundler';
  correct: import { downloadDictionary, loadDictionaryFiles } from 'lindera-wasm-bundler/opfs';
Quickstart
import __wbg_init, { TokenizerBuilder, loadDictionaryFromBytes } from 'lindera-wasm-bundler';
import { downloadDictionary, loadDictionaryFiles } from 'lindera-wasm-bundler/opfs';

async function initializeAndTokenize() {
  // Initialize the WebAssembly module
  await __wbg_init();

  // Check if dictionary is already downloaded or download it for the first time.
  // In a real application, you might use local storage or IndexedDB to track this.
  console.log("Attempting to download dictionary (if not already present)...");
  try {
    await downloadDictionary(
      "https://github.com/lindera/lindera/releases/download/v3.0.0/lindera-ipadic-3.0.0.zip",
      "ipadic"
    );
    console.log("Dictionary 'ipadic' ready in OPFS.");
  } catch (error) {
    console.warn("Error downloading dictionary, might be already present or network issue:", error);
  }

  // Load dictionary from OPFS
  console.log("Loading dictionary files from OPFS...");
  const files = await loadDictionaryFiles("ipadic");
  const dict = loadDictionaryFromBytes(
    files.metadata, files.dictDa, files.dictVals,
    files.dictWordsIdx, files.dictWords, files.matrixMtx,
    files.charDef, files.unk
  );
  console.log("Dictionary loaded into memory.");

  // Create a tokenizer instance
  const builder = new TokenizerBuilder();
  builder.setDictionaryInstance(dict);
  builder.setMode("normal"); // "normal", "decompose", or "search"
  const tokenizer = builder.build();

  // Tokenize a sentence
  const text = "すもももももももものうち";
  console.log(`Tokenizing: "${text}"`);
  const tokens = tokenizer.tokenize(text);
  tokens.forEach(token => {
    console.log(`Surface: ${token.surface}, Details: ${token.details.join(", ")}`);
  });
  console.log("Tokenization complete.");
}

initializeAndTokenize().catch(console.error);
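The Quickstart's comment about tracking whether the dictionary is already present could also be handled by trying OPFS first and downloading only on a miss. The following is a minimal sketch under stated assumptions: `ensureDictionary` and `DictionaryIo` are hypothetical names, the load and download functions are passed in (in an application they would be `loadDictionaryFiles` and `downloadDictionary` from 'lindera-wasm-bundler/opfs'), and a missing dictionary is assumed to surface either as a thrown error or as a result without `metadata`. How the library actually reports a miss is not confirmed here.

```typescript
// Hypothetical helper, not part of lindera-wasm-bundler.
interface DictionaryIo<F> {
  load: (name: string) => Promise<F | undefined>;
  download: (url: string, name: string) => Promise<unknown>;
}

async function ensureDictionary<F extends { metadata?: unknown }>(
  io: DictionaryIo<F>,
  url: string,
  name: string
): Promise<F> {
  // Try OPFS first. ASSUMPTION: a throw or a result without `metadata`
  // means the dictionary has not been downloaded yet.
  const cached = await io.load(name).catch(() => undefined);
  if (cached && cached.metadata !== undefined) return cached;

  // Cache miss: download once, then load again.
  await io.download(url, name);
  const fresh = await io.load(name);
  if (!fresh || fresh.metadata === undefined) {
    throw new Error(`Dictionary "${name}" is still unavailable after download`);
  }
  return fresh;
}
```

In the Quickstart this might replace the unconditional download, for example `const files = await ensureDictionary({ load: loadDictionaryFiles, download: downloadDictionary }, url, "ipadic");`, where `url` is the dictionary archive URL. Unlike the Quickstart's try/catch, a failed download propagates as an error instead of being logged and ignored.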