{"id":13467,"library":"lindera-wasm-cc-cedict-bundler","title":"Lindera WASM Chinese Dictionary (CC-CEDICT) for Bundlers","description":"Lindera is a high-performance Rust-based morphological analysis library, compiled to WebAssembly (WASM) for use in JavaScript environments. This specific package, `lindera-wasm-cc-cedict-bundler`, provides Chinese morphological analysis capabilities utilizing the comprehensive CC-CEDICT dictionary. It is specifically designed and optimized for integration with JavaScript bundlers like Webpack, Rollup, or Vite, facilitating efficient tokenization and detailed linguistic analysis of Chinese text directly within web applications. The Lindera project generally maintains a regular release cadence, with major updates often reflecting enhancements in the core Rust library, improvements in WASM compilation targets, or dictionary updates. Key differentiators include its performance due to the WASM architecture, the availability of specialized dictionary sets via distinct npm packages, and a consistent API across different language analysis modules. Version 3.0.5 is the current stable release, offering robust and safe dictionary deserialization.","status":"active","version":"2.3.4","language":"javascript","source_language":"en","source_url":"https://github.com/lindera/lindera","tags":["javascript","morphological","analysis","library","wasm","webassembly","typescript"],"install":[{"cmd":"npm install lindera-wasm-cc-cedict-bundler","lang":"bash","label":"npm"},{"cmd":"yarn add lindera-wasm-cc-cedict-bundler","lang":"bash","label":"yarn"},{"cmd":"pnpm add lindera-wasm-cc-cedict-bundler","lang":"bash","label":"pnpm"}],"dependencies":[],"imports":[{"note":"This function initializes the WebAssembly module and must be called and awaited before any other functions from the library. Since v3, ESM is the primary module system for bundler targets.","wrong":"const __wbg_init = require('lindera-wasm-cc-cedict-bundler');","symbol":"__wbg_init","correct":"import __wbg_init from 'lindera-wasm-cc-cedict-bundler';"},{"note":"TokenizerBuilder is a named export used to configure and construct a tokenizer instance. Avoid default import mistakes.","wrong":"import TokenizerBuilder from 'lindera-wasm-cc-cedict-bundler';","symbol":"TokenizerBuilder","correct":"import { TokenizerBuilder } from 'lindera-wasm-cc-cedict-bundler';"},{"note":"Import the Token type for TypeScript usage to ensure type safety when working with tokenized results.","symbol":"Token","correct":"import type { Token } from 'lindera-wasm-cc-cedict-bundler';"}],"quickstart":{"code":"import __wbg_init, { TokenizerBuilder } from 'lindera-wasm-cc-cedict-bundler';\n\nasync function main() {\n    // Initialize the WASM module. This is crucial and must be awaited.\n    // Failure to await will result in runtime errors.\n    await __wbg_init();\n\n    // Create a new tokenizer builder instance.\n    const builder = new TokenizerBuilder();\n\n    // Specify the embedded CC-CEDICT dictionary for Chinese analysis.\n    // The 'embedded://' prefix indicates a dictionary bundled with the package.\n    builder.setDictionary(\"embedded://cc-cedict\");\n\n    // Set the tokenization mode. Common modes include \"normal\", \"decompose\", \"search\".\n    builder.setMode(\"normal\");\n\n    // Build the tokenizer instance from the configured builder.\n    const tokenizer = builder.build();\n\n    // Text to tokenize in Chinese.\n    const text = \"他们说汉语。\"; // \"They speak Chinese.\"\n\n    // Tokenize the text into an array of tokens.\n    const tokens = tokenizer.tokenize(text);\n\n    // Process and log the tokens with their details.\n    console.log(`Tokenizing \"${text}\":`);\n    tokens.forEach(token => {\n        // Each token object contains the surface form and an array of linguistic details.\n        console.log(`- Surface: \"${token.surface}\", Details: [${token.details.join(\", \")}]`);\n        // Example details for Chinese might include Pinyin, part of speech, etc.\n    });\n}\n\n// Run the main function and catch any potential errors during execution.\nmain().catch(console.error);\n","lang":"typescript","description":"Demonstrates how to initialize the WASM module, configure a tokenizer for Chinese with CC-CEDICT, and tokenize a sample Chinese sentence."},"warnings":[{"fix":"Review release notes for v3.0.0 and subsequent v3.x releases. Ensure error handling aligns with new propagation mechanisms. Verify dictionary loading and tokenization behavior in your application.","message":"Version 3.0.0 introduced significant internal changes, including a shift from unsafe DAAC deserialization to a safe method and improved error propagation in dictionary handling. While the package name `lindera-wasm-cc-cedict-bundler` itself persisted, these changes can affect runtime behavior and error handling logic, requiring review of existing code upon upgrade from v2.","severity":"breaking","affected_versions":">=3.0.0"},{"fix":"Always call `await __wbg_init();` at the beginning of your application's entry point or before interacting with `TokenizerBuilder` and other library functions.","message":"The WebAssembly module must be explicitly initialized by calling and awaiting the `__wbg_init()` function before any other functions from the package can be used. Failing to do so will result in runtime errors related to uninitialized WASM.","severity":"gotcha","affected_versions":">=1.0.0"},{"fix":"Prefer `import ... from 'lindera-wasm-cc-cedict-bundler'` syntax. Ensure your bundler is configured to handle WASM modules and ESM correctly.","message":"When targeting bundlers and web environments, `lindera-wasm-*` packages are primarily designed for ES Module (ESM) imports. Using CommonJS `require()` syntax might lead to issues with bundlers or incorrect module resolution in modern JavaScript environments.","severity":"gotcha","affected_versions":">=3.0.0"},{"fix":"Verify the dictionary identifier string matches the bundled dictionary, which for this package is `embedded://cc-cedict`.","message":"This specific package is pre-configured with the CC-CEDICT dictionary. When setting the dictionary, ensure you use `builder.setDictionary(\"embedded://cc-cedict\")`. Using an incorrect or unsupported dictionary identifier will result in a dictionary not found error.","severity":"gotcha","affected_versions":">=1.0.0"}],"env_vars":null,"last_verified":"2026-04-19T00:00:00.000Z","next_check":"2026-07-18T00:00:00.000Z","problems":[{"fix":"Ensure `await __wbg_init();` is executed at the start of your application, usually in an async IIFE or your main setup function.","cause":"The `__wbg_init()` function was not called or not awaited before attempting to use other library functions.","error":"Error: WASM module is not initialized."},{"fix":"Confirm `await __wbg_init();` has finished. Verify `import { TokenizerBuilder } from 'lindera-wasm-cc-cedict-bundler';` for correct named import.","cause":"This typically occurs if `__wbg_init()` has not completed, or if `TokenizerBuilder` is imported incorrectly (e.g., as a default import instead of named).","error":"TypeError: TokenizerBuilder is not a constructor"},{"fix":"Change the dictionary setting to `builder.setDictionary(\"embedded://cc-cedict\");` to use the correct bundled dictionary.","cause":"Attempted to load a dictionary that is not bundled with this specific package. `lindera-wasm-cc-cedict-bundler` only contains the CC-CEDICT dictionary.","error":"Error: Dictionary not found: embedded://ipadic"},{"fix":"Refactor your imports to use ES Modules syntax: `import ... from 'lindera-wasm-cc-cedict-bundler';`.","cause":"Attempting to use CommonJS `require()` syntax in a browser environment or a project configured for ESM only.","error":"ReferenceError: require is not defined"}],"ecosystem":"npm","meta_description":null,"install_score":null,"install_tag":null,"quickstart_score":null,"quickstart_tag":null,"pypi_latest":null,"cli_name":"","cli_version":null}