Kuromoji.js
Kuromoji.js is a JavaScript implementation of a Japanese morphological analyzer, ported directly from the Java-based Kuromoji project. It tokenizes Japanese text into its constituent words (morphemes) and reports details for each one, including its surface form, part-of-speech tags, base form, and reading (in katakana).
The current stable version is `0.1.2`, last published around March 2018, which suggests that the original project is no longer maintained. It supports CommonJS modules in Node.js and inclusion as a global script in browsers, and it loads dictionaries only through callback-based asynchronous APIs. It does not ship ES module (ESM) builds, bundled TypeScript definitions (community typings are published separately as `@types/kuromoji`), or Promise-based APIs. Developers who need these capabilities should consider actively maintained forks such as `@patdx/kuromoji` or `code4fukui-es`, which aim to modernize the package.
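The per-token details listed above are plain properties on each token object. The sketch below assumes the property names from the token format documented in the kuromoji.js README (`surface_form`, `pos`, `basic_form`, `reading`). `summarizeToken` is a hypothetical helper, and the sample tokens are hand-written for illustration rather than real analyzer output.

```javascript
// Hypothetical helper: pick the commonly used fields off a kuromoji.js token.
// Property names follow the token format documented in the kuromoji.js README.
function summarizeToken(token) {
  return {
    surface: token.surface_form, // text as it appears in the input
    pos: token.pos,              // top-level part-of-speech tag
    base: token.basic_form,      // base (dictionary) form
    // The reading may be missing for unknown words; fall back to the surface form.
    reading: token.reading || token.surface_form,
  };
}

// Hand-written sample tokens for "食べました" (illustrative, not real analyzer output).
const sampleTokens = [
  { surface_form: '食べ', pos: '動詞', basic_form: '食べる', reading: 'タベ' },
  { surface_form: 'まし', pos: '助動詞', basic_form: 'ます', reading: 'マシ' },
  { surface_form: 'た', pos: '助動詞', basic_form: 'た', reading: 'タ' },
];

console.log(sampleTokens.map(summarizeToken));
```

The base form is what you would typically index or count, since it collapses conjugated surfaces such as `食べ` back to `食べる`.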
Common errors
- Uncaught ReferenceError: kuromoji is not defined
  Cause: In a browser, the `kuromoji.js` script was not loaded before code that uses the `kuromoji` global; in Node.js, the module was never loaded with `require`.
  Fix: In a browser, place `<script src="url/to/kuromoji.js"></script>` before any script that uses `kuromoji`. In Node.js, add `var kuromoji = require('kuromoji');` at the top of your module.
- Error: ENOENT: no such file or directory, open 'path/to/dictionary/dir/base.dat.gz'
  Cause: The `dicPath` passed to `kuromoji.builder()` does not point to the directory containing the Kuromoji dictionary files.
  Fix: Make sure `dicPath` points to the `dict` directory itself (the one containing `base.dat.gz`, `tid.dat.gz`, etc.). In Node.js, a relative path is resolved against the current working directory, not the script's location. For npm installations, `path.resolve(__dirname, 'node_modules/kuromoji/dict')` works when the script sits in the project root.
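To surface the `ENOENT` problem with a clearer message, you can check the directory before calling `kuromoji.builder()`. This is a minimal sketch: `checkDicPath` is a hypothetical helper, and it only checks the two file names mentioned in the error above (a complete dictionary contains more files than these).

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical pre-flight check: confirm that dicPath is the dictionary
// directory itself before handing it to kuromoji.builder().
// Only the two files named in the error above are checked; the full
// dictionary contains additional files.
function checkDicPath(dicPath) {
  // fs resolves relative paths against process.cwd(), so resolve explicitly
  // to make any error message show the path that was actually tried.
  const resolved = path.resolve(dicPath);
  const required = ['base.dat.gz', 'tid.dat.gz'];
  const missing = required.filter((name) => !fs.existsSync(path.join(resolved, name)));
  if (missing.length > 0) {
    throw new Error(`dicPath "${resolved}" is missing: ${missing.join(', ')}`);
  }
  return resolved;
}
```

Usage: `kuromoji.builder({ dicPath: checkDicPath(dicPath) })`. If the script does not live in the project root, one option is to locate the package with `path.join(path.dirname(require.resolve('kuromoji/package.json')), 'dict')`, assuming the installed package does not restrict subpath access.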
Warnings
- gotcha The original `kuromoji` package (takuyaa/kuromoji.js) has not been updated since its last release, `0.1.2`, around March 2018. It provides no ES module (ESM) build, no bundled TypeScript typings, and no Promise-based APIs; it relies on CommonJS (plus a browser global build) and callbacks.
- gotcha Incorrect `dicPath` configuration is a common source of errors, leading to `ENOENT` (file not found) or similar I/O issues during tokenizer initialization. The dictionary files are essential and must be accessible.
- deprecated The README suggests installing `kuromoji` via Bower for browser usage. Bower is deprecated; install the package with npm, yarn, or pnpm instead.
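Because the builder API is callback-only, a small wrapper makes it usable with `async`/`await`. This is a sketch under the assumption that `build` invokes its callback as `(err, tokenizer)`, as in the Quickstart; `buildTokenizer` is a hypothetical name, and the library object is passed in so the function does not hard-code how kuromoji was loaded.

```javascript
// Hypothetical Promise wrapper around the callback-based builder API.
// `kuromojiLib` is the object from require('kuromoji') or the browser global.
function buildTokenizer(kuromojiLib, dicPath) {
  return new Promise((resolve, reject) => {
    kuromojiLib.builder({ dicPath }).build((err, tokenizer) => {
      if (err) {
        reject(err); // dictionary loading failed (e.g. wrong dicPath)
      } else {
        resolve(tokenizer);
      }
    });
  });
}

// Usage inside an async function (Node.js, CommonJS):
//   const kuromoji = require('kuromoji');
//   const tokenizer = await buildTokenizer(kuromoji, '/path/to/kuromoji/dict');
//   console.log(tokenizer.tokenize('すもももももももものうち'));
```

Node's `util.promisify` applied to a bound `build` method would work similarly; the explicit Promise just makes the error path visible.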
Install
- npm install kuromoji
- yarn add kuromoji
- pnpm add kuromoji
Imports
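- kuromoji (CommonJS)
  var kuromoji = require('kuromoji');
- kuromoji (ES module syntax, relying on CommonJS interop in Node.js or a bundler)
  import kuromoji from 'kuromoji';
- kuromoji.builder
  var builder = require('kuromoji').builder;
  import { builder } from 'kuromoji'; // may fail under native Node.js ESM because the package is CommonJS-only; bundlers may accept it
- kuromoji (global)
  <!-- In HTML --> <script src="url/to/kuromoji.js"></script>
  // In JavaScript
  const kuromoji = window.kuromoji;
  kuromoji.builder(...)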
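Code shared between the browser and Node.js can pick whichever of these loading styles is available. The helper below is a hypothetical sketch, not part of kuromoji.js: it prefers the browser global and otherwise falls back to `require`, throwing the same `ReferenceError` described under Common errors when neither exists.

```javascript
// Hypothetical helper: return the kuromoji object in either environment.
function loadKuromoji() {
  // Browser: the global set by <script src=".../kuromoji.js">.
  if (typeof window !== 'undefined' && window.kuromoji) {
    return window.kuromoji;
  }
  // Node.js / CommonJS (require throws its own error if the package is not installed).
  if (typeof require === 'function') {
    return require('kuromoji');
  }
  // Neither the global nor require is available.
  throw new ReferenceError('kuromoji is not defined: load kuromoji.js or require it first');
}
```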
Quickstart
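const path = require('path');
const kuromoji = require('kuromoji');

// Assumes this script sits in the project root, next to node_modules.
const dicPath = path.resolve(__dirname, 'node_modules/kuromoji/dict');

// build() loads the dictionary asynchronously and reports via a callback.
kuromoji.builder({ dicPath: dicPath }).build(function (err, tokenizer) {
  if (err) {
    console.error('Error building tokenizer:', err);
    return;
  }
  const sentence = "すもももももももものうち";
  // tokenize() is synchronous once the dictionary has loaded.
  const tokens = tokenizer.tokenize(sentence);
  console.log(`Tokens for "${sentence}":`);
  tokens.forEach(token => {
    // reading may be undefined for unknown words, so print '*' instead.
    console.log(`  - Surface: ${token.surface_form}, POS: ${token.pos}, Base: ${token.basic_form}, Reading: ${token.reading || '*'}`);
  });
});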
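A common next step after tokenizing is to keep only content words by part of speech. The sketch below is a hypothetical helper that works on the token array returned by `tokenize()`; the default POS list is an illustrative choice, and treating `'*'` as "no base form" is an assumption about how unknown words are reported. The sample tokens are a hand-written subset, not real analyzer output.

```javascript
// Hypothetical helper: keep tokens whose top-level POS tag is in `wanted`
// and return their base forms. Unknown words may report '*' as basic_form,
// so the surface form is used in that case.
function contentWords(tokens, wanted = ['名詞', '動詞', '形容詞']) {
  return tokens
    .filter((token) => wanted.includes(token.pos))
    .map((token) =>
      token.basic_form && token.basic_form !== '*' ? token.basic_form : token.surface_form
    );
}

// Hand-written subset of tokens for "すもももももももものうち"
// (illustrative, not real analyzer output).
const sampleTokens = [
  { surface_form: 'すもも', pos: '名詞', basic_form: 'すもも' },
  { surface_form: 'も', pos: '助詞', basic_form: 'も' },
  { surface_form: 'もも', pos: '名詞', basic_form: 'もも' },
  { surface_form: 'の', pos: '助詞', basic_form: 'の' },
  { surface_form: 'うち', pos: '名詞', basic_form: 'うち' },
];

console.log(contentWords(sampleTokens)); // [ 'すもも', 'もも', 'うち' ]
```

Inside the Quickstart callback, the same call would be `contentWords(tokenizer.tokenize(sentence))`.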