Kuromoji.js

0.1.2 · abandoned · verified Sun Apr 19

Kuromoji.js is a JavaScript implementation of a Japanese morphological analyzer and a direct port of the Java-based Kuromoji project. It splits Japanese text into its constituent words (morphemes). For each word it reports details such as the part-of-speech tag, base form, reading (pronunciation in Katakana), and surface form.

The latest published version is `0.1.2`, released around March 2018 (roughly eight years ago). The original project appears to be abandoned or unmaintained. It primarily supports CommonJS modules in Node.js and global script inclusion in browsers. Dictionary loading is exposed only through callback-based asynchronous operations. Because of its age, it lacks modern JavaScript features such as ESM support, bundled TypeScript definitions, and Promise-based APIs. Developers who need these capabilities should consider actively maintained forks such as `@patdx/kuromoji` or `code4fukui-es`, which offer updated architectures and features.
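Because dictionary loading is callback-only, code written with `async`/`await` has to adapt it. The following is a minimal sketch. It assumes that `build(callback)` calls `callback(err, tokenizer)` once, which is the Node-style signature used by `kuromoji.builder({ dicPath }).build(...)`. `buildTokenizer` is a hypothetical helper and is not part of Kuromoji.js. The builder is passed in rather than created inside the helper, so the wrapper does not need to know where the dictionary lives.

```javascript
// Sketch: Promise wrapper around Kuromoji.js's callback-based build().
// `buildTokenizer` is a hypothetical helper, not a Kuromoji.js API.
// Pass it the object returned by kuromoji.builder({ dicPath }).
function buildTokenizer(builder) {
  return new Promise((resolve, reject) => {
    builder.build((err, tokenizer) => {
      // Node-style callback: reject on error, otherwise resolve with the tokenizer.
      if (err) {
        reject(err);
        return;
      }
      resolve(tokenizer);
    });
  });
}

// Usage (inside an async function, with kuromoji installed):
//   const tokenizer = await buildTokenizer(kuromoji.builder({ dicPath }));
//   const tokens = tokenizer.tokenize('すもももももももものうち');
```

Loading the dictionary is the slow step. The usual pattern is to await the tokenizer once and reuse it for every `tokenize()` call. Node's `util.promisify(builder.build.bind(builder))` is an equivalent alternative to the hand-written wrapper.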

Install

npm install kuromoji

Imports

const kuromoji = require('kuromoji');

Quickstart

This example demonstrates how to initialize the Kuromoji.js tokenizer in Node.js, specifying the dictionary path, and then tokenize a Japanese sentence.

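const path = require('path');
const kuromoji = require('kuromoji');

// The dictionary files ship in the package's dict/ directory. Resolving the package
// root via require.resolve works regardless of where this script lives in the project.
const dicPath = path.join(path.dirname(require.resolve('kuromoji/package.json')), 'dict');

// build() loads the dictionary asynchronously and reports the result through a Node-style callback.
kuromoji.builder({ dicPath: dicPath }).build(function (err, tokenizer) {
    if (err) {
        console.error('Error building tokenizer:', err);
        return;
    }
    const sentence = "すもももももももものうち";
    // tokenize() is synchronous and returns an array of token objects.
    const tokens = tokenizer.tokenize(sentence);
    console.log(`Tokens for "${sentence}":`);
    tokens.forEach(token => {
        console.log(`  - Surface: ${token.surface_form}, POS: ${token.pos}, Reading: ${token.reading}`);
    });
});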
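The quickstart prints three of the fields on each token object. Others, such as `basic_form`, are also available. The following is a minimal post-processing sketch. It assumes tokens are plain objects with `surface_form`, `pos`, `basic_form`, and `reading` properties, and that `pos` holds IPADIC tags such as `名詞` (noun). `toKatakanaReading` and `contentBaseForms` are hypothetical helpers. Falling back to `surface_form` when a field is missing or `*` (which may happen for words not in the dictionary) is a choice made here and is not library behavior.

```javascript
// Sketch: post-processing Kuromoji.js token objects.
// Assumes tokens look like { surface_form, pos, basic_form, reading };
// the helper names are hypothetical.

// Join the Katakana readings of all tokens, falling back to the surface
// form when a token has no usable reading (e.g. an unknown word).
function toKatakanaReading(tokens) {
  return tokens
    .map((t) => (t.reading && t.reading !== '*' ? t.reading : t.surface_form))
    .join('');
}

// Collect dictionary (base) forms of nouns (名詞), verbs (動詞) and
// adjectives (形容詞), falling back to the surface form if basic_form is unusable.
function contentBaseForms(tokens) {
  const contentPos = new Set(['名詞', '動詞', '形容詞']);
  return tokens
    .filter((t) => contentPos.has(t.pos))
    .map((t) => (t.basic_form && t.basic_form !== '*' ? t.basic_form : t.surface_form));
}
```

Inside the `build()` callback, `contentBaseForms(tokenizer.tokenize(sentence))` could, for example, feed a simple keyword index. Normalizing to base forms lets inflected verbs such as 食べた and 食べる match the same entry.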
