Nagisa: Japanese Tokenizer and POS Tagger
Nagisa is a Python module for Japanese word segmentation and Part-of-Speech (POS) tagging. It is built on recurrent neural networks, using both character- and word-level features for segmentation and tag dictionary information for POS tagging. The library is designed to be simple and easy to use. As of February 2026 the current version is 0.2.12, and it receives periodic updates that fix bugs and improve performance.
Common errors
-
Poetry could not find a compatible version for package nagisa
cause Versions prior to 0.2.11 had incorrect or missing dependency metadata (`Requires-Dist`) in the `tar.gz` files on PyPI, which Poetry relies on.
fix Upgrade `nagisa` to 0.2.11 or newer: `pip install nagisa==0.2.12`. If you must use an older version, ensure your Poetry environment's `pip`, `wheel`, and `build` packages are fully up-to-date, or consider installing `nagisa` and its direct dependencies (`six`, `numpy`, `DyNet38`) manually with `pip` first.
-
AttributeError: module 'utils' has no attribute 'OOV'
cause In `nagisa` versions before 0.2.7, an internal module was named `utils.pyx`, which could conflict with other `utils` modules in the Python path or cause import errors.
fix Upgrade `nagisa` to version 0.2.7 or later: `pip install nagisa==0.2.12`. Version 0.2.7 renamed the internal file to `nagisa_utils.pyx` to avoid conflicts.
-
ImportError: cannot import name 'DyNet' from 'dynet' (or similar DyNet build failures)
cause `DyNet` (or `DyNet38`) is a complex C++-backed library that `nagisa` depends on. Installation can fail if pre-built wheels are not available for your specific Python version and OS, or if build tools are missing.
fix First, ensure you are using a Python version officially supported by the latest `nagisa` release (check PyPI for supported Python versions like 3.9-3.14). Try `pip install DyNet38` independently. If this fails, ensure you have the necessary C++ build tools (e.g., Xcode Command Line Tools on macOS, Visual C++ Build Tools on Windows, `build-essential` on Linux). If problems persist, consider installing `DyNet` directly from its GitHub repository following its specific build instructions, then install `nagisa`.
-
Some words are missing from the tokenized output, especially at the end of the text.
cause A bug in `nagisa` versions prior to 0.2.12 caused words to be silently dropped if the input text ended with an incomplete word structure detected by the tokenizer.
fix Upgrade `nagisa` to version 0.2.12 or newer: `pip install nagisa==0.2.12`.
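Several of the errors above come down to a dependency that is missing or failed to build. The following is a minimal sketch of a pre-flight check, not part of `nagisa` itself. The helper name `check_nagisa_deps` is hypothetical, and the importable module names (`six`, `numpy`, `dynet`, `nagisa`) are assumptions; in particular, verify that the `DyNet38` distribution is importable as `dynet` in your environment. The check uses only the standard library and locates the modules without importing them.

```python
import importlib.util

# Assumed importable module names; adjust if your environment differs.
MODULES = ("six", "numpy", "dynet", "nagisa")


def check_nagisa_deps(modules=MODULES):
    """Return {module_name: bool}, True when the module can be located."""
    status = {}
    for name in modules:
        try:
            status[name] = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # find_spec can raise for oddly initialised modules; treat as missing.
            status[name] = False
    return status


if __name__ == "__main__":
    for name, found in check_nagisa_deps().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

If `dynet` is reported as missing while `nagisa` is found, the DyNet build step is the likely culprit, and the build-tool advice above is the place to start.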
Warnings
- breaking Versions of `nagisa` prior to 0.2.11 had issues with Poetry installations due to missing `Requires-Dist` metadata (for `six`, `numpy`, and `DyNet`/`DyNet38`) in the PyPI `tar.gz` files.
- gotcha In versions prior to 0.2.10, importing `nagisa` on Linux with Python 3.8 and above would often print verbose `DyNet` logging messages to the console during initialization.
- gotcha Version 0.2.12 fixed an issue where `nagisa` would silently drop certain words from tokenization results if the input text ended with a partially formed word (i.e., a sequence starting with a BEGIN tag but missing an END tag).
- breaking Versions prior to 0.2.7 could raise an `AttributeError: module 'utils' has no attribute 'OOV'` or similar, due to a naming conflict with the internal `utils.pyx` file.
- gotcha Older versions of `nagisa` (especially those released before `DyNet38` wheels were widely available) could be difficult to install on Python 3.8+, because `DyNet` lacked pre-built wheels for these newer Python versions.
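The version thresholds in the warnings above can be collected into a small lookup. This is a sketch under stated assumptions: the names `KNOWN_ISSUES`, `parse_version`, and `issues_for` are hypothetical, the issue summaries paraphrase the notes above, and the version parser only handles plain dotted versions such as `0.2.12`.

```python
import re

# (first fixed version, summary) pairs, paraphrased from the warnings above.
KNOWN_ISSUES = [
    ((0, 2, 7), "internal utils.pyx naming conflict (AttributeError on 'OOV')"),
    ((0, 2, 10), "verbose DyNet logging on import (Linux, Python 3.8+)"),
    ((0, 2, 11), "missing Requires-Dist metadata breaks Poetry installs"),
    ((0, 2, 12), "words silently dropped when text ends with an unfinished word"),
]


def parse_version(version):
    """Turn '0.2.12' into (0, 2, 12), using the first three numeric parts."""
    return tuple(int(n) for n in re.findall(r"\d+", version)[:3])


def issues_for(version):
    """List the known issues that still affect the given nagisa version."""
    installed = parse_version(version)
    return [summary for fixed_in, summary in KNOWN_ISSUES if installed < fixed_in]


if __name__ == "__main__":
    from importlib.metadata import PackageNotFoundError, version

    try:
        installed = version("nagisa")
    except PackageNotFoundError:
        print("nagisa is not installed")
    else:
        for summary in issues_for(installed) or ["no known issues from this list"]:
            print(f"nagisa {installed}: {summary}")
```

Keeping the thresholds in one table makes it easy to add an entry when a later release fixes something else.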
Install
-
pip install nagisa
Imports
- nagisa
import nagisa
- nagisa.tagging
tagger = nagisa.Tagger(); words = tagger.tagging(text)
words = nagisa.tagging(text)
Quickstart
import nagisa

text = 'Pythonで簡単に使えるツールです'

# Perform word segmentation and POS tagging
words = nagisa.tagging(text)
print(words)
# => Python/名詞 で/助詞 簡単/形状詞 に/助動詞 使える/動詞 ツール/名詞 です/助動詞
print(words.words)
# => ['Python', 'で', '簡単', 'に', '使える', 'ツール', 'です']
print(words.postags)
# => ['名詞', '助詞', '形状詞', '助動詞', '動詞', '名詞', '助動詞']

# Example of post-processing: extract only nouns
nouns = nagisa.extract(text, extract_postags=['名詞'])
print(nouns)
# => Python/名詞 ツール/名詞
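The Quickstart result exposes two parallel lists, `words` and `postags`, so custom filtering can be done with ordinary Python. The sketch below uses a hypothetical helper, `select_by_postag`, and sample lists copied from the Quickstart output so that it runs without `nagisa` installed. With the library available, you would pass `words.words` and `words.postags` from `nagisa.tagging(text)` instead.

```python
def select_by_postag(words, postags, wanted):
    """Keep the words whose POS tag is in `wanted`, preserving order."""
    wanted = set(wanted)
    return [word for word, tag in zip(words, postags) if tag in wanted]


# Sample data copied from the Quickstart output.
words = ['Python', 'で', '簡単', 'に', '使える', 'ツール', 'です']
postags = ['名詞', '助詞', '形状詞', '助動詞', '動詞', '名詞', '助動詞']

nouns = select_by_postag(words, postags, ['名詞'])
print(nouns)  # => ['Python', 'ツール']
```

This selects the same words as the `nagisa.extract` call in the Quickstart, but returns a plain list, which can be convenient when several tag sets need to be selected from one tagging pass.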