Nagisa: Japanese Tokenizer and POS Tagger

0.2.12 · active · verified Thu Apr 16

Nagisa is a Python module for Japanese word segmentation and part-of-speech (POS) tagging. It is built on recurrent neural networks, combining character- and word-level features for segmentation with tag-dictionary information for POS tagging. The library is designed to be simple and easy to use, and it is actively maintained, with version 0.2.12 current as of February 2026 and periodic updates for bug fixes and performance improvements.


Install

pip install nagisa

Imports

import nagisa

Quickstart

This example demonstrates basic Japanese word segmentation and POS tagging with the `nagisa.tagging()` function, and shows how to access the segmented words and their POS tags. It also includes a simple post-processing step that extracts only the words with a given POS tag.

import nagisa

# Example sentence: "It's a tool that can be used easily in Python."
text = 'Pythonで簡単に使えるツールです'

# Perform word segmentation and POS tagging
words = nagisa.tagging(text)

print(words) # => Python/名詞 で/助詞 簡単/形状詞 に/助動詞 使える/動詞 ツール/名詞 です/助動詞
print(words.words) # => ['Python', 'で', '簡単', 'に', '使える', 'ツール', 'です']
print(words.postags) # => ['名詞', '助詞', '形状詞', '助動詞', '動詞', '名詞', '助動詞']

# Example of post-processing: extract only nouns
nouns = nagisa.extract(text, extract_postags=['名詞'])
print(nouns) # => Python/名詞 ツール/名詞
