Jieba3k Chinese Word Segmentation Utility

0.35.1 · deprecated · verified Mon Apr 13

jieba3k is a Python library for Chinese word segmentation, published as a Python 3 compatible version of the popular Jieba segmenter. It cuts Chinese sentences into individual words and supports several segmentation modes. Although it served as an early Python 3 option, it is now considered outdated: the current version is 0.35.1, last released in November 2014, and the package has no active development or regular release cadence.

Warnings

Install

Imports

Quickstart

This quickstart demonstrates basic Chinese word segmentation with the `jieba` module that jieba3k provides. It covers the accurate, full, and search engine segmentation modes. The `lcut` function returns a list of segmented words.

import jieba

text = "我爱北京天安门"

# Default: Accurate Mode
seg_list_accurate = jieba.lcut(text, cut_all=False)
print("Accurate Mode:", seg_list_accurate)

# Full Mode
seg_list_full = jieba.lcut(text, cut_all=True)
print("Full Mode:", seg_list_full)

# Search Engine Mode
seg_list_search = jieba.lcut_for_search(text)
print("Search Engine Mode:", seg_list_search)
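
To make the difference between accurate and full mode concrete, here is a minimal pure-Python sketch that does not use jieba at all. Everything in it is an assumption for illustration: `TOY_DICT`, `full_mode`, and `accurate_mode` are hypothetical names, the dictionary has only five entries, and forward maximum matching stands in for Jieba's real algorithm, which its README describes as a prefix-dictionary graph with dynamic programming plus an HMM for unknown words. Search engine mode is not covered.

```python
# Toy illustration only: a hypothetical five-entry dictionary, not Jieba's data.
TOY_DICT = {"我", "爱", "北京", "天安", "天安门"}
MAX_WORD_LEN = max(len(w) for w in TOY_DICT)


def full_mode(text):
    """List every dictionary word found at any position (overlaps allowed)."""
    found = []
    for start in range(len(text)):
        for end in range(start + 1, min(len(text), start + MAX_WORD_LEN) + 1):
            if text[start:end] in TOY_DICT:
                found.append(text[start:end])
    return found


def accurate_mode(text):
    """Forward maximum matching: take the longest dictionary word at each step.

    This is a stand-in technique, not the algorithm Jieba uses.
    """
    words, pos = [], 0
    while pos < len(text):
        for size in range(min(MAX_WORD_LEN, len(text) - pos), 0, -1):
            piece = text[pos:pos + size]
            # Fall back to a single character when no dictionary word matches.
            if size == 1 or piece in TOY_DICT:
                words.append(piece)
                pos += size
                break
    return words


text = "我爱北京天安门"
print("Toy full mode:", full_mode(text))
print("Toy accurate mode:", accurate_mode(text))
```

With this toy dictionary, the full-mode sketch reports both 天安 and 天安门 because overlapping matches are kept, while the accurate-mode sketch commits to one non-overlapping split and keeps only 天安门.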
