Jieba3k Chinese Word Segmentation Utility
jieba3k is the Python 3 port of the popular Jieba Chinese word segmentation library. It cuts Chinese sentences into words using several segmentation modes (accurate, full, and search-engine). The project is now outdated: its latest release, 0.35.1, dates from November 2014, and it has no active development or release cadence.
Warnings
- breaking The `jieba3k` library is deprecated and unmaintained; its last release was in November 2014. Use the actively maintained `jieba` library (`pip install jieba`) instead, which supports Python 3 and receives regular updates.
- gotcha Installing `jieba3k` will provide the `jieba` module, potentially overwriting or conflicting with an existing installation of the actively maintained `jieba` library. This can lead to unexpected behavior or version downgrades for `jieba`.
- gotcha Because it is old and unmaintained, `jieba3k` may not work on Python versions newer than the 3.4/3.5 era and may contain unpatched bugs or performance issues.
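The module-name clash described above can be checked before installing. A minimal sketch using the standard-library `importlib.metadata` (Python 3.8+; the helper name `installed_version` is ours, not part of either package):

```python
from importlib import metadata

def installed_version(dist_name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None

# Both distributions ship the same top-level `jieba` module, so check
# which one (if either) is already present before installing jieba3k.
for name in ("jieba", "jieba3k"):
    print(name, "->", installed_version(name))
```

If both distributions report a version, the `jieba` module on disk belongs to whichever was installed last.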
Install
-
pip install jieba3k
Imports
- jieba
import jieba
Quickstart
import jieba
text = "我爱北京天安门"
# Default: Accurate Mode
seg_list_accurate = jieba.lcut(text, cut_all=False)
print("Accurate Mode:", seg_list_accurate)
# Full Mode
seg_list_full = jieba.lcut(text, cut_all=True)
print("Full Mode:", seg_list_full)
# Search Engine Mode
seg_list_search = jieba.lcut_for_search(text)
print("Search Engine Mode:", seg_list_search)