PyThaiNLP

5.3.4 · active · verified Sat Apr 11

PyThaiNLP is a Python library for natural language processing (NLP) of the Thai language. It provides standard NLP functions such as word and sentence segmentation, part-of-speech tagging, and transliteration, along with various utilities. The library is actively maintained. Version 5.3.4 is the current stable release, minor updates to the 5.x series are still being published, and a future major release, 6.0, is expected to introduce breaking changes.

Install

pip install pythainlp

Imports

from pythainlp.tokenize import sent_tokenize, word_tokenize

Quickstart

This quickstart demonstrates basic word and sentence tokenization with PyThaiNLP's default engines (for word tokenization, the default is `newmm`). Many other tokenization engines are available and can be selected with the `engine` parameter (for example, `engine="icu"`).

from pythainlp.tokenize import sent_tokenize, word_tokenize

text = "ฉันรักภาษาไทย"
tokens = word_tokenize(text)
print(tokens)
# Output example: ['ฉัน', 'รัก', 'ภาษาไทย']

sentences = sent_tokenize("สวัสดีครับ. สบายดีไหมครับ?")
print(sentences)
# Output example: ['สวัสดีครับ.', 'สบายดีไหมครับ?']
