segtok: Sentence Segmentation and Word Tokenization
Segtok is a fast, rule-based Python library for sentence segmentation and word tokenization. It is designed for orthographically correct (well-edited) text, particularly in English, German, and the Romance languages, and offers high precision and full Unicode support. The current version is 1.5.11. The library is in maintenance mode with no active development: it still works, but it has been largely superseded by `syntok` (segtok v2), which offers better performance and handles more edge cases.
Warnings
- breaking The `segtok` library is largely superseded by `syntok` (segtok v2), its direct successor. `syntok` offers better performance and fixes several tricky issues, particularly with sentence terminal markers not followed by spaces.
- gotcha On Linux systems, installing the `regex` dependency (a core requirement for `segtok`) may fail if Python development headers (`python-dev` or `python3-dev`) are not installed.
- gotcha `segtok` is specifically designed and tuned for Indo-European languages (e.g., English, German, Spanish). Its performance and correctness may degrade significantly for other language families, such as CJK languages.
- deprecated While `segtok` itself works with Python 2.7 and 3.5+, its recommended successor, `syntok`, requires Python 3.6 or newer due to its reliance on the `typing` module.
Install
pip install segtok
Imports
- split_multi
from segtok.segmenter import split_multi
- web_tokenizer
from segtok.tokenizer import web_tokenizer
- split_contractions
from segtok.tokenizer import split_contractions
- word_tokenizer
from segtok.tokenizer import word_tokenizer
Quickstart
from segtok.segmenter import split_multi
from segtok.tokenizer import web_tokenizer, split_contractions

text = "Hello, Mr. Man. He smiled!! This, i.e. that, is it. Don't worry."

# split_multi returns a generator, so materialize it into a list
# before iterating over it more than once.
sentences = list(split_multi(text))

all_tokens = []
for sentence in sentences:
    tokens = list(split_contractions(web_tokenizer(sentence)))
    all_tokens.append(tokens)

print("Original Text:", text)

print("\nSentences:")
for s in sentences:
    print(f"- {s}")

print("\nTokens per sentence:")
for i, tokens in enumerate(all_tokens):
    print(f"Sentence {i+1}: {tokens}")