segtok: Sentence Segmentation and Word Tokenization

library 1.5.11 · python · maintenance
verified May 22, 2026

Segtok is a fast, rule-based Python library for sentence segmentation and word tokenization. It is designed for well-orthographed text, particularly in English, German, and Romance languages, and offers high precision and Unicode support. The current version is 1.5.11. Although still functional, it has largely been superseded by syntok ("segtok v2"), which offers improved performance and handles more edge cases. Segtok itself is in maintenance mode, with no active development.
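To make "rule-based" concrete, here is a minimal sketch of the general idea in plain Python. It is not segtok's implementation or API: the abbreviation list, the `split_sentences` and `tokenize` names, and the regular expressions are hypothetical, and segtok's real rule set covers far more cases (the library's own entry points include `split_single` in `segtok.segmenter` and `word_tokenizer` in `segtok.tokenizer`).

```python
import re

# Hypothetical abbreviation list for illustration only; segtok ships far richer rules.
ABBREVIATIONS = {"dr", "mr", "mrs", "ms", "prof", "e.g", "i.e", "etc", "vs"}

# Candidate sentence boundary: terminal punctuation (plus any closing quotes or
# brackets), whitespace, then an optional opening quote/bracket and an ASCII
# uppercase letter. ASCII-only capitals are a simplification of this sketch.
_BOUNDARY = re.compile(r"""([.!?]["')\]]*)\s+(?=["'(\[]?[A-Z])""")

# A token is a word (Unicode letters/digits, with inner hyphens or apostrophes)
# or any single non-space symbol such as punctuation.
_TOKEN = re.compile(r"\w+(?:[-']\w+)*|[^\w\s]")


def split_sentences(text: str) -> list[str]:
    """Split text at candidate boundaries, skipping periods after known abbreviations."""
    sentences = []
    start = 0
    for match in _BOUNDARY.finditer(text):
        end = match.end(1)  # keep the terminal punctuation with its sentence
        candidate = text[start:end]
        last_word = candidate.split()[-1].rstrip(".!?\"')]").lower()
        if match.group(1).startswith(".") and last_word in ABBREVIATIONS:
            continue  # the period belongs to an abbreviation, not a sentence end
        sentences.append(candidate.strip())
        start = match.end()  # next sentence begins after the whitespace
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


def tokenize(sentence: str) -> list[str]:
    """Split one sentence into word and punctuation tokens."""
    return _TOKEN.findall(sentence)


if __name__ == "__main__":
    text = 'Dr. Smith arrived at 9 p.m. yesterday. Did he stay? "Yes," she said.'
    for sentence in split_sentences(text):
        print(tokenize(sentence))
```

The abbreviation check keeps "Dr. Smith" in one sentence, but the naive tokenizer still breaks "p.m." into separate pieces; handling cases like that reliably takes many more rules than this sketch contains, which is the work a dedicated library does for you.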
