pysbd (Python Sentence Boundary Disambiguation)

0.3.4 · active · verified Sat Apr 11

pysbd (Python Sentence Boundary Disambiguation) is a rule-based sentence boundary detection library that works out of the box across many languages. It aims to segment sentences accurately in text containing abbreviations, varied punctuation, and other constructs that defeat simple splitting, and it offers an alternative to neural network-based approaches. The current version is 0.3.4, and the project is listed as active.

Warnings

Install

Imports

Quickstart

This example segments English text with the `Segmenter` class. Passing `clean=False` (which is also the default) keeps the input text as-is instead of running pysbd's text-cleaning step before segmentation.

import pysbd

text = "Dr. Smith went to the U.S. last week. He said, 'Hello!' How are you?"

# Initialize segmenter for English
segmenter = pysbd.Segmenter(language="en", clean=False)

sentences = segmenter.segment(text)

for i, sent in enumerate(sentences):
    print(f"Sentence {i+1}: {sent}")
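
For contrast, here is a minimal sketch of a naive baseline that uses only the standard library. It is not part of pysbd; the regular expression and the `naive` variable are illustrative assumptions. It splits the same text after every `.`, `!`, or `?` that is followed by whitespace, which shows the kind of error that abbreviation-aware rules are meant to avoid.

```python
import re

text = "Dr. Smith went to the U.S. last week. He said, 'Hello!' How are you?"

# Naive baseline (illustrative, not pysbd): split on whitespace that
# directly follows sentence-ending punctuation.
naive = re.split(r"(?<=[.!?])\s+", text)

for i, piece in enumerate(naive):
    print(f"Piece {i+1}: {piece}")

# The abbreviations "Dr." and "U.S." are wrongly treated as sentence ends,
# while the boundary after 'Hello!' is missed because the closing quote
# sits between the "!" and the following space.
```

Comparing these pieces with the output of `segmenter.segment(text)` is a quick way to see what the rule-based approach contributes on a given input.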
