MiniSBD: Fast Sentence Boundary Detection
MiniSBD is a free, open-source Python library for fast sentence boundary detection (SBD). It offers a lightweight way to split text into sentences and handles a range of punctuation and language patterns. The current version is 0.9.5; releases are periodic and are often driven by improvements in tokenization or punctuation handling.
Common errors
- ModuleNotFoundError: No module named 'minisbd'
  cause: The minisbd package is not installed in your current Python environment.
  fix: Run `pip install minisbd` to install the library.
- AttributeError: 'SBD' object has no attribute 'segments'
  cause: You are attempting to call a method named `segments`, but the correct method for sentence segmentation is `segment` (singular).
  fix: Change your code from `sbd.segments(text)` to `sbd.segment(text)`.
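A script can check whether the package is visible to the current interpreter before importing it, which helps tell a missing install apart from a wrong virtual environment. The following is a minimal sketch using only the standard library; the helper name `is_installed` is hypothetical and not part of MiniSBD.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module is importable by this interpreter."""
    # find_spec returns None when no top-level module of that name is found.
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    if not is_installed("minisbd"):
        print("minisbd not found; run `pip install minisbd` in this environment")
```

If the check fails even after installing, the `pip` on your PATH probably belongs to a different interpreter; running `python -m pip install minisbd` installs into the interpreter that runs your script.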
Warnings
- gotcha Initialize the `SBD` object once and reuse it. Re-initializing it inside a loop adds unnecessary overhead and degrades performance.
- gotcha MiniSBD is optimized for speed and general English text. Accuracy may vary for highly specialized domains, informal text (e.g., social media), and languages whose sentence boundary rules differ markedly from those of common Western languages, and not every edge case is handled.
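The initialize-once advice can be sketched as follows. This is an illustration under stated assumptions: `CountingSegmenter` is a hypothetical stand-in that uses a naive regex split (not MiniSBD's algorithm), so the example runs without the library installed. With MiniSBD, the object passed to `segment_all` would be the `SBD()` instance described on this page.

```python
import re


class CountingSegmenter:
    """Hypothetical stand-in for a segmenter with setup cost (not MiniSBD)."""

    instances_created = 0  # counts how many times setup ran

    def __init__(self):
        CountingSegmenter.instances_created += 1
        # Naive rule: split on whitespace that follows '.', '!' or '?'.
        self._boundary = re.compile(r"(?<=[.!?])\s+")

    def segment(self, text):
        return [s for s in self._boundary.split(text.strip()) if s]


def segment_all(segmenter, texts):
    """Reuse one already-built segmenter for every text."""
    return [segmenter.segment(text) for text in texts]


texts = ["Hello world. This is a test.", "Is it working? Yes!"]

segmenter = CountingSegmenter()  # built once, outside the loop
results = segment_all(segmenter, texts)
print(results)
print(CountingSegmenter.instances_created)  # setup ran a single time
```

Passing the segmenter in as an argument, rather than constructing it inside the function, keeps the setup cost out of the per-text path and makes the function easy to test with a stub.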
Install
- pip install minisbd
Imports
- SBD
import minisbd; sbd = minisbd.SBD()
from minisbd import SBD
Quickstart
from minisbd import SBD
sbd = SBD() # Initialize the SBD object once
text1 = "Hello world. This is a test. Is it working?"
sentences1 = sbd.segment(text1)
print(f"Text 1: {sentences1}")
text2 = "Hello world! This is another test. Is it working now?"
sentences2 = sbd.segment(text2)
print(f"Text 2: {sentences2}")