Sentence Stream

1.3.0 · active · verified Fri Apr 17

Sentence Stream is a small, pure-Python library for splitting text into sentences. It is designed for text streams, such as large files or network streams, and processes text incrementally instead of loading the entire content into memory. The current version is 1.3.0, and the project is actively maintained, with releases for improvements and bug fixes.

Common errors

Warnings

Install

Imports

Quickstart

Initialize `SentenceStream` with a string or an iterable, such as a file-like object. The instance is itself iterable and yields one sentence at a time.

import io

from sentence_stream import SentenceStream

# Example with a simple string
text_input = "Hello world. This is a test. Another sentence.\nNew paragraph. One more?"
stream = SentenceStream(text_input)

print("--- Processing string input ---")
for sentence in stream:
    print(f"'{sentence}'")

# Example with a file-like object (simulates a stream).
# The five-sentence block is repeated 10 times; each piece ends with a
# space so sentences do not run together where the pieces are joined.
long_text = (
    "This is the first sentence. And here is the second one. "
    "The third sentence continues here. Finally, a fourth. "
    "This could be a very large file. "
) * 10

file_stream = io.StringIO(long_text)
stream_from_file = SentenceStream(file_stream)

print("\n--- Processing file-like object ---")
sentence_count = 0
for sentence in stream_from_file:
    # print(f"'{sentence}'")  # Uncomment to see all sentences
    sentence_count += 1
print(f"Processed {sentence_count} sentences from stream.")
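
To make the idea of incremental processing concrete, here is a minimal sketch of how a streaming splitter can work in principle. It is not the library's implementation: `iter_sentences`, its `chunk_size` parameter, and the simple punctuation rule are all hypothetical, chosen only for illustration. The sketch reads fixed-size chunks, keeps just the unfinished tail of the text in a buffer, and yields a sentence as soon as a terminator followed by whitespace appears.

```python
import io
import re
from typing import Iterator, TextIO

# Hypothetical boundary rule for illustration: '.', '!' or '?' followed
# by whitespace. Real sentence splitting needs more care (abbreviations,
# quotes, and so on).
_BOUNDARY = re.compile(r"([.!?])\s+")


def iter_sentences(stream: TextIO, chunk_size: int = 64) -> Iterator[str]:
    """Yield sentences from a text stream, reading one chunk at a time."""
    buffer = ""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break  # end of stream
        buffer += chunk
        # Emit every complete sentence currently held in the buffer.
        while True:
            match = _BOUNDARY.search(buffer)
            if match is None:
                break
            yield buffer[: match.end(1)].strip()
            buffer = buffer[match.end():]
    # Whatever remains after the stream ends is the final sentence.
    tail = buffer.strip()
    if tail:
        yield tail


if __name__ == "__main__":
    sample = io.StringIO("Hello world. This is a test. One more?")
    for sentence in iter_sentences(sample, chunk_size=8):
        print(repr(sentence))
```

Because completed sentences are removed from the buffer as soon as they are yielded, memory use depends on the longest sentence and the chunk size rather than on the total input size.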
