{"id":10221,"library":"sentence-stream","title":"Sentence Stream","description":"Sentence Stream is a small, pure Python library for splitting text into sentences. It is designed to work efficiently with text streams, such as large files or network streams, by processing text incrementally without loading the entire content into memory. The current version is 1.3.0, and the project is actively maintained with regular releases for improvements and bug fixes.","status":"active","version":"1.3.0","language":"en","source_language":"en","source_url":"https://github.com/derekphilipau/sentence-stream","tags":["nlp","text processing","sentence splitter","stream","python3"],"install":[{"cmd":"pip install sentence-stream","lang":"bash","label":"Install stable version"}],"dependencies":[{"reason":"Required for execution","package":"python","version":">=3.9.0","optional":false}],"imports":[{"note":"The PyPI package is named `sentence-stream`, but the importable module is `sentence_stream`: hyphens are not valid in Python identifiers, so the import must use an underscore.","wrong":"from sentence-stream import SentenceStream","symbol":"SentenceStream","correct":"from sentence_stream import SentenceStream"}],"quickstart":{"code":"import io\n\nfrom sentence_stream import SentenceStream\n\n# Example with a simple string\ntext_input = \"Hello world. This is a test. Another sentence.\\nNew paragraph. One more?\"\nstream = SentenceStream(text_input)\n\nprint(\"--- Processing string input ---\")\nfor sentence in stream:\n    print(f\"'{sentence}'\")\n\n# Example with a file-like object (simulates a stream)\nlong_text = \"This is the first sentence. And here is the second one. \" \\\n            \"The third sentence continues here. 
Finally, a fourth. \" \\\n            \"This could be a very large file. \" * 10\n\nfile_stream = io.StringIO(long_text)\nstream_from_file = SentenceStream(file_stream)\n\nprint(\"\\n--- Processing file-like object ---\")\nsentences_count = 0\nfor sentence in stream_from_file:\n    # print(f\"'{sentence}'\") # Uncomment to see all sentences\n    sentences_count += 1\nprint(f\"Processed {sentences_count} sentences from stream.\")\n","lang":"python","description":"Initialize `SentenceStream` with a string or an iterable of strings (such as a file-like object). The instance is itself iterable and yields sentences one at a time."},"warnings":[{"fix":"For complex NLP tasks, consider libraries such as spaCy or NLTK, which offer more sophisticated sentence boundary detection, or model-based approaches built on Hugging Face Transformers.","message":"This library provides a rule-based sentence splitter and is not intended for advanced Natural Language Processing (NLP) tokenization that requires deep linguistic understanding or model-based analysis. It focuses on basic punctuation-driven splitting.","severity":"gotcha","affected_versions":"All versions"},{"fix":"For very large texts, provide input as an iterable, such as a file opened in text mode (`open('large_file.txt', 'r', encoding='utf-8')`), or a custom generator that yields text chunks.","message":"While `SentenceStream` accepts a single string as input, its main benefit comes from processing real input streams (iterables that yield chunks of text). A very large single string is already fully in memory, and the library will still buffer it internally before processing, which negates much of the memory advantage of streaming.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Review the output for edge cases in your specific text data. For non-English languages or highly specialized texts, consider pre-processing or using language-specific tools.","message":"The library primarily uses standard English punctuation rules. 
It handles common cases well, but it may mis-split text containing ambiguous punctuation, abbreviations, or language-specific conventions without explicit configuration or custom rules.","severity":"gotcha","affected_versions":"All versions"}],"env_vars":null,"last_verified":"2026-04-17T00:00:00.000Z","next_check":"2026-07-16T00:00:00.000Z","problems":[{"fix":"Import the module by its underscore name: `from sentence_stream import SentenceStream`.","cause":"The module was imported by its PyPI distribution name (with a hyphen) instead of its module name (with an underscore), for example via `importlib.import_module('sentence-stream')`. A literal `from sentence-stream import ...` statement fails earlier with a `SyntaxError`, because hyphens are not valid in Python identifiers.","error":"ModuleNotFoundError: No module named 'sentence-stream'"},{"fix":"Ensure the input to `SentenceStream` is either a string or an iterable that yields strings (e.g., a file-like object).","cause":"Passing `None` or another object that is neither a string nor an iterable to the `SentenceStream` constructor.","error":"TypeError: 'NoneType' object is not iterable"},{"fix":"The `SentenceStream` object is itself iterable. To get sentences, iterate over the instance: `for sentence in my_stream: ...`.","cause":"Calling a method that does not exist on the `SentenceStream` object, possibly by confusing it with a string or a file object.","error":"AttributeError: 'SentenceStream' object has no attribute 'read' (or 'split_text', etc.)"}]}