LangChain Text Splitters

1.1.1 · active · verified Sun Mar 29

LangChain Text Splitters (current version 1.1.1) provides a comprehensive set of utilities for breaking large text documents into smaller, manageable chunks. Chunking is crucial for applications such as Retrieval-Augmented Generation (RAG) and for fitting content within language model context windows. As an integral part of the LangChain ecosystem, the package follows an active, rapid release cadence closely aligned with the other LangChain libraries.

Warnings

Install
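The library ships as its own PyPI distribution, separate from the core `langchain` package. A typical install (note the hyphenated PyPI name versus the underscored import name used in the Quickstart):

```shell
pip install langchain-text-splitters
```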

Imports
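All splitters are importable from the top-level package. A sketch of commonly used imports; the classes below are standard members of the 1.x package, but the grouping is illustrative, not exhaustive:

```python
# Character-based splitters (the usual starting point)
from langchain_text_splitters import (
    CharacterTextSplitter,
    RecursiveCharacterTextSplitter,
)

# Token- and structure-aware splitters
from langchain_text_splitters import (
    TokenTextSplitter,
    MarkdownHeaderTextSplitter,
)
```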

Quickstart

Demonstrates basic usage of `RecursiveCharacterTextSplitter`, the most commonly recommended splitter: initialize it with `chunk_size` and `chunk_overlap`, then split a long string into smaller text chunks.

from langchain_text_splitters import RecursiveCharacterTextSplitter

# Example long text
long_text = (
    "LangChain is a framework designed to simplify the creation of applications using large language models. "
    "It provides tools for chaining together different components, making it easier to build complex LLM workflows. "
    "Text splitting is a fundamental step in processing long documents for LLMs, ensuring that chunks fit within context windows and maintain semantic coherence. "
    "The RecursiveCharacterTextSplitter is often the recommended default for general-purpose text."
)

# Initialize the splitter
# chunk_size: maximum size of each chunk (in characters by default)
# chunk_overlap: number of characters to overlap between consecutive chunks
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=100,
    chunk_overlap=20
)

# Split the text
chunks = text_splitter.split_text(long_text)

# Print the chunks
for i, chunk in enumerate(chunks):
    print(f"Chunk {i+1}:\n{chunk}\n---")
