LangChain Text Splitters
Version 1.1.1 · verified Tue May 12 · auth: no · python · install: verified
LangChain Text Splitters (current version 1.1.1) provides a set of utilities for breaking large text documents into smaller, manageable chunks. This is crucial for applications such as Retrieval-Augmented Generation (RAG) and for fitting content within language model context windows. As part of the LangChain ecosystem, it follows an active release cadence that is closely aligned with the other LangChain libraries.
pip install langchain-text-splitters
Common errors
error ModuleNotFoundError: No module named 'langchain.text_splitter' OR ImportError: cannot import name 'RecursiveCharacterTextSplitter' from 'langchain.text_splitter'
cause The `RecursiveCharacterTextSplitter` and other text splitters have been moved from the `langchain` package to the dedicated `langchain_text_splitters` package as part of LangChain's modularization.
fix First, ensure `langchain-text-splitters` is installed: `pip install -U langchain-text-splitters`. Then update your import statement: `from langchain_text_splitters import RecursiveCharacterTextSplitter`.
error ModuleNotFoundError: No module named 'langchain_text_splitters'
cause The `langchain-text-splitters` package, which contains the text splitting utilities, has not been installed in your Python environment.
fix Install the package using pip: `pip install langchain-text-splitters`.
error AttributeError: module 'langchain.text_splitter' has no attribute 'RecursiveCharacterTextSplitter'
cause This error occurs when an older version of LangChain is installed, or the `langchain-text-splitters` package is not correctly referenced, leading to the `RecursiveCharacterTextSplitter` class not being found in the `langchain.text_splitter` module.
fix Ensure `langchain-text-splitters` is installed and that you import from the correct module: run `pip install -U langchain-text-splitters`, then use `from langchain_text_splitters import RecursiveCharacterTextSplitter`.
error ImportError: cannot import name 'RegexTextSplitter' from 'langchain.text_splitter'
cause The `RegexTextSplitter` class has been deprecated and its functionality is now integrated into `RecursiveCharacterTextSplitter` using the `is_separator_regex` parameter.
fix Use `RecursiveCharacterTextSplitter` and set `is_separator_regex=True` with your regular expression separators: `from langchain_text_splitters import RecursiveCharacterTextSplitter; text_splitter = RecursiveCharacterTextSplitter(separators=[r'\n\n', r'\n'], is_separator_regex=True)`.
Warnings
breaking The text splitter modules have been moved from `langchain.text_splitter` to the standalone `langchain-text-splitters` package. Direct imports from `langchain.text_splitter` will no longer work.
fix Update all text splitter imports from `from langchain.text_splitter import ...` to `from langchain_text_splitters import ...`.
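For a codebase with many files, the import update described above can be applied mechanically. The following is a minimal sketch using only the standard library. `migrate_imports` is a hypothetical helper name, not part of LangChain, and it only rewrites the module path at the start of `from ...` and `import ...` statements.

```python
import re

# Matches the legacy module path only where it opens an import statement.
_LEGACY_IMPORT = re.compile(
    r"^([ \t]*(?:from|import)[ \t]+)langchain\.text_splitter\b",
    re.MULTILINE,
)


def migrate_imports(source: str) -> str:
    """Rewrite legacy text splitter imports to the standalone package (hypothetical helper)."""
    return _LEGACY_IMPORT.sub(r"\1langchain_text_splitters", source)


legacy_source = "from langchain.text_splitter import RecursiveCharacterTextSplitter\n"
print(migrate_imports(legacy_source))
```

Review the rewritten files before committing them: the sketch does not touch string literals or dynamic imports.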
gotcha The `create_documents()` method expects a *list* of strings (to split existing `Document` objects, use `split_documents()` instead). Passing a single string will result in each character being treated as a separate document.
fix For a single string, use `text_splitter.split_text(your_string)`. If you intend to pass multiple strings to `create_documents`, ensure they are wrapped in a list: `text_splitter.create_documents([your_string])`.
gotcha The `chunk_size` parameter for character-based splitters specifies the *target* maximum chunk size. Due to the splitter's logic (e.g., trying to split on specific separators first), the actual chunk length may not be exactly `chunk_size`.
fix Understand that `chunk_size` is a guideline. For more precise length control (e.g., token-based), consider `TokenTextSplitter` or custom `length_function` with an appropriate tokenizer.
gotcha Mixing major versions of LangChain ecosystem packages (e.g., `langchain-text-splitters==1.x.x` with `langchain-core==0.3.x`) can lead to compatibility issues and unexpected behavior.
fix Always strive to keep all `langchain-` prefixed packages within the same major version series (e.g., all `1.x.x` or all `0.3.x`) to ensure compatibility.
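One way to spot a mixed environment is to list the installed `langchain*` distributions and compare their release series. This is a rough sketch using only the standard library; the helper names are hypothetical, and the grouping (major version, or `0.<minor>` for 0.x releases) is an assumption that follows the rule of thumb above.

```python
from importlib.metadata import distributions


def release_series(version: str) -> str:
    """Group '1.1.1' under '1' and '0.3.27' under '0.3' (assumed grouping)."""
    parts = version.split(".")
    if parts[0] == "0" and len(parts) > 1:
        return f"0.{parts[1]}"
    return parts[0]


def installed_langchain_versions() -> dict:
    """Map every installed distribution whose name starts with 'langchain' to its version."""
    found = {}
    for dist in distributions():
        name = (dist.metadata.get("Name") or "").lower()
        if name.startswith("langchain"):
            found[name] = dist.version
    return found


def has_mixed_series(versions: dict) -> bool:
    """Return True when the given packages span more than one release series."""
    return len({release_series(v) for v in versions.values()}) > 1


versions = installed_langchain_versions()
for name, version in sorted(versions.items()):
    print(f"{name}=={version} (series {release_series(version)})")
if has_mixed_series(versions):
    print("Warning: installed langchain packages span more than one release series.")
```

Treat a warning from this check as a prompt to review your pinned versions, not as proof of an incompatibility.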
gotcha Some specialized splitters, like `MarkdownHeaderTextSplitter` and `HTMLHeaderTextSplitter`, do not inherit from the base `TextSplitter` class. This means they might have slightly different method signatures or expectations.
fix Always consult the specific documentation or API reference for specialized text splitters to understand their unique behavior and interfaces.
Install compatibility verified last tested: 2026-05-12
| python | os / libc | wheel | install | import | disk |
|---|---|---|---|---|---|
| 3.10 | alpine (musl) | wheel | - | 1.37s | 63.5M |
| 3.10 | alpine (musl) | - | - | 1.40s | 62.7M |
| 3.10 | slim (glibc) | wheel | 7.6s | 1.01s | 72M |
| 3.10 | slim (glibc) | - | - | 0.94s | 71M |
| 3.11 | alpine (musl) | wheel | - | 1.63s | 68.8M |
| 3.11 | alpine (musl) | - | - | 1.80s | 67.8M |
| 3.11 | slim (glibc) | wheel | 6.6s | 1.46s | 77M |
| 3.11 | slim (glibc) | - | - | 1.38s | 76M |
| 3.12 | alpine (musl) | wheel | - | 1.83s | 59.9M |
| 3.12 | alpine (musl) | - | - | 1.90s | 58.9M |
| 3.12 | slim (glibc) | wheel | 5.4s | 1.80s | 68M |
| 3.12 | slim (glibc) | - | - | 1.82s | 67M |
| 3.13 | alpine (musl) | wheel | - | 1.62s | 59.6M |
| 3.13 | alpine (musl) | - | - | 1.84s | 58.5M |
| 3.13 | slim (glibc) | wheel | 5.5s | 1.68s | 68M |
| 3.13 | slim (glibc) | - | - | 1.73s | 67M |
| 3.9 | alpine (musl) | wheel | - | 1.77s | 60.3M |
| 3.9 | alpine (musl) | - | - | 1.70s | 60.3M |
| 3.9 | slim (glibc) | wheel | 8.2s | 1.65s | 68M |
| 3.9 | slim (glibc) | - | - | 1.55s | 68M |
Imports
- RecursiveCharacterTextSplitter
  wrong: from langchain.text_splitter import RecursiveCharacterTextSplitter
  correct: from langchain_text_splitters import RecursiveCharacterTextSplitter
- CharacterTextSplitter
  from langchain_text_splitters import CharacterTextSplitter
- MarkdownHeaderTextSplitter
  from langchain_text_splitters import MarkdownHeaderTextSplitter
Quickstart last tested: 2026-04-24
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Example long text
long_text = (
    "LangChain is a framework designed to simplify the creation of applications using large language models. "
    "It provides tools for chaining together different components, making it easier to build complex LLM workflows. "
    "Text splitting is a fundamental step in processing long documents for LLMs, ensuring that chunks fit within context windows and maintain semantic coherence. "
    "The RecursiveCharacterTextSplitter is often the recommended default for general-purpose text."
)

# Initialize the splitter
# chunk_size: maximum size of each chunk (in characters by default)
# chunk_overlap: number of characters to overlap between consecutive chunks
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=100,
    chunk_overlap=20,
)

# Split the text
chunks = text_splitter.split_text(long_text)

# Print the chunks
for i, chunk in enumerate(chunks):
    print(f"Chunk {i+1}:\n{chunk}\n---")