LlamaIndex BM25 Retriever
This library provides the BM25Retriever integration for LlamaIndex, enabling efficient keyword-based retrieval of documents. It is part of the modular LlamaIndex ecosystem (v0.10.0+) and is released as a separate package. The current version is 0.7.1, with updates typically aligning with LlamaIndex core releases.
Common errors
- ModuleNotFoundError: No module named 'llama_index.retrievers.bm25'
  cause The `llama-index-retrievers-bm25` package is not installed, or you are using an older `llama-index` core version that had a different module structure.
  fix Run `pip install llama-index-retrievers-bm25`. If it is already installed, ensure your `llama-index-core` is v0.10.0+.
- AttributeError: 'VectorStoreIndex' object has no attribute 'docstore'
  cause Calling `BM25Retriever.from_defaults(index=my_index)` or similar with an index object that does not expose a `docstore`.
  fix Pass `docstore=my_index.docstore` if you have an existing index with a populated docstore. Otherwise, parse your documents into nodes and pass them with `nodes=...`.
- TypeError: from_defaults() got an unexpected keyword argument 'service_context'
  cause Passing `service_context` to `from_defaults`. LlamaIndex v0.10.0+ significantly reduced the reliance on `ServiceContext` for basic component initialization.
  fix Remove the `service_context` argument. Configure components directly, or pass relevant parameters (like `llm`, `embed_model`) only to components that specifically accept them.
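When diagnosing the `ModuleNotFoundError` above, it can help to check whether the module can be located before importing it. The following is a minimal sketch using only the standard library; the helper name `is_importable` is hypothetical and not part of LlamaIndex.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if `module_name` can be located.

    For dotted names, parent packages are imported during the lookup,
    but the target module itself is not executed.
    """
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # A missing parent package raises ModuleNotFoundError (an ImportError).
        return False


if __name__ == "__main__":
    target = "llama_index.retrievers.bm25"
    if is_importable(target):
        print(f"{target} is available")
    else:
        print(f"{target} is missing; try: pip install llama-index-retrievers-bm25")
```

Running this in the same interpreter or virtual environment as your application shows whether the package is missing from that specific environment.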
Warnings
- breaking LlamaIndex core underwent a major refactor in v0.10.0, moving integrations like BM25 into separate packages and changing core APIs. This `llama-index-retrievers-bm25` package is designed for LlamaIndex v0.10.0 and newer.
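- gotcha `BM25Retriever.from_defaults` builds its corpus from the text of stored nodes, not from embeddings. Give it a single source: a list of nodes (`nodes=`) or a docstore (`docstore=`). When starting from a `VectorStoreIndex`, make sure its `docstore` actually contains the nodes, otherwise BM25 has nothing to search.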
- gotcha This package (`llama-index-retrievers-bm25`) provides only the retriever. Almost all practical uses also need a compatible `llama-index-core` (for readers, node parsers, and query engines), so confirm it is present, for example with `pip show llama-index-core`.
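The version requirement in these warnings can be checked programmatically. This is a rough sketch built on the standard library's `importlib.metadata`; the helpers `parse_release`, `meets_minimum`, and `core_status` are hypothetical names, and the version parsing is deliberately simplified (it ignores pre-release and post-release suffixes). For strict comparisons, `packaging.version.Version` is the more robust choice.

```python
from importlib import metadata


def parse_release(version: str) -> tuple:
    """Turn '0.10.68.post1' into (0, 10, 68); stop at the first non-numeric part."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
        if digits != piece:
            # Something like '0rc1': keep the numeric prefix, ignore the rest.
            break
    return tuple(parts)


def _pad(release: tuple, width: int) -> tuple:
    """Right-pad with zeros so (0, 10) compares equal to (0, 10, 0)."""
    return release + (0,) * (width - len(release))


def meets_minimum(installed: str, minimum: str = "0.10.0") -> bool:
    a, b = parse_release(installed), parse_release(minimum)
    width = max(len(a), len(b))
    return _pad(a, width) >= _pad(b, width)


def core_status(dist_name: str = "llama-index-core") -> str:
    """Describe whether the named distribution is installed and recent enough."""
    try:
        installed = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return f"{dist_name} is not installed"
    verdict = "ok" if meets_minimum(installed) else "older than 0.10.0"
    return f"{dist_name} {installed}: {verdict}"


if __name__ == "__main__":
    print(core_status())
```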
Install
- pip install llama-index-retrievers-bm25
Imports
- BM25Retriever
  # wrong: this path does not resolve in v0.10.0+
  from llama_index.indices.query.retrievers.bm25_retriever import BM25Retriever
  # correct
  from llama_index.retrievers.bm25 import BM25Retriever
Quickstart
import os
from llama_index.core import SimpleDirectoryReader
from llama_index.core.node_parser import SentenceSplitter
from llama_index.retrievers.bm25 import BM25Retriever
# Create a dummy data directory and file for demonstration
os.makedirs('data', exist_ok=True)
with open('data/test_document.txt', 'w') as f:
    f.write('The quick brown fox jumps over the lazy dog. Dogs are often lazy.')
    f.write('\nCats are also animals, but they are not mentioned here.')
# Load documents and split them into nodes
documents = SimpleDirectoryReader(input_files=["data/test_document.txt"]).load_data()
nodes = SentenceSplitter().get_nodes_from_documents(documents)
# Initialize the BM25 retriever from the nodes.
# top_k is capped at the node count because this tiny file may yield a single node.
retriever = BM25Retriever.from_defaults(
    nodes=nodes,
    similarity_top_k=min(2, len(nodes)),
)
# Retrieve nodes based on a query
results = retriever.retrieve("What animal is lazy?")
for result in results:
    print(f"Content: {result.get_content()}\nScore: {result.get_score()}\n---")
# Clean up the dummy file (os.rmdir only succeeds if 'data' is now empty)
os.remove('data/test_document.txt')
os.rmdir('data')
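To give a feel for what the scores printed by the quickstart represent, here is a small, self-contained sketch of Okapi BM25 scoring in plain Python. It is an illustration under stated assumptions, not the package's implementation: the tokenizer is a simple lowercase word split with no stemming, the parameters `k1=1.5` and `b=0.75` are common textbook defaults, and the function names are hypothetical. The library's own tokenization, stemming, and defaults may differ, so its scores will not match these numbers.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokenizer (assumption: no stemming, no stop-word removal)."""
    return re.findall(r"\w+", text.lower())


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document in a non-empty corpus against the query with Okapi BM25."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs

    # Document frequency: in how many documents each term appears.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    query_terms = set(tokenize(query))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Rare terms get a higher weight (non-negative IDF variant).
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates with k1; b normalizes by document length.
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


if __name__ == "__main__":
    corpus = [
        "The quick brown fox jumps over the lazy dog.",
        "Dogs are often lazy.",
        "Cats are also animals, but they are not mentioned here.",
    ]
    for doc, score in zip(corpus, bm25_scores("What animal is lazy?", corpus)):
        print(f"{score:.3f}  {doc}")
```

In this sketch only the term "lazy" matches, so the third sentence scores zero ("animal" does not match "animals" without stemming), and the shorter second sentence outranks the first because of length normalization.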