LlamaIndex BM25 Retriever

0.7.1 · active · verified Thu Apr 16

This library provides the `BM25Retriever` integration for LlamaIndex, which ranks documents against a query by keyword relevance using BM25 scoring. It is part of the modular LlamaIndex ecosystem (v0.10.0+) and is released as a separate package, `llama-index-retrievers-bm25`. The current version is 0.7.1; updates typically align with LlamaIndex core releases.

Quickstart

This example demonstrates how to initialize `BM25Retriever` using `SimpleDirectoryReader` to load documents and then perform a retrieval query. It showcases direct initialization from a list of `Document` objects.

from llama_index.retrievers.bm25 import BM25Retriever
from llama_index.core import SimpleDirectoryReader
import os

# Create a dummy data directory and file for demonstration
os.makedirs('data', exist_ok=True)
with open('data/test_document.txt', 'w') as f:
    f.write('The quick brown fox jumps over the lazy dog. Dogs are often lazy.')
    f.write('\nCats are also animals, but they are not mentioned here.')

# load documents
documents = SimpleDirectoryReader(input_files=["data/test_document.txt"]).load_data()

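# Initialize the BM25 retriever directly from the loaded documents.
# from_defaults takes one of `index`, `nodes`, or `docstore`; Document
# objects are nodes, so they are passed through the `nodes` argument.
retriever = BM25Retriever.from_defaults(
    nodes=documents,
    similarity_top_k=2
)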

# Retrieve nodes based on a query
nodes = retriever.retrieve("What animal is lazy?")

for node in nodes:
    print(f"Content: {node.get_content()}\nScore: {node.get_score()}\n---")

# Clean up dummy file
os.remove('data/test_document.txt')
os.rmdir('data')
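
The scores printed by the quickstart are BM25 relevance scores. To show roughly where such numbers come from, here is a minimal pure-Python sketch of BM25 scoring. It is an illustration under stated assumptions, not the library's implementation: the `tokenize` and `bm25_scores` helpers are hypothetical, the defaults `k1=1.5` and `b=0.75` are common textbook values, and the tokenizer does no stemming or stopword removal, so the scores will not match what `BM25Retriever` returns.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and split on non-word characters (no stemming, no stopwords)."""
    return re.findall(r"\w+", text.lower())


def bm25_scores(query: str, corpus: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Return one BM25 score per document in a non-empty corpus (hypothetical helper)."""
    docs = [tokenize(d) for d in corpus]
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: in how many documents each term appears.
    df = Counter(term for d in docs for term in set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue  # terms absent from the document contribute nothing
            # Rare terms get a higher inverse document frequency weight.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates via k1 and is normalized by document length via b.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


corpus = [
    "The quick brown fox jumps over the lazy dog. Dogs are often lazy.",
    "Cats are also animals, but they are not mentioned here.",
]
scores = bm25_scores("What animal is lazy?", corpus)
for text, score in sorted(zip(corpus, scores), key=lambda pair: pair[1], reverse=True):
    print(f"{score:.3f}  {text}")
```

In this sketch only the first text scores above zero, because it is the only one containing the query term "lazy". The second text contains "animals" but not "animal", so without stemming it does not match at all; this is the kind of gap that stemming in a real retriever is meant to close.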
