{"id":1690,"library":"rank-bm25","title":"BM25 Ranking Algorithms","description":"Provides several BM25 variants (BM25Okapi, BM25L, BM25Plus) for ranking the documents of a tokenized corpus against a tokenized query. The current version is 0.2.2; releases appear to be infrequent, and the latest one added support for corpora passed as iterables other than lists, such as generators.","status":"active","version":"0.2.2","language":"en","source_language":"en","source_url":"https://github.com/dorianbrown/rank_bm25","tags":["ranking","search","nlp","bm25","information-retrieval"],"install":[{"cmd":"pip install rank-bm25","lang":"bash","label":"Install stable version"}],"dependencies":[],"imports":[{"note":"The PyPI package is 'rank-bm25' but the module name for import is 'rank_bm25' (with an underscore).","wrong":"from rankbm25 import BM25Okapi","symbol":"BM25Okapi","correct":"from rank_bm25 import BM25Okapi"},{"note":"The PyPI package is 'rank-bm25' but the module name for import is 'rank_bm25' (with an underscore).","wrong":"from rankbm25 import BM25L","symbol":"BM25L","correct":"from rank_bm25 import BM25L"},{"note":"The PyPI package is 'rank-bm25' but the module name for import is 'rank_bm25' (with an underscore).","wrong":"from rankbm25 import BM25Plus","symbol":"BM25Plus","correct":"from rank_bm25 import BM25Plus"}],"quickstart":{"code":"import re\n\nfrom rank_bm25 import BM25Okapi\n\ncorpus = [\n    \"Hello there, this is a document.\",\n    \"This document is about BM25.\",\n    \"Hello, how are you today?\",\n    \"BM25 is a ranking algorithm.\",\n]\n\n\ndef tokenize(text):\n    # Illustrative tokenizer (not part of rank_bm25): lowercase and keep only\n    # word characters, so that \"algorithm.\" in a document matches \"algorithm\" in a query\n    return re.findall(r\"\\w+\", text.lower())\n\n\n# Tokenize the corpus (essential step: BM25 expects lists of tokens, not raw strings)\ntokenized_corpus = [tokenize(doc) for doc in corpus]\n\nbm25 = BM25Okapi(tokenized_corpus)\n\nquery = \"BM25 ranking algorithm\"\ntokenized_query = tokenize(query)\n\n# One score per document, in corpus order\ndoc_scores = bm25.get_scores(tokenized_query)\nprint(f\"Document scores: {doc_scores}\")\n\n# get_top_n returns items from the list passed as its second argument,\n# which must be in the same order as the tokenized corpus\ntop_n = bm25.get_top_n(tokenized_query, corpus, n=2)\nprint(f\"Top 2 documents: {top_n}\")\n","lang":"python","description":"This example tokenizes a small corpus with a simple regex tokenizer and builds a BM25Okapi index from it. It then retrieves per-document scores and the top-N original documents for a query tokenized the same way."},"warnings":[{"fix":"Tokenize the corpus and every query the same way (e.g., `doc.lower().split()` or an NLP tokenizer) before passing them to the BM25 constructor or to the `get_scores`/`get_top_n` methods.","message":"The BM25 classes expect a pre-tokenized corpus (a list of lists of strings, with one sub-list of tokens per document) and a tokenized query (a list of strings), not raw strings. Passing raw strings does not raise an error: each character is treated as a token, which silently produces meaningless scores.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Use `pip install rank-bm25` for installation and `from rank_bm25 import ...` for importing classes like `BM25Okapi`.","message":"The PyPI package name for installation is `rank-bm25` (with a hyphen), but the Python module you import into your code is `rank_bm25` (with an underscore).","severity":"gotcha","affected_versions":"All versions"},{"fix":"On versions before 0.2.2, pass the corpus as a concrete list of tokenized documents. On 0.2.2 and later, generators are accepted. A generator is exhausted after one pass, though, so create a new one if you need to iterate over the corpus again. `get_top_n` still needs the original documents as a list with the same length and order as the indexed corpus.","message":"Before version 0.2.2, passing a corpus that is not a list (e.g., a generator) to the BM25 constructor was not officially supported and could cause errors or unexpected behavior. Version 0.2.2 added support for such iterables, but generator semantics still apply.","severity":"gotcha","affected_versions":"< 0.2.2"}],"env_vars":null,"last_verified":"2026-04-09T00:00:00.000Z","next_check":"2026-07-08T00:00:00.000Z"}