{"id":7095,"library":"colbert-ai","title":"ColBERT AI","description":"ColBERT (Contextualized Late Interaction over BERT) is an advanced neural information retrieval model that enables efficient and effective passage search over large text collections, leveraging fine-grained contextualized late interaction. The library is currently at version 0.2.22 and receives regular updates, focusing on performance, bug fixes, and broader compatibility.","status":"active","version":"0.2.22","language":"en","source_language":"en","source_url":"https://github.com/stanford-futuredata/ColBERT","tags":["information retrieval","nlp","embeddings","neural search","pytorch","transformers","reranking","rag"],"install":[{"cmd":"pip install colbert-ai","lang":"bash","label":"Minimal Install"},{"cmd":"pip install colbert-ai[torch,faiss-gpu]","lang":"bash","label":"Recommended for GPU (installs PyTorch and FAISS-GPU; requires CUDA)"}],"dependencies":[{"reason":"Core deep learning framework for model operations.","package":"torch","optional":false},{"reason":"Integrates Hugging Face models and utilities.","package":"transformers","optional":false},{"reason":"Efficient similarity search on CPU (needed only if faiss-gpu is not installed).","package":"faiss-cpu","optional":true},{"reason":"Efficient similarity search on GPU (highly recommended for performance; requires CUDA).","package":"faiss-gpu","optional":true}],"imports":[{"symbol":"ColBERTConfig","correct":"from colbert.infra import ColBERTConfig"},{"symbol":"RunConfig","correct":"from colbert.infra import RunConfig"},{"symbol":"Run","correct":"from colbert.infra import Run"},{"symbol":"Indexer","correct":"from colbert import Indexer"},{"symbol":"Searcher","correct":"from colbert import Searcher"},{"symbol":"Trainer","correct":"from colbert import Trainer"}],"quickstart":{"code":"import os\nfrom colbert.infra import ColBERTConfig, RunConfig, Run\nfrom colbert import Indexer, Searcher\n\n# A ColBERT checkpoint must be available locally or on the Hugging Face Hub.\n# For example, download the colbertv2.0 checkpoint via\n# 'wget https://huggingface.co/colbert-ir/colbertv2.0/resolve/main/colbertv2.0.tar.gz'\n\n# A small in-memory collection and two queries for demonstration\ncollection = [\n    \"The quick brown fox jumps over the lazy dog.\",\n    \"Artificial intelligence is a rapidly evolving field.\",\n    \"Python is a popular programming language for AI and machine learning.\",\n    \"Machine learning is a subset of artificial intelligence.\"\n]\nqueries = [\"What is AI?\", \"Python programming\"]\n\n# Configure ColBERT; replace 'colbert-ir/colbertv2.0' with a local path if downloaded\nCOLBERT_CHECKPOINT = os.environ.get('COLBERT_CHECKPOINT', 'colbert-ir/colbertv2.0')\nINDEX_ROOT = os.environ.get('COLBERT_INDEX_ROOT', 'experiments')\nINDEX_NAME = os.environ.get('COLBERT_INDEX_NAME', 'my_simple_index')\n\nwith Run().context(RunConfig(nranks=1, experiment='default')):\n    # The index root is set via ColBERTConfig rather than passed to Indexer/Searcher\n    config = ColBERTConfig(checkpoint=COLBERT_CHECKPOINT, root=INDEX_ROOT)\n\n    # 1. Indexing: encode the collection and write the index to disk\n    indexer = Indexer(checkpoint=COLBERT_CHECKPOINT, config=config)\n    indexer.index(name=INDEX_NAME, collection=collection)\n\n    # 2. Searching: load the index and retrieve the top-k passages per query\n    searcher = Searcher(index=INDEX_NAME, config=config, collection=collection)\n\n    for query in queries:\n        print(f\"\\nSearching with query: '{query}'\")\n        passage_ids, ranks, scores = searcher.search(query, k=3)\n        for passage_id, rank, score in zip(passage_ids, ranks, scores):\n            print(f\"Passage ID: {passage_id}, Rank: {rank}, Score: {score:.2f}, Text: {collection[passage_id]}\")\n","lang":"python","description":"This quickstart demonstrates the basic workflow for indexing a small collection of passages and then searching it with a pre-trained ColBERT model. It covers the `Indexer` for creating a ColBERT index and the `Searcher` for querying that index. The index root is configured through `ColBERTConfig(root=...)`. Ensure a ColBERT checkpoint is available, either by letting the library download it from the Hugging Face Hub or by providing a local path."},"warnings":[{"fix":"Upgrade `colbert-ai` to version 0.2.22 or newer: `pip install --upgrade colbert-ai`. If upgrading is not an option, downgrade `transformers` to a compatible version (e.g., `transformers==4.35.2`).","message":"The `AdamW` optimizer was removed from the `transformers` library in recent versions (v4.36+). Versions of `colbert-ai` prior to 0.2.22 that import `AdamW` directly from `transformers` will break.","severity":"breaking","affected_versions":"<0.2.22"},{"fix":"Consider setting up your environment with `conda` for `pytorch` and `faiss-gpu` to ensure stable, correctly configured installations, especially for GPU acceleration. Refer to the official ColBERT GitHub README for the `conda` installation commands.","message":"Installing PyTorch and FAISS (especially `faiss-gpu`) via `pip` can sometimes lead to stability issues or incorrect CUDA configurations.
The official ColBERT documentation often recommends using `conda` for these specific dependencies.","severity":"gotcha","affected_versions":"All"},{"fix":"Ensure CUDA drivers and `faiss-gpu` are correctly installed and configured. Monitor GPU memory usage during indexing. For very large collections, consider processing in batches or using more powerful hardware. Check for `nvcc` path errors if running on GPU.","message":"Indexing large collections can be memory and compute intensive, particularly without GPU acceleration. Failures can occur if CUDA is not correctly configured or if system resources are exhausted during the indexing process.","severity":"gotcha","affected_versions":"All"},{"fix":"Upgrade `colbert-ai` to version 0.2.22 or newer: `pip install --upgrade colbert-ai`.","message":"A bug in `loaders.py` related to regex handling could cause indexing failures with certain collection inputs.","severity":"bug","affected_versions":"<0.2.22"}],"env_vars":null,"last_verified":"2026-04-16T00:00:00.000Z","next_check":"2026-07-15T00:00:00.000Z","problems":[{"fix":"Upgrade `colbert-ai` to v0.2.22 or later. Alternatively, downgrade your `transformers` library: `pip install transformers==4.35.2`.","cause":"The `AdamW` optimizer was removed from the `transformers` library in newer versions (e.g., v4.36+). Older `colbert-ai` versions directly importing it will fail.","error":"ImportError: cannot import name 'AdamW' from 'transformers'"},{"fix":"Ensure your CUDA toolkit is properly installed and that `nvcc` is accessible via your system's PATH environment variable. Verify `CUDA_HOME` is set correctly if applicable.","cause":"This typically indicates that the CUDA compiler (`nvcc`) is not found or is not correctly configured in your PATH, which is required for building FAISS extensions or other C++ components during indexing.","error":"ninja: build stopped: subcommand failed. Clustering X points in YD to Z clusters... 
/usr/local/cuda-X.Y/bin/nvcc: not found"},{"fix":"Delete the existing index directory (e.g., `rm -rf experiments/my_simple_index`), choose a new, unique index name, or pass `overwrite=True` to `Indexer.index`.","cause":"When running indexing, if a directory with the specified index name already exists and is not empty, `colbert-ai` raises an `AssertionError` to prevent accidental overwrites.","error":"AssertionError: /path/to/existing/index"},{"fix":"Upgrade `colbert-ai` to version 0.2.22 or newer: `pip install --upgrade colbert-ai`. This version includes a fix for the regex handling.","cause":"This error, often originating from `colbert/indexing/loaders.py`, was caused by an incorrect regex flag when processing file paths or collection data in older versions.","error":"ValueError: Invalid pattern: '**' can only be an entire path component"}]}