ColPali Engine

0.3.15 verified Mon Apr 27 auth: no python

ColPali Engine is a Python library for training and running inference with ColPali, a multimodal retrieval architecture built on vision-language models. Document pages are embedded as images into multi-vector representations, and queries are matched against them with late interaction. The current version is 0.3.15; the project is under active development with periodic releases.
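
To make the late-interaction idea concrete, here is a minimal pure-Python sketch of ColBERT-style MaxSim scoring: each query token vector is matched to its most similar document patch vector, and the maxima are summed. The function name `maxsim_score` and the toy vectors are illustrative assumptions, not part of the colpali-engine API.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_vecs: List[Vector], doc_vecs: List[Vector]) -> float:
    """Sum, over query vectors, of the best dot product with any document vector."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


# Toy data (hypothetical): 2 query token vectors, two "pages" with 2 patch vectors each.
query = [[1.0, 0.0], [0.0, 1.0]]
page_a = [[1.0, 0.0], [0.0, 1.0]]  # matches both query tokens
page_b = [[0.5, 0.0], [0.0, 0.0]]  # weak match on the first token only

scores = [maxsim_score(query, page) for page in (page_a, page_b)]
print(scores)  # [2.0, 0.5]
```

The page with the higher score ranks first. Real embeddings have many more tokens, patches, and dimensions, but the scoring rule is the same.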

pip install colpali-engine
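
After installing, a quick check from Python can confirm that the package landed in the active environment. This is a small sketch using only the standard library; the helper name `installed_version` is an assumption made for illustration.

```python
import importlib.util
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str, module_name: str) -> Optional[str]:
    """Return the installed version of dist_name, or None if module_name cannot be imported."""
    if importlib.util.find_spec(module_name) is None:
        return None
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# The pip distribution uses a hyphen, the import name an underscore.
version = installed_version("colpali-engine", "colpali_engine")
print(version or "colpali-engine is not installed in this environment")
```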
error ModuleNotFoundError: No module named 'colpali_engine'
cause The package is not installed in the active environment, or a different package was installed (e.g., 'colpali' instead of 'colpali-engine'). The pip name uses a hyphen; the import name uses an underscore.
fix
Install via 'pip install colpali-engine', not 'pip install colpali'.
error AttributeError: module 'colpali_engine' has no attribute 'ColPaliModel'
cause The package has no class named 'ColPaliModel'. The model class is 'ColPali', exported from 'colpali_engine.models'.
fix
Use 'from colpali_engine.models import ColPali, ColPaliProcessor'.
error ValueError: The model 'vidore/colpali-v1.2' does not have a processor class
cause The model name is outdated, or the processor class does not match the checkpoint. The model and the processor must be loaded from the same Hugging Face repo.
fix
Use 'processor = ColPaliProcessor.from_pretrained(model_name)' with the same model_name that was passed to the model's from_pretrained().
error RuntimeError: CUDA out of memory
cause The model needs more GPU memory than is available, typically because the batch size is too large or the weights are loaded in full precision.
fix
Reduce the batch size, load the model in half precision with 'torch_dtype=torch.float16' or 'torch.bfloat16', or let 'device_map="auto"' offload layers to CPU.
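
The batch-size advice above can be automated. The following is a hedged sketch, not part of colpali-engine: the helper `run_in_batches` and the stand-in `fake_embed` are hypothetical names. It retries a failing batch with half the size whenever a RuntimeError mentions "out of memory", which is how PyTorch reports CUDA OOM.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def run_in_batches(
    items: Sequence[T],
    embed_batch: Callable[[Sequence[T]], List[R]],
    batch_size: int = 8,
) -> List[R]:
    """Process items in batches, halving the batch size after an out-of-memory error."""
    results: List[R] = []
    start = 0
    while start < len(items):
        batch = items[start:start + batch_size]
        try:
            results.extend(embed_batch(batch))
        except RuntimeError as err:
            # Re-raise anything that is not OOM, and OOM that persists at batch size 1.
            if "out of memory" not in str(err).lower() or batch_size == 1:
                raise
            batch_size = max(1, batch_size // 2)
            continue  # retry from the same position with a smaller batch
        start += len(batch)
    return results


def fake_embed(batch: Sequence[str]) -> List[int]:
    """Stand-in for a model call: pretend batches larger than 2 do not fit in memory."""
    if len(batch) > 2:
        raise RuntimeError("CUDA out of memory")
    return [len(text) for text in batch]


embedded = run_in_batches(["a", "bb", "ccc", "dddd", "eeeee"], fake_embed, batch_size=8)
print(embedded)  # [1, 2, 3, 4, 5]
```

With a real model, `embed_batch` would wrap the processor and forward pass, and it is usually worth calling `torch.cuda.empty_cache()` before each retry.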
breaking Version 0.3.0 was a non-backward-compatible refactor of the package layout. Model and processor classes are imported from 'colpali_engine.models'; imports written against the earlier module paths will break.
fix Use 'from colpali_engine.models import ColPali, ColPaliProcessor' instead.
deprecated Calling 'forward()' on the model directly is discouraged; call the model object itself (__call__), as with any PyTorch module. ColPali returns multi-vector embeddings for retrieval, so 'model.generate()' does not apply.
fix Replace 'model.forward(**inputs)' with 'model(**inputs)'.
gotcha GPU vs CPU: ColPali models require significant GPU memory, and inference on CPU can be extremely slow. Check device availability before loading the model.
fix Use 'device_map="cuda"' if a GPU is available, or set 'device_map="auto"' for automatic placement.
gotcha The processor's process_images() expects a list of PIL images. Passing file paths, text strings, or raw numpy arrays may cause errors.
fix Load files with PIL.Image.open(path).convert("RGB") and convert arrays with PIL.Image.fromarray() before processing.

Basic retrieval using ColPali: load the model, embed page images and a query, and rank the pages by late-interaction score.

import torch
from PIL import Image
from colpali_engine.models import ColPali, ColPaliProcessor

model_name = "vidore/colpali-v1.2"
device = "cuda" if torch.cuda.is_available() else "cpu"
model = ColPali.from_pretrained(model_name, torch_dtype=torch.bfloat16, device_map=device).eval()
processor = ColPaliProcessor.from_pretrained(model_name)

# Example documents: page images as PIL objects.
# Blank placeholder pages are used here; in practice load real pages,
# e.g. Image.open("page_1.png").convert("RGB").
docs = [
    Image.new("RGB", (448, 448), color="white"),
    Image.new("RGB", (448, 448), color="black"),
]

queries = ["ColPali architecture"]

# Embed documents and queries (one multi-vector embedding per item)
batch_docs = processor.process_images(docs).to(model.device)
batch_queries = processor.process_queries(queries).to(model.device)
with torch.no_grad():
    doc_embeddings = model(**batch_docs)
    query_embeddings = model(**batch_queries)

# Late-interaction scoring: rows are queries, columns are documents
scores = processor.score_multi_vector(query_embeddings, doc_embeddings)

# Search: top-k (document index, score) pairs for the first query
k = 2
top = scores[0].topk(min(k, len(docs)))
results = list(zip(top.indices.tolist(), top.values.tolist()))
print(results)
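
The search step returns the k best documents for a query. Given a plain list of per-document scores, that selection can be sketched with the standard library as follows; the helper `top_k` and the score values are made up for illustration.

```python
import heapq
from typing import List, Tuple


def top_k(scores: List[float], k: int) -> List[Tuple[int, float]]:
    """Return up to k (document index, score) pairs, best score first."""
    return heapq.nlargest(k, enumerate(scores), key=lambda pair: pair[1])


# Hypothetical late-interaction scores for four pages.
page_scores = [11.2, 17.8, 9.5, 15.1]
best = top_k(page_scores, k=2)
print(best)  # [(1, 17.8), (3, 15.1)]
```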