ColPali Engine
Version 0.3.15 · verified Mon Apr 27
ColPali Engine is a library for training and running inference with the ColPali architecture, a multimodal retrieval model based on vision-language models. It supports document indexing and retrieval using late interaction over image embeddings. Current version is 0.3.15, with active development and periodic releases.
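The late-interaction scoring mentioned above can be illustrated without any library API: each query token embedding is matched against its best document token embedding, and the per-token maxima are summed (the MaxSim rule). A minimal pure-Python sketch with made-up toy vectors, not real model outputs:

```python
# Late-interaction (MaxSim) scoring sketch: score(query, doc) is the sum,
# over query token embeddings, of the best dot product against any
# document token embedding. Toy 2-d vectors for illustration only.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def maxsim_score(query_tokens, doc_tokens):
    # For each query token, take the maximum similarity over all
    # document tokens, then sum those maxima.
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)

query = [[1.0, 0.0], [0.0, 1.0]]   # two query token embeddings
doc = [[1.0, 0.0], [0.5, 0.5]]     # two document token embeddings
print(maxsim_score(query, doc))    # -> 1.5 (best matches: 1.0 + 0.5)
```

In the real model the embeddings come from image patches and query tokens, but the scoring reduction is the same.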
pip install colpali-engine
Common errors
error ModuleNotFoundError: No module named 'colpali_engine' ↓
cause The package has not been installed or is installed under the wrong name (e.g., 'colpali' instead of 'colpali-engine').
fix
Install via 'pip install colpali-engine' and not 'pip install colpali'.
error AttributeError: module 'colpali_engine' has no attribute 'ColPaliModel' ↓
cause Importing from the wrong top-level module; ColPaliModel is in 'colpali_engine.models'.
fix
Use 'from colpali_engine.models import ColPaliModel'.
error ValueError: The model 'vidore/colpali-v1.2' does not have a processor class ↓
cause The model name is outdated or the processor is incompatible; ensure the model and processor come from the same Hugging Face repo.
fix
Use 'processor = ColPaliProcessor.from_pretrained(model_name)' where model_name is from Hugging Face.
error RuntimeError: CUDA out of memory ↓
cause The model requires more GPU memory than available; batch size too large or using full precision.
fix
Reduce batch size, use 'torch_dtype=torch.float16' or 'torch.bfloat16', or offload to CPU with 'device_map="auto"'.
Warnings
breaking Version 0.3.0 removed the 'ColPali' class and renamed the model class to 'ColPaliModel'. Code using 'from colpali_engine import ColPali' will break. ↓
fix Use 'from colpali_engine.models import ColPaliModel' instead.
deprecated The method 'ColPaliModel.forward()' has been deprecated in favor of directly calling the model object (__call__) or using 'model.generate()' for generation tasks. ↓
fix Replace 'model.forward(inputs)' with 'model(inputs)' or 'model.generate(inputs)' for text generation.
gotcha GPU vs CPU: ColPali models require significant GPU memory. Running on CPU may be extremely slow. Always check device availability. ↓
fix Use 'device_map="cuda"' if GPU available, or set 'device_map="auto"' for automatic mapping.
gotcha The processor expects images in PIL format or file paths. Passing raw numpy arrays may cause errors. ↓
fix Ensure images are loaded via PIL.Image.open() or processor.image_processor.convert_to_rgb() before processing.
Imports
- ColPaliModel
  wrong: import colpali_engine
  correct: from colpali_engine.models import ColPaliModel
- ColPaliProcessor
  wrong: from colpali_engine.processor import ColPaliProcessor
  correct: from colpali_engine.models import ColPaliProcessor
- ColPaliRetriever
  wrong: from colpali_engine import ColPaliRetriever
  correct: from colpali_engine.retrieval import ColPaliRetriever
Quickstart
import torch
from PIL import Image
from colpali_engine.models import ColPaliModel, ColPaliProcessor
from colpali_engine.retrieval import ColPaliRetriever

model_name = "vidore/colpali-v1.2"
model = ColPaliModel.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="cuda" if torch.cuda.is_available() else "cpu",
)
processor = ColPaliProcessor.from_pretrained(model_name)

# Example documents: page images loaded as PIL images (the processor
# expects PIL images or file paths, not raw strings -- see the gotcha above)
docs = [
    Image.open("page_with_diagram.png"),
    Image.open("page_with_text.png"),
]
query = "ColPali architecture"

# Embed each document page
doc_embeddings = []
for doc in docs:
    processed = processor.process_images([doc])
    with torch.no_grad():
        embeddings = model(**processed.to(model.device))
    doc_embeddings.append(embeddings)
# Index with retriever
retriever = ColPaliRetriever(model, processor)
retriever.index(doc_embeddings)
# Search
results = retriever.search(query, k=2)
print(results)
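Independently of the retriever class, the search step reduces to scoring the query embeddings against every indexed page and keeping the top k. A pure-Python sketch of that ranking, using the MaxSim rule on toy embeddings (no library API assumed):

```python
# Ranking sketch: score a query against every indexed page with MaxSim,
# then return the indices of the k best pages. Toy data only.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def maxsim_score(query_tokens, doc_tokens):
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)

def top_k(query_tokens, indexed_docs, k=2):
    # indexed_docs: one list of token embeddings per page.
    scored = [(maxsim_score(query_tokens, d), i)
              for i, d in enumerate(indexed_docs)]
    scored.sort(reverse=True)          # highest score first
    return [i for _, i in scored[:k]]

query = [[1.0, 0.0]]
pages = [[[0.0, 1.0]], [[1.0, 0.0]], [[0.5, 0.5]]]
print(top_k(query, pages, k=2))        # -> [1, 2]
```

Real deployments batch this on GPU, but the reduction (score every page, sort, truncate to k) is the same.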