CocoIndex
Version 1.0.2 | verified Fri May 01 | auth: no | python
CocoIndex is a Python library that automatically maintains search indexes derived from declarative transformations. Users define how to transform source data into an index, and CocoIndex incrementally updates the index when sources change, minimizing recomputation. As of version 1.0.2, it requires Python >= 3.11 and is under active development with monthly releases.
pip install cocoindex
Common errors
error ModuleNotFoundError: No module named 'cocoindex'
cause CocoIndex is not installed, or is installed in a different environment.
fix Run pip install cocoindex --upgrade in the correct Python environment (>=3.11).
error TypeError: 'NoneType' object is not iterable when calling cocoindex.build()
cause The transform() method did not call self.create_index(), or returned before creating the index.
fix Ensure the transform() method always calls self.create_index(), even when there is no data.
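For the ModuleNotFoundError entry above, a quick way to confirm which interpreter is running and whether it can see the package is a standard-library check like the sketch below. The helper name check_environment is hypothetical and not part of CocoIndex; the 3.11 minimum is the one stated in this entry.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 11)  # minimum version stated in this entry


def check_environment(package: str = "cocoindex") -> list[str]:
    """Return human-readable problems; an empty list means nothing was found."""
    problems = []
    if sys.version_info < MIN_PYTHON:
        problems.append(
            f"Python {sys.version_info.major}.{sys.version_info.minor} is older than "
            f"{MIN_PYTHON[0]}.{MIN_PYTHON[1]}"
        )
    # find_spec returns None when a top-level package cannot be located
    if importlib.util.find_spec(package) is None:
        problems.append(f"{package!r} is not importable from {sys.executable}")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print(problem)
```

If the second message appears, run pip through the same interpreter that printed it (python -m pip install cocoindex) so the package lands in that environment.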
Warnings
gotcha CocoIndex does not automatically detect changes to source files; you need to trigger a rebuild explicitly for incremental updates.
fix Use cocoindex.rebuild() or cocoindex.update() after changing source data.
gotcha Embedding generation is the user's responsibility; CocoIndex only indexes the embeddings you provide. A common footgun is forgetting to generate embeddings before indexing.
fix Provide a valid embedding vector (list of floats) for each document in the 'embedding' column.
deprecated Importing classes from cocoindex._core is deprecated and will be removed in a future version.
fix Use public API imports: from cocoindex import DataFlow, build, rebuild, etc.
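Because the first gotcha above says rebuilds must be triggered explicitly, one option is to decide when to trigger them by comparing a content hash of the source file against the last one seen. The sketch below uses only the standard library; rebuild_if_changed, file_digest, and the state file are hypothetical helpers, and the rebuild callback is where you would place the rebuild or update call this entry describes.

```python
import hashlib
from pathlib import Path
from typing import Callable


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def rebuild_if_changed(source: Path, state_file: Path, rebuild: Callable[[], None]) -> bool:
    """Call `rebuild` only when `source` differs from the digest saved in `state_file`.

    Returns True if a rebuild was triggered.
    """
    current = file_digest(source)
    previous = state_file.read_text().strip() if state_file.exists() else None
    if current == previous:
        return False
    rebuild()  # assumption: wraps the rebuild/update call described in this entry
    state_file.write_text(current)  # record the digest only after a successful rebuild
    return True
```

Writing the digest after the callback returns means a failed rebuild is retried on the next run rather than silently skipped.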
Imports
- cocoindex
  import cocoindex
- DataFlow
  wrong: from cocoindex._core import DataFlow
  correct: from cocoindex import DataFlow
Quickstart
import cocoindex

# Define a simple data flow that indexes documents into a vector index
class MyIndex(cocoindex.DataFlow):
    def transform(self) -> None:
        source = self.load_csv("data/documents.csv")  # columns: id, text
        # Placeholder embedding: replace with real vectors from your embedding model
        source["embedding"] = source["text"].apply(lambda text: [0.0] * 384)
        self.create_index(source, on="embedding", name="documents_index")

# Build the index (synchronous example)
cocoindex.build(MyIndex(), output_dir="./mydb")
print("Index built successfully.")
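Since the quickstart fills the 'embedding' column with placeholder vectors and real embeddings are the user's responsibility, it may help to check vectors before indexing them. The following is a minimal sketch in plain Python; validate_embeddings is a hypothetical helper, not a CocoIndex function, and the dimension 384 simply mirrors the placeholder above.

```python
import math
from typing import Sequence


def validate_embeddings(embeddings: Sequence[object], dim: int) -> list[int]:
    """Return the positions of entries that are not lists of `dim` finite floats."""
    bad = []
    for i, vec in enumerate(embeddings):
        ok = (
            isinstance(vec, list)
            and len(vec) == dim
            and all(isinstance(x, float) and math.isfinite(x) for x in vec)
        )
        if not ok:
            bad.append(i)
    return bad


if __name__ == "__main__":
    sample = [[0.0] * 384, None, [0.0] * 3]
    print(validate_embeddings(sample, dim=384))
```

The check is deliberately strict: integers, NaN, and infinities are all rejected, so convert values to float before validating if your embedding model returns another numeric type.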