CocoIndex

1.0.2 | verified Fri May 01 | auth: no | python

CocoIndex is a Python library that automatically maintains search indexes derived from declarative transformations. Users define how to transform source data into an index, and CocoIndex incrementally updates the index when sources change, minimizing recomputation. As of version 1.0.2, it requires Python >= 3.11 and is under active development with monthly releases.
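The incremental behavior described above can be illustrated with a minimal, library-agnostic sketch. It fingerprints each source record, reuses the previous output when the fingerprint is unchanged, and recomputes only new or modified records. This is not CocoIndex's implementation or API: `incremental_update` and `content_hash` are hypothetical names. The sketch assumes every record has a stable id and that a SHA-256 of its text is an adequate change signal.

```python
import hashlib
from typing import Callable, Dict, List, Tuple

# Hypothetical cache layout: record id -> (content hash, derived value)
Cache = Dict[str, Tuple[str, List[float]]]


def content_hash(text: str) -> str:
    """Stable fingerprint of a record's content."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def incremental_update(
    sources: Dict[str, str],
    cache: Cache,
    transform: Callable[[str], List[float]],
) -> Tuple[Cache, List[str]]:
    """Return the new cache and the ids that were (re)computed."""
    new_cache: Cache = {}
    recomputed: List[str] = []
    for record_id, text in sources.items():
        h = content_hash(text)
        cached = cache.get(record_id)
        if cached is not None and cached[0] == h:
            new_cache[record_id] = cached  # unchanged: reuse the previous output
        else:
            new_cache[record_id] = (h, transform(text))  # new or modified: recompute
            recomputed.append(record_id)
    # ids missing from `sources` are not carried over, i.e. they are deleted
    return new_cache, recomputed
```

Records absent from the new `sources` dict simply drop out of the returned cache, which models deletions propagating to the index.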

pip install cocoindex
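A frequent cause of the ModuleNotFoundError entry is installing into one interpreter and running another. The following small standard-library sketch reports which interpreter is running, whether it meets the Python >= 3.11 requirement, and whether cocoindex is importable from it. `check_environment` is a hypothetical helper, not part of CocoIndex.

```python
import importlib.util
import sys
from typing import Dict, Tuple, Union


def check_environment(min_version: Tuple[int, int] = (3, 11)) -> Dict[str, Union[str, bool]]:
    """Report whether the current interpreter can import cocoindex."""
    return {
        "python": sys.executable,  # path of the interpreter actually running
        "version_ok": sys.version_info >= min_version,
        "cocoindex_found": importlib.util.find_spec("cocoindex") is not None,
    }


if __name__ == "__main__":
    print(check_environment())
```

If `cocoindex_found` is False, running `python -m pip install --upgrade cocoindex` with that same interpreter installs the package where the import will look for it.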
error ModuleNotFoundError: No module named 'cocoindex'
cause CocoIndex not installed or installed in a different environment.
fix Run pip install --upgrade cocoindex in the same Python environment (Python >= 3.11) that runs your code.
error TypeError: 'NoneType' object is not iterable when calling cocoindex.build()
cause The transform() method did not call self.create_index() or returned before creating the index.
fix Ensure transform() always calls self.create_index(), even when the source has no rows.
gotcha CocoIndex does not automatically detect changes to source files; you must trigger an update explicitly to apply incremental changes.
fix Use cocoindex.rebuild() or cocoindex.update() after changing source data.
gotcha Generating embeddings is the user's responsibility; CocoIndex only indexes the embeddings you provide. A common footgun is forgetting to generate embeddings before indexing.
fix Provide a valid embedding vector (list of floats) for each document in the 'embedding' column.
deprecated Importing classes from cocoindex._core is deprecated; these imports will be removed in a future version.
fix Import from the public API instead, e.g. from cocoindex import DataFlow, build, rebuild.
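Because change detection is left to you, one simple approach is to snapshot content hashes of the source directory and diff them before deciding whether to call cocoindex.update() or cocoindex.rebuild(). This is a minimal plain-Python sketch. `snapshot` and `diff_snapshots` are hypothetical helpers. The sketch deliberately does not call CocoIndex itself, since the arguments those functions take are not documented here.

```python
import hashlib
from pathlib import Path
from typing import Dict, Set, Tuple


def snapshot(root: Path) -> Dict[str, str]:
    """Map each file under root (POSIX-style relative path) to a SHA-256 of its bytes."""
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def diff_snapshots(
    old: Dict[str, str], new: Dict[str, str]
) -> Tuple[Set[str], Set[str], Set[str]]:
    """Return (added, modified, deleted) relative paths between two snapshots."""
    added = set(new.keys() - old.keys())
    deleted = set(old.keys() - new.keys())
    modified = {k for k in new.keys() & old.keys() if new[k] != old[k]}
    return added, modified, deleted
```

In practice you would keep the previous snapshot (for example, of a data/ directory) between runs. When any of the three sets is non-empty, you would trigger the explicit update described in the gotcha.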

Minimal example: define a DataFlow subclass, transform data, and build a persistent index.

import cocoindex

# Define a simple data flow that indexes documents into a vector index
class MyIndex(cocoindex.DataFlow):
    def transform(self) -> None:
        source = self.load_csv("data/documents.csv")  # expected columns: id, text
        # Placeholder: every row gets the same all-zero 384-dim vector, so search
        # results are meaningless. Replace with real embeddings before use.
        source["embedding"] = source["text"].apply(lambda text: [0.0] * 384)
        # Always create the index, even if the CSV has no rows (avoids the TypeError above)
        self.create_index(source, on="embedding", name="documents_index")

# Build the index synchronously and persist it under ./mydb
cocoindex.build(MyIndex(), output_dir="./mydb")
print("Index built successfully.")
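The example above fills every row with the same all-zero placeholder, which the embedding gotcha warns against. The sketch below provides a dependency-free stand-in: a toy lexical embedding built with the hashing trick. It also includes a pre-flight check that rejects missing, wrong-length, non-finite, or all-zero vectors. Both `hashed_embedding` and `validate_embeddings` are hypothetical helpers, not CocoIndex APIs. The 384 dimension simply mirrors the placeholder.

```python
import hashlib
import math
import numbers
import re
from typing import Iterable, List, Optional, Sequence

DIM = 384  # same dimension as the placeholder vectors in the example


def hashed_embedding(text: str, dim: int = DIM) -> List[float]:
    """Toy lexical embedding (signed hashing trick), L2-normalized.

    Texts with no word tokens map to the all-zero vector.
    """
    vec = [0.0] * dim
    for token in re.findall(r"\w+", text.lower()):
        digest = hashlib.blake2b(token.encode("utf-8"), digest_size=8).digest()
        bucket = int.from_bytes(digest[:4], "little") % dim
        sign = 1.0 if digest[4] % 2 == 0 else -1.0  # signed hashing reduces collision bias
        vec[bucket] += sign
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else vec


def validate_embeddings(
    rows: Iterable[Optional[Sequence[float]]], dim: int = DIM, allow_zero: bool = False
) -> None:
    """Raise ValueError on a missing, wrong-length, non-finite, or (by default) all-zero row."""
    for i, vec in enumerate(rows):
        if vec is None:
            raise ValueError(f"row {i}: embedding is missing")
        if len(vec) != dim:
            raise ValueError(f"row {i}: expected {dim} dims, got {len(vec)}")
        for x in vec:
            if isinstance(x, bool) or not isinstance(x, numbers.Real) or not math.isfinite(x):
                raise ValueError(f"row {i}: invalid value {x!r}")
        if not allow_zero and all(x == 0 for x in vec):
            raise ValueError(f"row {i}: all-zero vector (placeholder left in place?)")
```

To plug this into the example, you could replace the lambda with source["text"].apply(hashed_embedding), assuming source behaves like the pandas-style frame the example implies. Hashed vectors capture only word overlap. For semantic search you would use a real embedding model instead, such as sentence-transformers' all-MiniLM-L6-v2, which also produces 384-dimensional vectors.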