LanceDB

0.29.2 · active · verified Sat Feb 28

Embedded, serverless vector database built on the Lance columnar format (Apache Arrow-based). It runs in-process with no separate server: data is stored on the local filesystem or on object storage (S3, GCS, Azure). Supports vector similarity search, full-text search, SQL-style filtering, and automatic versioning. Also available as a managed cloud service (LanceDB Cloud). The Python package is classified as 'Alpha' on PyPI even though it is used in production. Backed by pyarrow and pylance (the Python bindings for the Rust-based Lance format, unrelated to Microsoft's Pylance language server).

Install

pip install lancedb

Imports

import lancedb
from lancedb.pydantic import LanceModel, Vector

Quickstart

No server needed: data is stored on disk as Lance files. create_index() builds an approximate nearest neighbor (ANN) index; without one, every search is an exact brute-force scan over all vectors, which stays accurate but slows down as the table grows. Versioning is automatic: every add or delete creates a new table version.
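Conceptually, a search on an unindexed table scores the query against every stored vector and keeps the k best. A plain NumPy sketch of that exact (brute-force) k-nearest-neighbor computation, for illustration only and not LanceDB's actual implementation; `exact_knn` is a hypothetical helper:

```python
import numpy as np


def exact_knn(vectors: np.ndarray, query: np.ndarray, k: int, metric: str = "l2"):
    """Return (indices, distances) of the k nearest rows via a full scan."""
    if metric == "l2":
        # Squared Euclidean distance to every row (same ranking as plain L2)
        dists = np.sum((vectors - query) ** 2, axis=1)
    elif metric == "cosine":
        # Cosine distance = 1 - cosine similarity
        sims = (vectors @ query) / (np.linalg.norm(vectors, axis=1) * np.linalg.norm(query))
        dists = 1.0 - sims
    else:
        raise ValueError(f"unsupported metric: {metric}")
    # Every row was scored, so this top-k is exact (no approximation)
    order = np.argsort(dists)[:k]
    return order, dists[order]


rng = np.random.default_rng(0)
vectors = rng.random((1000, 128), dtype=np.float32)
query = vectors[7].copy()  # query identical to row 7

idx_l2, d_l2 = exact_knn(vectors, query, k=5)
idx_cos, d_cos = exact_knn(vectors, query, k=5, metric="cosine")
print(idx_l2, idx_cos)
```

The cost is one distance per stored row per query, which is what an ANN index such as IVF_PQ avoids by scanning only a few partitions.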

import lancedb
import numpy as np
from lancedb.pydantic import LanceModel, Vector

# Connect (creates the directory if it does not exist)
db = lancedb.connect("/tmp/my-lancedb")

# Define schema using LanceModel
class Item(LanceModel):
    text: str
    vector: Vector(128)  # fixed dimensions

# Create table
table = db.create_table("items", schema=Item, mode="overwrite")

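# Add data (1,000 rows: IVF_PQ index training below fails on very small
# tables, since 8-bit PQ needs at least 256 rows)
data = [
    Item(text="hello world", vector=np.random.rand(128).astype('float32'))
    for _ in range(1000)
]
table.add(data)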

# Vector search: search() returns a query builder; .to_pandas() runs it and
# returns a pandas DataFrame (with a _distance column)
query_vec = np.random.rand(128).astype('float32')
results = table.search(query_vec).limit(5).to_pandas()
print(results)

# Build an ANN index (IVF_PQ by default); searches issued afterwards use it.
# Query with the same distance metric as the index (cosine here).
table.create_index(metric="cosine")