LanceDB
Embedded, serverless vector database built on the Lance columnar format (Apache Arrow-based). Runs in-process with no separate server required; data is stored on the local filesystem or in object storage (S3, GCS, Azure). Supports vector similarity search, full-text search, SQL filtering, and automatic versioning. Also available as a managed cloud service (LanceDB Cloud). The Python package is marked 'Alpha' on PyPI despite being used in production. Backed by pyarrow and pylance (the Python bindings for the Lance Rust library, unrelated to Microsoft's Pylance language server).
Warnings
- breaking Illegal instruction (SIGILL) crash on import on older Intel CPUs (pre-AVX2). lancedb/pylance wheels are compiled with AVX2 SIMD instructions. Affects Ubuntu 20.04 on older hardware and some VMs where CPU features are masked.
- breaking Some lancedb releases have pinned a pre-release version of pylance as a hard dependency (e.g., lancedb==0.17.1 required pylance==0.21.0b5). This breaks pip/uv installs in strict environments that disallow pre-release packages.
- breaking Python >=3.10 required as of lancedb 0.25+. Earlier Python versions raise install or import errors.
- gotcha PyPI status is 'Development Status :: 3 - Alpha' despite being widely used in production. The API has had breaking changes between minor versions. Pin lancedb to a specific version in production.
- gotcha ANN index (create_index) must be created explicitly. Without it, every search is an O(n) brute-force scan regardless of dataset size. No warning is emitted, so query latency silently degrades as the table grows.
- gotcha Automatic versioning creates a new Lance snapshot on every write operation. On high-frequency write workloads this accumulates many small version files rapidly, increasing storage and compaction overhead.
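- gotcha pylance on PyPI (the Python bindings for Lance, which lancedb depends on) shares its name with Pylance, Microsoft's Python language server for VS Code, but the two are unrelated. Microsoft's Pylance is distributed as a VS Code extension rather than through pip, so pip install pylance installs the Lance bindings. lancedb normally pulls pylance in as a transitive dependency, so it rarely needs to be installed by hand.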
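The AVX2 and Python-version warnings above can be checked before lancedb is ever imported. The following is a minimal preflight sketch, assuming an x86-64 Linux host (it reads /proc/cpuinfo) and taking the Python 3.10 floor from the warning above. The helper names has_avx2 and preflight are illustrative and not part of lancedb.

```python
import platform
import sys
from pathlib import Path


def has_avx2(cpuinfo_text: str) -> bool:
    """Return True if any 'flags' line in /proc/cpuinfo-style text lists avx2."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags" and "avx2" in value.split():
            return True
    return False


def preflight(min_python=(3, 10)) -> list:
    """Collect human-readable problems; an empty list means no known blocker."""
    problems = []
    if sys.version_info < min_python:
        problems.append(f"Python {min_python[0]}.{min_python[1]}+ required")
    cpuinfo = Path("/proc/cpuinfo")  # Linux only; other platforms are skipped
    is_x86 = platform.machine() in ("x86_64", "AMD64")
    if is_x86 and cpuinfo.exists() and not has_avx2(cpuinfo.read_text()):
        problems.append("CPU does not advertise AVX2; importing lancedb may crash with SIGILL")
    return problems


if __name__ == "__main__":
    for problem in preflight():
        print("preflight:", problem)
```

Running this in a container or VM is the useful case, since masked CPU features show up in the guest's /proc/cpuinfo rather than on the host.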
Install
- pip install lancedb
- pip install "lancedb[embeddings]"
- pip install "lancedb[azure]"
- pip install --pre --extra-index-url https://pypi.fury.io/lancedb/ lancedb
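Because the API has changed between minor versions, it can help to verify at startup that the installed lancedb matches the version you pinned. A small sketch using only the standard library follows; check_pin is a hypothetical helper, and the version string "0.25.0" is a placeholder pin, not a recommendation.

```python
from importlib.metadata import PackageNotFoundError, version


def check_pin(dist_name: str, expected: str) -> str:
    """Compare an installed distribution's version against an exact pin."""
    try:
        installed = version(dist_name)
    except PackageNotFoundError:
        return "missing"
    return "ok" if installed == expected else f"mismatch: {installed}"


# Placeholder pin for illustration; substitute the version you actually tested.
EXPECTED_LANCEDB = "0.25.0"
print("lancedb:", check_pin("lancedb", EXPECTED_LANCEDB))
```

The same check works for pylance, which is worth pinning too given that some lancedb releases have required pre-release pylance builds.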
Imports
- lancedb
import lancedb
- LanceModel (schema)
from lancedb.pydantic import LanceModel, Vector
Quickstart
import lancedb
import numpy as np
from lancedb.pydantic import LanceModel, Vector

# Connect (creates the directory if it does not exist)
db = lancedb.connect("/tmp/my-lancedb")

# Define the schema using LanceModel
class Item(LanceModel):
    text: str
    vector: Vector(128)  # fixed dimension of 128

# Create the table
table = db.create_table("items", schema=Item, mode="overwrite")

# Add data (vectors are converted to plain lists so pydantic validation accepts them reliably)
data = [
    Item(text="hello world", vector=np.random.rand(128).astype("float32").tolist())
    for _ in range(100)
]
table.add(data)

# Vector search; to_pandas() returns the hits as a pandas DataFrame (requires pandas)
query_vec = np.random.rand(128).astype("float32")
results = table.search(query_vec).limit(5).to_pandas()
print(results)

# Create ANN index (required for scale)
# Note: index training may fail on a table as small as this 100-row toy example;
# build the index after loading the real dataset.
table.create_index(metric="cosine")  # IVF_PQ by default