Lance Python SDK

4.0.0 · active · verified Fri Apr 10

Pylance is the Python SDK for Lance, an open columnar lakehouse format optimized for AI/ML workflows. The package is imported as `lance`. It provides high-performance vector search, efficient random access, and built-in data versioning and lineage. The library uses Apache Arrow for data interchange, and its core is implemented in Rust and exposed to Python through PyO3. It integrates with Pandas, DuckDB, Polars, PyArrow, and Ray. The current version is 4.0.0, released on March 30, 2026. Stable releases ship approximately every two weeks, although the project's development status is still marked 'Alpha'.

Warnings

Install

Imports

Quickstart

This quickstart creates a Lance dataset from a Pandas DataFrame, writes it to disk, reads it back, and then removes the files it created.

import lance
import pandas as pd

# Create a simple Pandas DataFrame
df = pd.DataFrame({
    "id": [1, 2, 3],
    "vector": [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]],
    "text": ["apple", "banana", "cherry"]
})

# Define a URI for your Lance dataset (local path in this case)
lance_uri = "./my_lance_dataset.lance"

# Write the DataFrame to Lance format
dataset = lance.write_dataset(df, lance_uri, mode="overwrite")

print(f"Lance dataset created at: {lance_uri}")

# Open the dataset and read it back
read_dataset = lance.dataset(lance_uri)
read_df = read_dataset.to_pandas()

print("\nData read from Lance dataset:")
print(read_df)

# Clean up the created directory (optional)
import shutil
shutil.rmtree(lance_uri)
print(f"\nCleaned up {lance_uri}")
