Lance Python SDK
Pylance is the Python SDK for the Lance columnar data format, an open lakehouse format optimized for AI/ML workflows. It offers high-performance vector search, efficient random access, and built-in data versioning and lineage. The library uses Apache Arrow for data interchange, and its core bindings are implemented in Rust via PyO3 for performance. It integrates with popular data science tools such as Pandas, DuckDB, Polars, PyArrow, and Ray. The current version is 4.0.0, released on March 30, 2026. Stable releases ship approximately every two weeks, although the project's development status is still marked as 'Alpha'.
Warnings
- gotcha The library is currently in '3 - Alpha' development status, indicating that the API is not yet stable. Breaking changes may occur between minor versions.
- gotcha Lance is optimized to be cloud-native, and remote object storage is now the default paradigm. Expect different performance characteristics, and possibly a different setup, compared to traditional local file-system workflows.
- gotcha When integrating with DuckDB, specific functionalities may require DuckDB version 0.7 or newer to avoid potential segfaults or unexpected behavior.
- gotcha While preview releases offer the latest features and bug fixes and are tested similarly to stable releases, they are not guaranteed to be available for more than six months. For production environments, it is recommended to pin to stable releases.
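The DuckDB and preview-release warnings above can be turned into a start-up check. The following is a minimal sketch using only the standard library. The helper names (`split_version`, `is_prerelease`, `meets_minimum`, `installed_version`) are hypothetical, and the sketch assumes that preview builds carry a PEP 440-style pre-release suffix (such as `b1`) or a `-beta.N` tag, which this page does not confirm.

```python
import re
from importlib import metadata
from typing import Optional, Tuple

# Leading dotted number (the "release" part), followed by any suffix.
_VERSION_RE = re.compile(r"^(\d+(?:\.\d+)*)(.*)$")
# Assumption: preview builds use a PEP 440-style marker or a "-beta.N" tag.
_PRE_RE = re.compile(r"[.\-_]?(a|b|c|rc|alpha|beta|pre|preview|dev)\d*", re.IGNORECASE)


def split_version(version: str) -> Tuple[Tuple[int, ...], str]:
    """Split '0.18.3-beta.1' into ((0, 18, 3), '-beta.1')."""
    match = _VERSION_RE.match(version.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    release = tuple(int(part) for part in match.group(1).split("."))
    return release, match.group(2)


def is_prerelease(version: str) -> bool:
    """True if the suffix right after the release number looks like a pre-release tag."""
    _, suffix = split_version(version)
    return _PRE_RE.match(suffix) is not None


def meets_minimum(version: str, minimum: str) -> bool:
    """Compare release numbers only, padding the shorter one with zeros."""
    have, _ = split_version(version)
    need, _ = split_version(minimum)
    width = max(len(have), len(need))
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    return have >= need


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    pylance_version = installed_version("pylance")
    if pylance_version is not None and is_prerelease(pylance_version):
        print(f"pylance {pylance_version} looks like a preview build; pin a stable release for production")

    duckdb_version = installed_version("duckdb")
    if duckdb_version is not None and not meets_minimum(duckdb_version, "0.7"):
        print(f"duckdb {duckdb_version} is older than 0.7; upgrade before using it with Lance")
```

Only the numeric release part is compared, so this is a coarse guard rather than a full PEP 440 implementation. If the `packaging` library is available, its `Version` class is a more complete alternative.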
Install
- pip install pylance
- pip install --pre --extra-index-url https://pypi.fury.io/lance-format/ pylance
Imports
- lance
import lance
Quickstart
import lance
import pandas as pd
# Create a simple Pandas DataFrame
df = pd.DataFrame({
"id": [1, 2, 3],
"vector": [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]],
"text": ["apple", "banana", "cherry"]
})
# Define a URI for your Lance dataset (local path in this case)
lance_uri = "./my_lance_dataset.lance"
# Write the DataFrame to Lance format
dataset = lance.write_dataset(df, lance_uri, mode="overwrite")
print(f"Lance dataset created at: {lance_uri}")
# Open the dataset and read it back
read_dataset = lance.dataset(lance_uri)
read_df = read_dataset.to_pandas()
print("\nData read from Lance dataset:")
print(read_df)
# Clean up the created directory (optional)
import shutil
shutil.rmtree(lance_uri)
print(f"\nCleaned up {lance_uri}")