Lance Namespace
Lance Namespace is an open specification for describing access to, and operations on, a collection of tables in a multimodal lakehouse, giving metadata services and compute engines a unified model. The `lance-namespace` Python package provides the core interface and a plugin registry for integrating with existing data lakehouse infrastructure, Apache Spark, Ray, and LanceDB for AI and analytics workloads. The current version is 0.6.1, and the package is under active development with frequent releases.
Warnings
- breaking The underlying `lance` core library, which `lance-namespace` integrates with, has undergone API cleanups related to namespace handling (e.g., in `lance` 0.6.x). While `lance-namespace` aims for stability, deep integrations might be affected by these changes, especially when upgrading `lance` itself.
- gotcha The API is evolving quickly: recent minor releases (0.4.x - 0.6.x) introduced `BatchCommitTables`, table version operations, async client support, partition specs, and vended credentials. Code written against older versions may behave unexpectedly or need updates to use the new capabilities.
- gotcha `lance-namespace` provides the interface, but full functionality, especially for native implementations (such as directory-based or cloud storage namespaces), often requires the `lance` package (the core Lance data format library, published on PyPI as `pylance`) to be installed alongside it. Without `lance`, connection or table operations can fail or offer only limited functionality.
- deprecated The Directory Namespace has two major spec versions: V1 (Directory Listing) and V2 (Manifest). V1 is a simple 1-level namespace suitable for quick starts. V2 is more advanced: it is backed by a manifest table, supports nested namespaces, and performs better at scale. V1 lacks nested namespaces, so it is a poor fit for complex or large-scale deployments.
Install
pip install lance-namespace
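Because native implementations often need the core `lance` library as well, it can help to check what is actually present in the environment after installing. The following is a minimal sketch using only the standard library. The helper names (`module_available`, `distribution_version`, `environment_report`) are hypothetical, and the distribution name `pylance` for the core library is an assumption to verify against your own installation.

```python
import importlib.util
from importlib import metadata


def module_available(module_name):
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


def distribution_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def environment_report():
    """Summarize which of the two packages are present in this environment."""
    return {
        "lance_namespace importable": module_available("lance_namespace"),
        "lance importable": module_available("lance"),
        "lance-namespace version": distribution_version("lance-namespace"),
        # Assumption: the core Lance library is distributed as "pylance".
        "pylance version": distribution_version("pylance"),
    }


# Print one line per check so missing packages are easy to spot.
for key, value in environment_report().items():
    print(f"{key}: {value}")
```

Recording both versions together makes it easier to notice when an upgrade of `lance` has moved ahead of the `lance-namespace` release an integration was written against.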
Imports
- connect
import lance_namespace
ns = lance_namespace.connect(...)
Quickstart
import os

import lance_namespace

# Configure a root path for the directory-based namespace.
# For persistent storage, change this to a desired directory or cloud path (e.g., s3://my-bucket/data).
root_path = os.environ.get('LANCE_ROOT_PATH', '/tmp/lance_namespace_data')

# Connect to a directory-based namespace.
# This creates a namespace instance that manages tables within the specified directory.
try:
    ns = lance_namespace.connect("dir", {"root": root_path})
    print(f"Successfully connected to Lance Namespace at: {root_path}")

    # Example: list tables (initially empty, or any existing tables).
    # Depending on the installed release, list_tables may expect a request
    # object rather than no arguments; check the API reference for your version.
    tables = ns.list_tables()
    print(f"Tables in namespace: {tables}")

    # In a real scenario, you would then use `ns` to declare, create, or manage Lance tables.
    # For instance, to declare a new table (requires `lance` package for actual data operations):
    # from lance.client import Client
    # client = Client(f"file://{root_path}")
    # table_name = "my_example_table"
    # try:
    #     table_uri = ns.declare_table(table_name, "table_uri_placeholder")
    #     print(f"Declared table '{table_name}' with URI: {table_uri}")
    # except Exception as e:
    #     print(f"Could not declare table '{table_name}': {e}")
except Exception as e:
    print(f"Error connecting to Lance Namespace: {e}")
    print("Ensure the 'lance' package is installed for native implementations and required dependencies are met.")

# Clean up the temporary directory if it was created.
# import shutil
# if root_path.startswith('/tmp/') and os.path.exists(root_path):
#     shutil.rmtree(root_path)
#     print(f"Cleaned up temporary directory: {root_path}")
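The quickstart hardcodes a `/tmp` fallback and imports `lance_namespace` at module load. One possible variation, sketched below, resolves the root from `LANCE_ROOT_PATH` or a fresh temporary directory and defers the import so a missing package produces a clear message. The helpers `resolve_root` and `connect_dir_namespace` are hypothetical names introduced for illustration; the only `lance-namespace` call used is the `connect("dir", {"root": ...})` form shown in the quickstart.

```python
import os
import tempfile


def resolve_root(env_var="LANCE_ROOT_PATH"):
    """Return the namespace root: the env var if set, else a new temp directory."""
    root = os.environ.get(env_var)
    if root:
        # May be a local directory or a cloud URI; it is passed through unchanged.
        return root
    # mkdtemp creates a uniquely named directory and works on any platform.
    return tempfile.mkdtemp(prefix="lance_namespace_")


def connect_dir_namespace(root):
    """Connect to a directory namespace, importing lance_namespace lazily."""
    try:
        import lance_namespace
    except ImportError as exc:
        raise RuntimeError(
            "lance-namespace is not installed; run `pip install lance-namespace`"
        ) from exc
    # Same call shape as the quickstart: implementation name plus a properties dict.
    return lance_namespace.connect("dir", {"root": root})
```

A caller would write `ns = connect_dir_namespace(resolve_root())` and then use `ns` as in the quickstart. When the fallback directory is used, the caller is responsible for removing it afterwards.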