lakeFS Python SDK
The lakeFS Python SDK provides a high-level, ergonomic interface for interacting with a lakeFS data lake. It simplifies common operations such as creating repositories, managing branches, committing data, and other data versioning tasks. The current library version is 0.16.0. New releases ship periodically to support new lakeFS server features and improve usability, on a schedule independent of the faster-moving server release cycle.
Warnings
- gotcha The `lakefs` Python SDK is versioned independently of the lakeFS server (for example, SDK 0.16.0 alongside server 1.80.0). The SDK is generally backward compatible and is designed to work with lakeFS server v1.0.0 and later. Still, check the official documentation for specific compatibility notes, especially around major server upgrades or when connecting to very old server instances.
- gotcha The `lakefs` package is the high-level Python SDK, offering an object-oriented interface. It is built on a lower-level, auto-generated API client published as `lakefs-sdk`; the older generated client, `lakefs-client`, is deprecated. For most use cases, use the `lakefs` package directly and reach for the generated client only in advanced scenarios the high-level API does not cover.
- gotcha Proper client configuration is crucial. By default, `lakefs.client.Client` loads the server URL and credentials from the `lakectl` configuration file (`~/.lakectl.yaml`) or from environment variables (`LAKECTL_SERVER_ENDPOINT_URL`, `LAKECTL_CREDENTIALS_ACCESS_KEY_ID`, `LAKECTL_CREDENTIALS_SECRET_ACCESS_KEY`). Passing values explicitly as constructor arguments overrides these defaults.
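Because the SDK and server are versioned separately, a script may want to confirm the server meets the v1.0.0 floor before relying on the SDK. The following is a minimal, stdlib-only sketch: `parse_version` and `server_meets_minimum` are hypothetical helpers, not part of the `lakefs` package, and the code assumes you already have the server version string (for example, as shown in the lakeFS UI). Pre-release suffixes such as `-rc1` are ignored.

```python
import re

# Assumed minimum server version, taken from the compatibility note (v1.0.0 and later).
MIN_SERVER_VERSION = (1, 0, 0)


def parse_version(version: str) -> tuple:
    """Parse 'v1.80.0' or '1.80.0' (any suffix like '-rc1' is ignored) into (major, minor, patch)."""
    match = re.match(r"^v?(\d+)\.(\d+)\.(\d+)", version.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def server_meets_minimum(server_version: str, minimum: tuple = MIN_SERVER_VERSION) -> bool:
    # Tuples compare element by element as integers, so (1, 80, 0) > (1, 9, 0).
    return parse_version(server_version) >= minimum


print(server_meets_minimum("1.80.0"))    # True
print(server_meets_minimum("v0.115.0"))  # False
```

Comparing integer tuples rather than raw strings avoids the classic mistake of treating "1.10.0" as older than "1.9.0".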
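To make the configuration precedence concrete, here is a stdlib-only sketch of how such resolution could work. It is not the SDK's implementation: `resolve_settings` and the `SETTINGS` table are hypothetical, the lakectl file is assumed to be already parsed into a dict with `server.endpoint_url` and `credentials.*` keys, the environment variable names are the ones `lakectl` reads (`LAKECTL_SERVER_ENDPOINT_URL`, `LAKECTL_CREDENTIALS_ACCESS_KEY_ID`, `LAKECTL_CREDENTIALS_SECRET_ACCESS_KEY`), and it assumes environment variables win over the config file, as they do for `lakectl` itself.

```python
import os
from typing import Mapping, Optional

# Hypothetical table: setting name -> (environment variable, key path in ~/.lakectl.yaml).
SETTINGS = {
    "host": ("LAKECTL_SERVER_ENDPOINT_URL", ("server", "endpoint_url")),
    "access_key_id": ("LAKECTL_CREDENTIALS_ACCESS_KEY_ID", ("credentials", "access_key_id")),
    "secret_access_key": ("LAKECTL_CREDENTIALS_SECRET_ACCESS_KEY", ("credentials", "secret_access_key")),
}


def _lookup(config: Mapping, path: tuple) -> Optional[object]:
    """Walk a nested dict (a parsed lakectl YAML file) and return the value, or None."""
    node = config
    for key in path:
        if not isinstance(node, Mapping) or key not in node:
            return None
        node = node[key]
    return node


def resolve_settings(
    explicit: Optional[Mapping] = None,
    environ: Optional[Mapping] = None,
    lakectl_config: Optional[Mapping] = None,
) -> dict:
    """Resolve each setting: explicit argument > environment variable > lakectl config file."""
    explicit = explicit or {}
    environ = os.environ if environ is None else environ
    lakectl_config = lakectl_config or {}
    resolved = {}
    for name, (env_var, path) in SETTINGS.items():
        if explicit.get(name):  # empty strings do not override lower-priority sources
            resolved[name] = explicit[name]
        elif environ.get(env_var):
            resolved[name] = environ[env_var]
        else:
            resolved[name] = _lookup(lakectl_config, path)
    return resolved


config_file = {
    "server": {"endpoint_url": "http://lakefs.internal:8000"},
    "credentials": {"access_key_id": "FILE_KEY", "secret_access_key": "FILE_SECRET"},
}
env = {"LAKECTL_CREDENTIALS_ACCESS_KEY_ID": "ENV_KEY"}
settings = resolve_settings(
    explicit={"secret_access_key": "ARG_SECRET"}, environ=env, lakectl_config=config_file
)
print(settings)
# {'host': 'http://lakefs.internal:8000', 'access_key_id': 'ENV_KEY', 'secret_access_key': 'ARG_SECRET'}
```

The empty-string guard reflects a pitfall worth avoiding in real client code as well: passing `""` as an explicit credential can mask a valid value from the environment or the config file.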
Install
-
pip install lakefs
Imports
- Client
from lakefs.client import Client
- Repository
from lakefs.repository import Repository
- Branch
from lakefs.branch import Branch
- Commit
from lakefs.models import Commit
Quickstart
import os

import lakefs
from lakefs.client import Client

# By default, the SDK reads the server URL and credentials from the lakectl
# configuration file (~/.lakectl.yaml) or from these environment variables:
#   LAKECTL_SERVER_ENDPOINT_URL (e.g., "http://localhost:8000")
#   LAKECTL_CREDENTIALS_ACCESS_KEY_ID
#   LAKECTL_CREDENTIALS_SECRET_ACCESS_KEY
# To override those defaults, pass the values explicitly:
#   client = Client(
#       host="http://localhost:8000",
#       username="YOUR_ACCESS_KEY_ID",
#       password="YOUR_SECRET_ACCESS_KEY",
#   )
try:
    # No arguments: load settings from the lakectl config file or environment variables.
    client = Client()

    # Example: list repositories
    print("Attempting to connect to lakeFS and list repositories...")
    repos = list(lakefs.repositories(client=client))  # consume the generator
    if repos:
        print("Found repositories:")
        for repo in repos:
            print(f"- {repo.id}")
    else:
        print("No repositories found. Create one first (e.g., via the lakectl CLI or the UI).")
except Exception as e:
    print(f"Error connecting to lakeFS or listing repositories: {e}")
    print("Ensure the lakeFS server is running and credentials are configured "
          "(environment variables or lakectl config file).")