PyAirbyte
PyAirbyte is an open-source Python library that brings Airbyte's extensive catalog of data connectors directly to Python and AI developers. It enables programmatic movement of data between API sources and destinations, so data pipelines can be built entirely within Python, without requiring a full Airbyte server or cloud account for many use cases. As of version 0.44.1, it is actively developed with frequent releases.
Warnings
- breaking The PyAirbyte MCP (Model Context Protocol) Server is currently experimental. Its API and features may change significantly or be entirely refactored without notice between minor versions of PyAirbyte. Avoid using it in production environments where stability is critical.
- gotcha While PyAirbyte itself supports Python 3.10+, specific Airbyte connectors might have stricter Python version requirements or dependencies. If you encounter issues, consider explicitly setting the Python version for the connector via the `use_python` argument in `ab.get_source()` or `ab.get_destination()`. Alternatively, using `docker_image=True` can provide greater stability by leveraging Docker.
- gotcha Java-based Airbyte destination connectors (which typically run as Docker containers) require Docker to be installed and running on your system. For greater portability and Python-native execution, consider utilizing SQL-based caches (e.g., DuckDB, Postgres, Snowflake) directly within PyAirbyte, if your destination is a SQL database.
- gotcha Starting with PyAirbyte version 0.29.0, the library defaults to using `uv` instead of `pip` for installing Python-based connectors, offering significant speed improvements. If `uv` causes unexpected issues or conflicts in your environment, you can force PyAirbyte to fall back to `pip` for connector installations.
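As a sketch of the options mentioned in the notes above (argument names as described there; exact signatures and accepted values may differ between PyAirbyte versions, and the connection values below are placeholders):

```python
import airbyte as ab
from airbyte.caches import PostgresCache

# Pin the connector's Python runtime, or run it via Docker instead
# (the `use_python` and `docker_image` arguments noted above).
source = ab.get_source(
    "source-faker",
    config={"count": 100},
    use_python="3.11",    # force a specific Python version for the connector
    # docker_image=True,  # alternatively, run the connector in Docker
)

# Prefer a SQL-based cache (here: Postgres) over a Java destination
# connector when the target is a SQL database.
cache = PostgresCache(
    host="localhost",
    port=5432,
    username="airbyte",
    password="secret",          # placeholder credential
    database="pyairbyte_demo",  # placeholder database name
)
source.select_all_streams()
result = source.read(cache=cache)
```

This keeps the whole pipeline Python-native (no Docker required for the destination side) while still landing the data in a durable SQL store.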
Install
-
pip install airbyte
Imports
- airbyte
import airbyte as ab
Quickstart
import airbyte as ab

# Configure a source (source-faker generates synthetic demo data).
# For real connectors, replace "source-faker" with the connector name
# and fill `config` with your credentials/connection details.
# Read secrets from environment variables (e.g., os.environ) rather
# than hard-coding them.
source = ab.get_source(
    "source-faker",
    config={
        "count": 1000,  # Number of records to generate
        "seed": 42,     # Seed for reproducible data generation
    },
    install_if_missing=True,  # Install the connector if it is not found
)

# Verify configuration and connectivity; check() raises an error on failure.
try:
    source.check()
    print("Source connection successful!")
except Exception as err:
    raise SystemExit(f"Source connection failed: {err}")

# Select all available streams from the source
source.select_all_streams()

# Read data from the source into PyAirbyte's internal cache (DuckDB by default)
read_result = source.read()

# Access a specific stream and convert it to a pandas DataFrame.
# Replace "users" with an actual stream name from your source.
users_df = read_result["users"].to_pandas()
print("First 5 records from 'users' stream:")
print(users_df.head())
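Continuing the Quickstart, the cached result can also be inspected stream by stream: `read_result.streams` maps each stream name to its cached dataset.

```python
# List every cached stream and its record count
# (assumes `read_result` from the Quickstart above)
for name, dataset in read_result.streams.items():
    print(f"{name}: {len(dataset)} records")
```

Because the data lands in a local cache (DuckDB by default), these datasets can be re-used across runs without re-reading from the source.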