PyAirbyte

0.44.1 · active · verified Sun Apr 12

PyAirbyte is an open-source Python library that makes Airbyte's data connectors available directly to Python and AI developers. It lets you manage data movement between API sources and destinations programmatically, so pipelines can be built inside a Python environment, in many cases without a full Airbyte server or cloud account. The current version is 0.44.1, and the library is actively developed with frequent updates.

Warnings

Install

Imports

Quickstart

This quickstart connects to a data source (`source-faker`, used here for demonstration), verifies its configuration, reads its streams, and loads one stream into a Pandas DataFrame. It uses the recommended `import airbyte as ab` convention.

import airbyte as ab
import os

# Configure a source (e.g., source-faker for demo purposes)
# For real connectors, replace 'source-faker' with the actual connector name
# and 'config' with your credentials/connection details.
# Use os.environ.get for sensitive information.
source = ab.get_source(
    "source-faker",
    config={
        "count": 1000,  # Number of records to generate
        "seed": 42,  # Seed for reproducible data generation
    },
    install_if_missing=True,  # Automatically install the connector if not found
)

# Verify configuration and connection.
# source.check() returns None on success and raises an exception on failure,
# so there is no status object to inspect.
try:
    source.check()
    print("Source connection successful!\n")
except Exception as exc:
    print(f"Source connection failed: {exc}\n")
    raise SystemExit(1)

# Select all available streams from the source
source.select_all_streams()

# Read data from the source into PyAirbyte's internal cache (DuckDB by default)
read_result = source.read()

# Access a specific stream and convert it to a Pandas DataFrame
# Replace 'users' with the actual stream name from your source
users_df = read_result["users"].to_pandas()

print("First 5 records from 'users' stream:")
print(users_df.head())
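
The comments in the quickstart recommend `os.environ.get` for sensitive values, but the demo config contains none. A minimal sketch of that pattern follows. The environment variable names (`EXAMPLE_API_KEY`, `EXAMPLE_START_DATE`) and the config keys (`api_key`, `start_date`) are hypothetical placeholders, not taken from any real connector's specification.

```python
import os
from typing import Mapping, Optional


def build_source_config(env: Optional[Mapping[str, str]] = None) -> dict:
    """Build a connector config dict from environment variables.

    The variable names and config keys are hypothetical; use the keys
    that your connector's specification actually requires.
    """
    # Accepting a mapping makes the function easy to test without
    # touching the real process environment.
    env = os.environ if env is None else env

    api_key = env.get("EXAMPLE_API_KEY")
    if not api_key:
        # Fail early with a clear message instead of passing an empty secret.
        raise RuntimeError("EXAMPLE_API_KEY is not set")

    return {
        "api_key": api_key,
        # Non-secret settings can fall back to a default.
        "start_date": env.get("EXAMPLE_START_DATE", "2024-01-01"),
    }
```

The returned dict can be passed as `config=` to `ab.get_source(...)` in place of the literal dict used above, which keeps credentials out of the source file.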
