{"id":4335,"library":"airbyte","title":"PyAirbyte","description":"PyAirbyte is an open-source Python library that brings the power of Airbyte's extensive data connectors directly to Python and AI developers. It facilitates programmatic management of data movement between various API sources and destinations, enabling data pipelines to be built within Python environments without requiring a full Airbyte server or cloud account for many use cases. Currently at version 0.44.1, it is actively developed with frequent updates.","status":"active","version":"0.44.1","language":"en","source_language":"en","source_url":"https://github.com/airbytehq/PyAirbyte","tags":["data integration","ETL","ELT","connectors","data pipeline","AI","LLM","data engineering"],"install":[{"cmd":"pip install airbyte","lang":"bash","label":"Install PyAirbyte"}],"dependencies":[{"reason":"Used by default for faster Python connector installation since v0.29.0; a fallback to pip is available via environment variable.","package":"uv","optional":true},{"reason":"Required for running Java-based destination connectors. 
Python-native SQL caches are often recommended as an alternative.","package":"docker","optional":true},{"reason":"Commonly used for data manipulation after extracting data into DataFrames.","package":"pandas","optional":true}],"imports":[{"note":"The 'ab' alias is the recommended and most common convention in PyAirbyte documentation and examples for concise and readable code.","symbol":"airbyte","correct":"import airbyte as ab"}],"quickstart":{"code":"import airbyte as ab\nimport os\n\n# Configure a source (e.g., source-faker for demo purposes)\n# For real connectors, replace 'source-faker' with the actual connector name\n# and 'config' with your credentials/connection details.\n# Use os.environ.get for sensitive information.\nsource = ab.get_source(\n    \"source-faker\",\n    config={\n        \"count\": 1000, # Number of records to generate\n        \"seed\": 42    # Seed for reproducible data generation\n    },\n    install_if_missing=True # Automatically install the connector if not found\n)\n\n# Verify configuration and connection.\n# Assumption: source.check() returns None and raises an exception on failure.\ntry:\n    source.check()\n    print(\"Source connection successful!\\n\")\nexcept Exception as exc:\n    print(f\"Source connection failed: {exc}\\n\")\n    raise SystemExit(1)\n\n# Select all available streams from the source\nsource.select_all_streams()\n\n# Read data from the source into PyAirbyte's internal cache (DuckDB by default)\nread_result = source.read()\n\n# Access a specific stream and convert it to a Pandas DataFrame\n# Replace 'users' with the actual stream name from your source\nusers_df = read_result[\"users\"].to_pandas()\n\nprint(\"First 5 records from 'users' stream:\")\nprint(users_df.head())","lang":"python","description":"This quickstart demonstrates how to connect to a data source (using `source-faker` for demonstration), verify its configuration, extract data, and load it into a Pandas DataFrame. 
It uses the recommended `import airbyte as ab` convention and shows basic steps for data ingestion."},"warnings":[{"fix":"Monitor official PyAirbyte documentation and GitHub releases for updates on the MCP Server's stability. For production, rely on stable APIs.","message":"The PyAirbyte MCP (Model Context Protocol) Server is currently experimental. Its API and features may change significantly or be entirely refactored without notice between minor versions of PyAirbyte. Avoid using it in production environments where stability is critical.","severity":"breaking","affected_versions":"0.44.1 (and potentially future minor versions)"},{"fix":"When initializing a source or destination, use `ab.get_source(..., use_python=\"3.11\")` or `ab.get_source(..., docker_image=True)` to control the connector's execution environment.","message":"While PyAirbyte itself supports Python 3.10+, specific Airbyte connectors might have stricter Python version requirements or dependencies. If you encounter issues, consider explicitly setting the Python version for the connector via the `use_python` argument in `ab.get_source()` or `ab.get_destination()`. Alternatively, using `docker_image=True` can provide greater stability by leveraging Docker.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Install Docker for Java-based destinations. For SQL databases, use PyAirbyte's built-in SQL caches instead of destination connectors for a more Python-native experience, for example by passing a cache object such as `PostgresCache` (from `airbyte.caches`) to `source.read(cache=...)`.","message":"Java-based Airbyte destination connectors (which typically run as Docker containers) require Docker to be installed and running on your system. 
If your destination is a SQL database, consider using PyAirbyte's SQL-based caches (e.g., DuckDB, Postgres, Snowflake) directly for greater portability and Python-native execution.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Set the environment variable `AIRBYTE_NO_UV=true` before running your PyAirbyte code to revert to `pip`-based connector installation: `os.environ['AIRBYTE_NO_UV'] = 'true'`.","message":"Starting with PyAirbyte version 0.29.0, the library defaults to using `uv` instead of `pip` for installing Python-based connectors, which is typically much faster. If `uv` causes unexpected issues or conflicts in your environment, you can force PyAirbyte to fall back to `pip` for connector installations.","severity":"gotcha","affected_versions":">=0.29.0"}],"env_vars":null,"last_verified":"2026-04-12T00:00:00.000Z","next_check":"2026-07-11T00:00:00.000Z"}