arrow-odbc library

10.1.0 · active · verified Fri Apr 17

arrow-odbc is a Python library that reads data from any ODBC data source directly into Apache Arrow record batches. Built with Rust, it provides a high-performance bridge between relational databases accessible via ODBC and Python's data analysis ecosystem. As of version 10.1.0, it maps ODBC column types to Arrow data types and fetches results in batches of configurable size, so large result sets can be processed without loading them into memory all at once. New major versions are released regularly and may introduce significant features or breaking changes.

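Install

pip install arrow-odbc

An ODBC driver for the target database must also be installed. On Linux and macOS, an ODBC driver manager such as unixODBC is required as well.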

Imports

from arrow_odbc import read_arrow_batches_from_odbc

Quickstart

This quickstart connects to an ODBC data source, reads the result of a query as Arrow record batches with `read_arrow_batches_from_odbc`, the library's main entry point for reading, and combines the batches into a single `pyarrow.Table`. Replace the example connection string and query with your own, and make sure the ODBC driver for your database is installed on the system.

import os

import pyarrow as pa
import pyarrow.parquet as pq
from arrow_odbc import read_arrow_batches_from_odbc

# NOTE: You must have an ODBC driver installed on your system
# for the target database (e.g., SQL Server, PostgreSQL, MySQL).
# The connection string below is an example. Adjust it for your setup.

# Example connection strings:
# SQL Server (Windows/Linux): DRIVER={ODBC Driver 17 for SQL Server};SERVER=localhost;DATABASE=testdb;UID=user;PWD=password
# PostgreSQL (Linux): DRIVER={PostgreSQL Unicode};SERVER=localhost;DATABASE=testdb;UID=user;PWD=password

connection_string = os.environ.get(
    "ARROW_ODBC_CONNECTION_STRING",
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=localhost;DATABASE=testdb;UID=user;PWD=password",
)

# Example query. Adjust 'YourTable' and the syntax for your database.
# For SQL Server: "SELECT TOP 100 * FROM YourTable"
# For PostgreSQL: "SELECT * FROM YourTable LIMIT 100"
query = "SELECT TOP 100 * FROM YourTable"

try:
    # Execute the query; the result is fetched as Arrow record batches
    # of at most batch_size rows each
    reader = read_arrow_batches_from_odbc(
        query=query,
        connection_string=connection_string,
        batch_size=1000,
    )

    # Combine all batches into one PyArrow Table. Passing the schema
    # keeps this working when the query returns no rows.
    arrow_table = pa.Table.from_batches(reader, schema=reader.schema)

    print(f"Successfully read {arrow_table.num_rows} rows.")
    print(f"Schema:\n{arrow_table.schema}")
    if arrow_table.num_rows > 0:
        # slice() clamps to the table length, so this is safe for < 5 rows
        print(f"First 5 rows:\n{arrow_table.slice(0, 5).to_pylist()}")

    # Example: save to Parquet
    # pq.write_table(arrow_table, "output.parquet")

except Exception as e:
    print(f"An error occurred: {e}")
    print("Please ensure your ODBC driver is installed and the connection string/query are correct.")
