arrow-odbc library
arrow-odbc is a Python library for reading data from any ODBC data source directly into Apache Arrow record batches. Built on a Rust core, it provides a high-performance bridge between relational databases accessible via ODBC and Python's data analysis ecosystem. As of version 10.1.0 it supports a wide range of ODBC data types and streams query results as record batches of configurable size, which keeps memory use bounded even for large datasets. Major releases regularly introduce significant features or breaking changes, so pin the major version you depend on.
Common errors
- `ModuleNotFoundError: No module named 'arrow_odbc'`
  cause: The `arrow-odbc` library is not installed in the active Python environment.
  fix: Run `pip install arrow-odbc` to install the library.
- `arrow_odbc.Error: ... [unixODBC][Driver Manager]Can't open lib 'ODBC Driver 17 for SQL Server' : file not found (SQLDriverConnect)`
  cause: The specified ODBC driver (e.g. 'ODBC Driver 17 for SQL Server') is not installed or not registered with the driver manager on the operating system. Note that arrow-odbc talks to the driver manager itself and does not use `pyodbc`.
  fix: Install the appropriate ODBC driver for your database and OS. For example, on Ubuntu you might need `sudo apt-get install unixodbc-dev msodbcsql17` for SQL Server; verify registration with `odbcinst -q -d`.
- `TypeError: read_arrow_batches_from_odbc() got an unexpected keyword argument ...`
  cause: The code passes a keyword argument that does not exist in the installed `arrow-odbc` version; parameter names and signatures have changed between major releases.
  fix: Check the installed version with `pip show arrow-odbc`, consult the changelog, and update either the code or the pinned version so they match.
- `arrow_odbc.Error: ODBC error: [State: 22003, Native: 0, Message: [Microsoft][ODBC Driver 17 for SQL Server]Numeric value out of range]`
  cause: A numeric value from the database exceeds the range or precision of the target Apache Arrow type. This can happen with very large integers or high-precision decimals.
  fix: Adjust the SQL query to cast the problematic column(s) to a more permissive type (e.g. `VARCHAR`, or a larger integer type if the database supports it), or handle the conversion explicitly in Python after retrieval.
Warnings
- breaking `arrow-odbc` follows semantic versioning; major releases have renamed public functions and changed keyword arguments, so code written against one major version can fail after upgrading. Pin the major version and review the changelog before moving to a new one.
- gotcha Installation of the appropriate ODBC driver for your specific database and operating system is a prerequisite and is handled outside of Python. `arrow-odbc` relies on a correctly configured ODBC environment.
- gotcha ODBC connection string syntax is highly specific to the ODBC driver and database being used. Incorrectly formatted connection strings are a common source of connection errors.
- gotcha Materializing a large result set as a single Arrow Table can consume significant memory and cause out-of-memory errors. Prefer iterating over the record batches as they arrive and processing or writing each one incrementally.
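The driver prerequisite above is configured at the operating-system level, not in Python. On unixODBC systems you can list registered drivers with `odbcinst -q -d`; drivers are declared in `odbcinst.ini`, typically like the following (the library path varies by driver version and platform):

```ini
[ODBC Driver 17 for SQL Server]
Description = Microsoft ODBC Driver 17 for SQL Server
Driver = /opt/microsoft/msodbcsql17/lib64/libmsodbcsql-17.so
```

The driver name in square brackets is exactly the string you put after `DRIVER=` in the connection string.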
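Because malformed connection strings are such a common failure mode, the sketch below shows the ODBC quoting rule: wrap an attribute value in braces when it contains reserved characters such as `;`, doubling any literal `}` inside the braces. The helper name and values are illustrative, not part of `arrow-odbc`:

```python
def odbc_attr(value: str) -> str:
    """Brace-quote an ODBC attribute value so ';' and spaces survive.

    Per ODBC connection-string rules, the value is wrapped in {...}
    and any literal '}' inside it is doubled.
    """
    return "{" + value.replace("}", "}}") + "}"

parts = {
    "DRIVER": "ODBC Driver 17 for SQL Server",  # assumed driver name
    "SERVER": "localhost",
    "DATABASE": "testdb",
    "UID": "user",
    "PWD": "p;ss{w}rd",  # contains characters that must be escaped
}
connection_string = ";".join(f"{k}={odbc_attr(v)}" for k, v in parts.items())
print(connection_string)
```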
Install
- `pip install arrow-odbc`
Imports
- read_arrow_batches_from_odbc
from arrow_odbc import read_arrow_batches_from_odbc
Quickstart
import os
import pyarrow as pa
from arrow_odbc import read_arrow_batches_from_odbc
# NOTE: You must have an ODBC driver installed on your system
# for the target database (e.g., SQL Server, PostgreSQL, MySQL).
# The connection string below is an example. Adjust it for your setup.
# Example connection strings:
# SQL Server (Windows/Linux): DRIVER={ODBC Driver 17 for SQL Server};SERVER=localhost;DATABASE=testdb;UID=user;PWD=password
# PostgreSQL (Linux): DRIVER={PostgreSQL Unicode};SERVER=localhost;DATABASE=testdb;UID=user;PASSWORD=password
connection_string = os.environ.get(
    'ARROW_ODBC_CONNECTION_STRING',
    'DRIVER={ODBC Driver 17 for SQL Server};SERVER=localhost;DATABASE=testdb;UID=user;PWD=password'
)
# Example query. Adjust 'YourTable' and syntax for your database.
# For SQL Server: "SELECT TOP 100 * FROM YourTable"
# For PostgreSQL: "SELECT * FROM YourTable LIMIT 100"
query = "SELECT TOP 100 * FROM YourTable"
try:
    # Stream the result set as Apache Arrow record batches.
    reader = read_arrow_batches_from_odbc(
        query=query,
        connection_string=connection_string,
        batch_size=1000,  # maximum number of rows per batch
    )
    print(f"Schema:\n{reader.schema}")
    # Collect into a single table. Fine for small results; for large
    # ones, process each batch inside a loop instead of collecting.
    batches = list(reader)
    arrow_table = pa.Table.from_batches(batches, schema=reader.schema)
    print(f"Successfully read {arrow_table.num_rows} rows.")
    if arrow_table.num_rows > 0:
        print(f"First 5 rows:\n{arrow_table.slice(0, 5).to_pylist()}")
    # Example: save to Parquet
    # import pyarrow.parquet as pq
    # pq.write_table(arrow_table, "output.parquet")
except Exception as e:
    print(f"An error occurred: {e}")
    print("Please ensure your ODBC driver is installed and the connection string/query are correct.")