pyarrow-stubs
pyarrow-stubs provides type annotations (stub files) for PyArrow, the Python library for Apache Arrow. They let static type checkers such as MyPy and Pyright analyze code that uses PyArrow, so type errors surface during development instead of at runtime. The current version is 20.0.0.20251215. Releases generally track PyArrow versions, and there have been discussions about moving the stubs into the main Apache Arrow project.
Warnings
- gotcha If `pyarrow-stubs` is not installed alongside `pyarrow`, type checkers complain about missing type information. MyPy reports 'Skipping analyzing "pyarrow": module is installed, but missing library stubs or py.typed marker', and Pyright may report 'Stub file not found for "pyarrow"'. Install `pyarrow-stubs` into the same environment that your type checker analyzes.
- gotcha PyArrow is largely implemented in C++ and Cython, and parts of its API (such as many `pyarrow.compute` functions) are generated dynamically. As a result, some type hints in `pyarrow-stubs` are incomplete or less precise than the runtime behavior. In those places you may need an explicit `typing.cast` or a fallback to `typing.Any`, especially for complex or dynamically generated types. Type coverage is still being improved.
- breaking CVE-2023-47248, an arbitrary code execution vulnerability in how PyArrow deserializes `pyarrow.PyExtensionType` data from IPC and Parquet files, prompted the `pyarrow-hotfix` package and was fixed in PyArrow 14.0.1. If your code relies on `pyarrow.PyExtensionType` (for example, custom extension types stored in Parquet files) and `pyarrow-hotfix` is active, reading that data may fail. The recommended fix is to migrate to `pyarrow.ExtensionType`.
Install
pip install pyarrow-stubs
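To confirm that both distributions are visible to the interpreter your type checker uses (the missing-stubs warning above), a small check with the standard library's `importlib.metadata` can help. This is a sketch; the helper name `installed_version` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    for dist in ("pyarrow", "pyarrow-stubs"):
        print(f"{dist}: {installed_version(dist) or 'NOT INSTALLED'}")
```

Run it with the same Python executable your type checker is configured to analyze; a different virtual environment can report a different result.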
Imports
- Table
import pyarrow as pa # Usage: pa.Table
- Array
import pyarrow as pa # Usage: pa.Array
Quickstart
import pyarrow as pa


def create_arrow_table(names: list[str], ages: list[int]) -> pa.Table:
    """Creates a PyArrow Table from names and ages."""
    if len(names) != len(ages):
        raise ValueError("Lengths of names and ages must match.")
    # Build one typed PyArrow Array per column
    name_array = pa.array(names, type=pa.string())
    age_array = pa.array(ages, type=pa.int64())
    # Combine the arrays into a Table keyed by column name
    return pa.table({"name": name_array, "age": age_array})


if __name__ == "__main__":
    my_names = ["Alice", "Bob", "Charlie"]
    my_ages = [30, 24, 35]
    # MyPy/Pyright check this call against the annotations above and the pyarrow-stubs signatures
    person_table: pa.Table = create_arrow_table(my_names, my_ages)
    print("Created PyArrow Table:")
    print(person_table)
    print(f"\nFirst person's name: {person_table.column('name')[0].as_py()}")