{"id":8475,"library":"pydantic-to-pyarrow","title":"Pydantic to PyArrow Schema Conversion","description":"pydantic-to-pyarrow is a Python library (current version 0.1.6) that converts Pydantic models into Apache PyArrow schemas. This lets a data pipeline validate records with Pydantic and then load them into a columnar format for efficient processing with PyArrow, Pandas, or Polars, and for storage in formats such as Parquet. The library is actively maintained with regular feature releases.","status":"active","version":"0.1.6","language":"en","source_language":"en","source_url":"https://github.com/simw/pydantic-to-pyarrow","tags":["pydantic","pyarrow","schema conversion","data validation","apache arrow"],"install":[{"cmd":"pip install pydantic-to-pyarrow","lang":"bash","label":"Install stable version"}],"dependencies":[{"reason":"Core functionality relies on Pydantic models for schema definition and validation.","package":"pydantic"},{"reason":"Core functionality involves converting to PyArrow schemas and types.","package":"pyarrow"},{"reason":"Used for inspecting Python type hints, a runtime dependency for schema reflection.","package":"typing-inspect"}],"imports":[{"symbol":"get_pyarrow_schema","correct":"from pydantic_to_pyarrow import get_pyarrow_schema"}],"quickstart":{"code":"import pyarrow as pa\nfrom pydantic import BaseModel\nfrom typing import List, Optional\nfrom datetime import datetime\nfrom uuid import UUID\n\nfrom pydantic_to_pyarrow import get_pyarrow_schema\n\nclass Address(BaseModel):\n    street: str\n    zip_code: int\n\nclass Person(BaseModel):\n    name: str\n    age: int\n    height_cm: Optional[float]\n    is_active: bool\n    created_at: datetime\n    uuid_id: UUID\n    tags: List[str] = []\n    address: Address\n\n# Convert the Pydantic model to a PyArrow Schema\narrow_schema = get_pyarrow_schema(Person)\n\nprint(arrow_schema)\n# Expected output (illustrative; fields appear in declaration order, but non-Optional fields may print with a `not null` suffix, and the timestamp unit and UUID type can differ by library, PyArrow and 
Pydantic version):\n# name: string\n# age: int64\n# height_cm: double\n# is_active: bool\n# created_at: timestamp[ns]\n# uuid_id: fixed_size_binary[16]\n# tags: list<item: string>\n#   child 0, item: string\n# address: struct<street: string, zip_code: int64>\n#   child 0, street: string\n#   child 1, zip_code: int64","lang":"python","description":"This quickstart defines a nested Pydantic model (`Person` containing `Address`) with various field types, including an optional field, a list, a datetime, and a UUID. It then calls `get_pyarrow_schema` to generate the corresponding PyArrow schema, which is the library's primary functionality. The printed schema shows how each Pydantic type maps to a PyArrow type; the nested `Address` model becomes a `struct`.","warnings":[{"fix":"Ensure PyArrow version is 15.0 or higher, or explicitly pin NumPy to 1.x (e.g., `numpy<2`).","message":"PyArrow versions earlier than 15.0 are incompatible with NumPy 2.x, which can lead to runtime errors (e.g., 'A module that was compiled using NumPy 1.x cannot be run in Numpy 2.x').","severity":"gotcha","affected_versions":"PyArrow < 15.0 with NumPy >= 2.0"},{"fix":"Constrain the field in Pydantic so out-of-range values fail validation before conversion, e.g., `age: int = Field(ge=-2**63, le=2**63 - 1)`, or store integers that may exceed this range as `str`.","message":"Python's `int` type is unbounded, but `int` fields map to `pa.int64()`, which only holds values from -2**63 to 2**63 - 1. Larger values cannot be represented, so PyArrow raises an overflow error when building arrays or tables from them.","severity":"gotcha","affected_versions":"All"},{"fix":"Add a Pydantic serializer that converts the `UUID` to bytes, e.g., `@field_serializer('uuid_id') def _uuid_to_bytes(self, value: UUID) -> bytes: return value.bytes` (with `field_serializer` imported from `pydantic`), then pass `[m.model_dump() for m in models]` to `pa.Table.from_pylist`.","message":"When creating PyArrow tables from Pydantic models that include `UUID` fields, especially with PyArrow 19.0+, `pa.Table.from_pylist` expects bytes, not `UUID` objects directly. 
This requires adding a serializer to your Pydantic model to convert UUIDs to bytes.","severity":"gotcha","affected_versions":"PyArrow >= 19.0, pydantic-to-pyarrow <= 0.1.6"},{"fix":"To accept the loss of timezone information, pass `allow_losing_tz=True` to `get_pyarrow_schema`, e.g., `get_pyarrow_schema(MyModel, allow_losing_tz=True)`.","message":"By default, `get_pyarrow_schema` raises an exception for fields declared as timezone-aware (e.g., Pydantic's `AwareDatetime`) to avoid silently losing timezone information. With `allow_losing_tz=True`, such fields are mapped to a timestamp type without a timezone.","severity":"gotcha","affected_versions":"All"}],"env_vars":null,"last_verified":"2026-04-16T00:00:00.000Z","next_check":"2026-07-15T00:00:00.000Z","problems":[{"fix":"Upgrade `pyarrow` to version 15.0 or higher: `pip install --upgrade pyarrow`. Alternatively, if an upgrade is not possible, downgrade `numpy` to a 1.x version: `pip install \"numpy<2\"`.","cause":"Incompatibility between an older PyArrow version (pre-15.0) and a newer NumPy version (2.x).","error":"A module that was compiled using NumPy 1.x cannot be run in Numpy 2.x. This file was compiled with numpy 1.x and is trying to run with numpy 2.x."},{"fix":"Check the PyArrow documentation for the Python versions it supports. Either use a Python version for which pre-built `pyarrow` wheels exist, or wait until wheels for your version are released. Installing Python build tools (e.g., `pip install cython setuptools wheel`) rarely helps on its own, because building `pyarrow` from source also requires the Arrow C++ libraries and CMake.","cause":"Often occurs when `pyarrow` is installed on a Python version for which pre-built wheels are not yet available (e.g., a very new Python release).","error":"ERROR: Failed building wheel for pyarrow\nERROR: Could not build wheels for pyarrow, which is required to install pyproject.toml-based projects"},{"fix":"Review the `pydantic-to-pyarrow` documentation or source for supported type conversions. 
If your type is not supported, consider transforming it to a compatible type within your Pydantic model (e.g., converting a custom object to a `str` or `dict`) or contributing support to the library. For Enums, ensure `pydantic-to-pyarrow` version is at least 0.1.2.","cause":"The Pydantic model contains a Python type (e.g., a custom type, or a standard library type not yet explicitly supported) that `pydantic-to-pyarrow` does not have a defined conversion for to a PyArrow type.","error":"TypeError: Converting Pydantic type to Arrow Type: unsupported type <some_type>"}]}