Pydantic to PyArrow Schema Conversion

0.1.6 · active · verified Thu Apr 16

pydantic-to-pyarrow is a Python library (current version 0.1.6) that converts Pydantic models into PyArrow schemas (`pyarrow.Schema`). It suits pipelines that validate records with Pydantic and then load them into a columnar format, either for efficient processing with PyArrow, Pandas, or Polars or for storage in formats such as Parquet. The library is actively maintained, with regular feature releases.

Quickstart

This quickstart defines a nested Pydantic model (`Person` containing an `Address`) with a range of field types: an optional field, a list, a datetime, and a UUID. It then calls `get_pyarrow_schema` on `Person` to generate the corresponding PyArrow schema, which is the library's primary functionality. The printed schema shows how each Pydantic type maps to a PyArrow type, with the nested model becoming a struct.

```python
from datetime import datetime
from typing import List, Optional
from uuid import UUID

from pydantic import BaseModel
from pydantic_to_pyarrow import get_pyarrow_schema

class Address(BaseModel):
    street: str
    zip_code: int

class Person(BaseModel):
    name: str
    age: int
    height_cm: Optional[float]
    is_active: bool
    created_at: datetime
    uuid_id: UUID
    tags: List[str] = []
    address: Address

# Convert the Pydantic model to a PyArrow schema
arrow_schema = get_pyarrow_schema(Person)

print(arrow_schema)
# Illustrative output. Fields appear in declaration order; exact type details
# (timestamp unit, UUID representation, "not null" markers on required
# fields) can differ between pydantic-to-pyarrow versions:
# name: string
# age: int64
# height_cm: double
# is_active: bool
# created_at: timestamp[ns]
# uuid_id: fixed_size_binary[16]
# tags: list<item: string>
#   child 0, item: string
# address: struct<street: string, zip_code: int64>
#   child 0, street: string
#   child 1, zip_code: int64
```
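To make the type mapping concrete, here is a simplified, stdlib-only sketch of the kind of recursion such a conversion involves: `Optional[X]` becomes a nullable field, `List[X]` becomes a list type, and a nested model becomes a struct. It is not the library's implementation. The helpers `arrow_type` and `describe_fields` are hypothetical, the scalar table is deliberately tiny, and the sketch runs on plain annotated classes so it needs neither Pydantic nor PyArrow. The output strings imitate how PyArrow prints schemas, including `not null` on non-nullable fields.

```python
import types
from typing import List, Optional, Union, get_args, get_origin, get_type_hints

# Hypothetical scalar table for this sketch only; the real library maps
# many more types and may choose differently.
SCALARS = {str: "string", int: "int64", float: "double", bool: "bool"}

# typing.Union covers Optional[X]; types.UnionType covers `X | None` (3.10+).
UNION_ORIGINS = tuple(
    t for t in (Union, getattr(types, "UnionType", None)) if t is not None
)


def arrow_type(tp):
    """Return (arrow_type_string, nullable) for one type annotation."""
    origin = get_origin(tp)
    if origin in UNION_ORIGINS:
        # Optional[X] is Union[X, None]: unwrap X and mark the field nullable.
        members = [a for a in get_args(tp) if a is not type(None)]
        if len(members) != 1:
            raise TypeError(f"unsupported union: {tp!r}")
        inner, _ = arrow_type(members[0])
        return inner, True
    if origin is list:
        # List[X] becomes list<item: X>; item nullability is ignored here.
        (item,) = get_args(tp)
        item_type, _ = arrow_type(item)
        return f"list<item: {item_type}>", False
    if tp in SCALARS:
        return SCALARS[tp], False
    if isinstance(tp, type) and get_type_hints(tp):
        # Any class with annotations is treated as a nested record (struct).
        return f"struct<{', '.join(describe_fields(tp))}>", False
    raise TypeError(f"unsupported type: {tp!r}")


def describe_fields(cls):
    """Render each annotated field of cls as 'name: type', PyArrow-style."""
    lines = []
    for name, tp in get_type_hints(cls).items():
        type_str, nullable = arrow_type(tp)
        lines.append(f"{name}: {type_str}" + ("" if nullable else " not null"))
    return lines


# Plain annotated classes stand in for the Pydantic models above.
class Address:
    street: str
    zip_code: int


class Person:
    name: str
    height_cm: Optional[float]
    tags: List[str]
    address: Address


for line in describe_fields(Person):
    print(line)
```

A real converter working on Pydantic v2 models would read `Model.model_fields` rather than raw class annotations, since `BaseModel` also carries class-level `ClassVar` annotations, and would build `pa.field` and `pa.schema` objects instead of strings.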
