pydantic-spark

1.0.1 · verified Mon Apr 27 · python

Converts Pydantic models to PySpark schemas. Current version 1.0.1 supports Pydantic v2; release cadence is irregular. Designed for data-engineering pipelines where Pydantic models define data contracts and the matching Spark schemas must be derived from them.

pip install pydantic-spark
error ImportError: cannot import name 'to_spark_schema' from 'pydantic_spark'
cause The installed version (<0.3.0) predates the to_spark_schema API, or the import path is misspelled.
fix
Upgrade to latest version: pip install --upgrade pydantic-spark. Use correct import: from pydantic_spark import to_spark_schema
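Before upgrading blindly, it can help to confirm which version is actually installed; a stdlib-only sketch (the 0.3.0 cutoff is taken from the cause above):

```python
from importlib.metadata import PackageNotFoundError, version

def installed_version(dist: str):
    """Return the installed version string for a distribution, or None if absent."""
    try:
        return version(dist)
    except PackageNotFoundError:
        return None

# to_spark_schema ships only in pydantic-spark >= 0.3.0
v = installed_version("pydantic-spark")
if v is None or tuple(int(p) for p in v.split(".")[:2]) < (0, 3):
    print("run: pip install --upgrade pydantic-spark")
```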
error pyspark.sql.utils.AnalysisException: u'Unable to infer schema for type. It must be specified manually.;'
cause The generated schema is incomplete or wrong for complex Pydantic models (e.g., Union types).
fix
Simplify the model to avoid Union/Optional fields, or bypass inference entirely by passing an explicit PySpark StructType.
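If inference keeps failing, one workaround is to hand-write the schema in Spark's JSON schema representation and load it with StructType.fromJson. A stdlib-only sketch (the field names are illustrative):

```python
import json

# Spark's JSON schema representation. StructType.fromJson can load this dict,
# so the schema never has to be inferred from the Pydantic model.
explicit_schema = {
    "type": "struct",
    "fields": [
        {"name": "name", "type": "string", "nullable": False, "metadata": {}},
        # A field that was Optional[int] in the model: mark it nullable instead
        {"name": "age", "type": "integer", "nullable": True, "metadata": {}},
    ],
}
print(json.dumps(explicit_schema))

# With pyspark available:
#   from pyspark.sql.types import StructType
#   schema = StructType.fromJson(explicit_schema)
#   df = spark.createDataFrame(rows, schema)
```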
error AttributeError: module 'pydantic_spark' has no attribute 'to_spark_schema'
cause The function was renamed or moved in version 1.0.0.
fix
Use from pydantic_spark import to_spark_schema directly. If using v0.x, use from pydantic_spark.converter import to_spark_schema.
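An import that tolerates both layouts can be sketched with a fallback, assuming the v0.x module path named in the fix above:

```python
# Try the v1.x import first, then fall back to the v0.x module path.
try:
    from pydantic_spark import to_spark_schema
except ImportError:
    try:
        from pydantic_spark.converter import to_spark_schema
    except ImportError:
        to_spark_schema = None  # pydantic-spark not installed at all
```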
breaking Version 1.0.0 dropped support for Pydantic v1. If upgrading from v0.3.0 or earlier, you must migrate your models to Pydantic v2.
fix Update Pydantic to v2 and follow their migration guide (https://docs.pydantic.dev/latest/migration/)
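As a minimal illustration of the v1 → v2 changes that matter here (assuming Pydantic v2 is installed): @validator becomes @field_validator, and class Config becomes model_config:

```python
from pydantic import BaseModel, field_validator

class User(BaseModel):
    # v1: @validator("name")      -> v2: @field_validator("name")
    # v1: class Config: ...       -> v2: model_config = ConfigDict(...)
    name: str

    @field_validator("name")
    @classmethod
    def strip_name(cls, v: str) -> str:
        # Trim surrounding whitespace during validation
        return v.strip()

print(User(name="  alice ").name)
```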
gotcha Complex nested types (e.g., models with Union, Optional, or recursive references) may produce unexpected Spark types. Manual schema adjustments might be needed.
fix Inspect the generated schema and override field types using Pydantic v2's Field(..., json_schema_extra={...}) (the v1 name was schema_extra) or custom serialization.
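A sketch of attaching extra metadata to a field via Pydantic v2's json_schema_extra; the spark_type key here is hypothetical, so check how your pydantic-spark version actually consumes field extras before relying on it:

```python
from pydantic import BaseModel, Field

class Event(BaseModel):
    # "spark_type" is a hypothetical metadata key for illustration only;
    # json_schema_extra merges this dict into the field's JSON schema.
    payload: str = Field(json_schema_extra={"spark_type": "string"})

print(Event.model_json_schema()["properties"]["payload"])
```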
deprecated The 'coerce' feature (CoerceType) from v0.3.0 is deprecated in v1.0.0+. Use Pydantic's built-in validators instead.
fix Remove usage of CoerceType and replace with @field_validator or @model_validator in your Pydantic model.

Basic usage: define a Pydantic model and convert to Spark schema.

from pyspark.sql import SparkSession
from pydantic import BaseModel
from pydantic_spark import to_spark_schema

class MyModel(BaseModel):
    name: str
    age: int

spark = SparkSession.builder.getOrCreate()

# Derive the Spark schema from the Pydantic model
schema = to_spark_schema(MyModel)

# Build an empty DataFrame with the derived schema and show it
df = spark.createDataFrame([], schema)
print(df.schema)

spark.stop()