pydantic-spark
version 1.0.1 · verified Mon Apr 27
Converts Pydantic models to PySpark schemas. Current version 1.0.1 supports Pydantic v2. Release cadence is irregular. Designed for data engineering pipelines where Pydantic models define data contracts and Spark schemas must be inferred.
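To illustrate the kind of conversion the library performs, here is a stdlib-only sketch that maps annotated Python fields to Spark type names. The mapping table and the `sketch_schema` helper are illustrative assumptions, not pydantic-spark's actual code or output format.

```python
# Illustrative sketch of annotation -> Spark type-name mapping.
# The SPARK_TYPES table below is an assumption for demonstration,
# not pydantic-spark's real implementation.
from typing import get_type_hints

# Assumed Python-to-Spark type-name mapping.
SPARK_TYPES = {str: "string", int: "long", float: "double", bool: "boolean"}

def sketch_schema(model_cls) -> dict:
    # Map each annotated field on the class to a Spark type name,
    # defaulting to "string" for unknown types.
    return {name: SPARK_TYPES.get(tp, "string")
            for name, tp in get_type_hints(model_cls).items()}

class User:
    name: str
    age: int

print(sketch_schema(User))  # → {'name': 'string', 'age': 'long'}
```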
Install: pip install pydantic-spark
Common errors
error ImportError: cannot import name 'to_spark_schema' from 'pydantic_spark' ↓
cause Installed version <0.3.0 does not include this API, or the import path is misspelled.
fix
Upgrade to latest version: pip install --upgrade pydantic-spark. Use correct import: from pydantic_spark import to_spark_schema
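Before upgrading, it can help to confirm which version is actually installed. This stdlib check avoids importing pydantic_spark itself, so it works even when the import is broken:

```python
# Query the installed version of a package without importing it
# (useful when the import itself is what's failing).
from importlib import metadata

def installed_version(pkg: str):
    # Returns the version string, or None when the package is absent.
    try:
        return metadata.version(pkg)
    except metadata.PackageNotFoundError:
        return None

print(installed_version("pydantic-spark"))
```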
error pyspark.sql.utils.AnalysisException: u'Unable to infer schema for type. It must be specified manually.;' ↓
cause The generated schema is incomplete or wrong for complex Pydantic models (e.g., Union types).
fix
Simplify the model to avoid Union/Optional fields, or provide an explicit schema via PySpark's StructType.
error AttributeError: module 'pydantic_spark' has no attribute 'to_spark_schema' ↓
cause The function was renamed or moved in version 1.0.0.
fix
Use from pydantic_spark import to_spark_schema directly. If using v0.x, use from pydantic_spark.converter import to_spark_schema.
Warnings
breaking Version 1.0.0 dropped support for Pydantic v1. If upgrading from v0.3.0 or earlier, you must migrate your models to Pydantic v2. ↓
fix Update Pydantic to v2 and follow their migration guide (https://docs.pydantic.dev/latest/migration/)
gotcha Complex nested types (e.g., models with Union, Optional, or recursive references) may produce unexpected Spark types. Manual schema adjustments might be needed. ↓
fix Inspect the generated schema and override fields using pydantic's Field(..., json_schema_extra={...}) (Pydantic v2; schema_extra in v1) or custom serialization.
deprecated The 'coerce' feature (CoerceType) from v0.3.0 is deprecated in v1.0.0+. Use Pydantic's built-in validators instead. ↓
fix Remove usage of CoerceType and replace with @field_validator or @model_validator in your Pydantic model.
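As a replacement pattern, a Pydantic v2 field_validator with mode="before" can perform the same kind of coercion. A hedged sketch; the model and field names are illustrative assumptions:

```python
# Sketch: replacing deprecated CoerceType-style coercion with a
# Pydantic v2 field_validator (model/field names are assumptions).
from pydantic import BaseModel, field_validator

class Event(BaseModel):
    count: int

    @field_validator("count", mode="before")
    @classmethod
    def coerce_count(cls, v):
        # Coerce numeric strings to int before validation runs.
        if isinstance(v, str):
            return int(v.strip())
        return v

print(Event(count="42").count)  # → 42
```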
Imports
- to_spark_schema
  wrong: from pydantic_spark.core import to_spark_schema
  correct: from pydantic_spark import to_spark_schema
- to_pandas_schema
  from pydantic_spark import to_pandas_schema
Quickstart
from pyspark.sql import SparkSession
from pydantic import BaseModel
from pydantic_spark import to_spark_schema
class MyModel(BaseModel):
name: str
age: int
spark = SparkSession.builder.getOrCreate()
schema = to_spark_schema(MyModel)
df = spark.createDataFrame([], schema)
print(df.schema)
spark.stop()