SparkORM


SparkORM is a Python library for schema management and basic Object Relational Mapping for PySpark SQL and DataFrames. Current version: 1.2.29 (stable, monthly releases).

pip install sparkorm
error ModuleNotFoundError: No module named 'pyspark'
cause PySpark is not installed.
fix
pip install pyspark
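Before retrying, you can confirm whether PySpark is importable at all. This check uses only the standard library and does not import the (heavy) package itself:

```python
import importlib.util

# find_spec returns a ModuleSpec if pyspark is installed, None otherwise,
# without executing pyspark's import machinery.
spec = importlib.util.find_spec("pyspark")
if spec is None:
    print("pyspark is missing - run: pip install pyspark")
else:
    print("pyspark found at", spec.origin)
```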
error AttributeError: module 'sparkorm' has no attribute 'SparkSession'
cause Deprecated import path: SparkSession was renamed to SparkSessionSingleton in later versions.
fix
from sparkorm import SparkSessionSingleton
gotcha SparkSessionSingleton caches the session globally; if you stop the session and then call get_or_create, you get a brand-new session rather than the stopped one revived. Any code still holding a reference to the old session is now pointing at a dead object.
fix Call SparkSessionSingleton().stop() deliberately, drop references to the stopped session, and re-fetch via get_or_create when needed.
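The caching behavior above can be illustrated with a minimal stand-in singleton. This is plain Python for illustration, not SparkORM's actual implementation: once the cached object is stopped, get_or_create hands out a fresh one, so a stale reference held elsewhere points at a dead session.

```python
class FakeSession:
    """Toy stand-in for a Spark session that can be stopped."""
    def __init__(self):
        self.stopped = False

    def stop(self):
        self.stopped = True


class SessionSingleton:
    """Illustrative sketch of SparkSessionSingleton-style global caching."""
    _cached = None

    def get_or_create(self):
        # Replace the cached session if none exists or it was stopped.
        if SessionSingleton._cached is None or SessionSingleton._cached.stopped:
            SessionSingleton._cached = FakeSession()
        return SessionSingleton._cached


first = SessionSingleton().get_or_create()
first.stop()
second = SessionSingleton().get_or_create()
# 'first' is stale; 'second' is a new session object.
print(first is second)  # False
```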
gotcha SparkORM requires an existing PySpark installation. Installing sparkorm alone will not bring in PySpark; you must install it separately or via an extra dependency if available.
fix pip install pyspark
deprecated Some older tutorials use `from sparkorm import SparkSession` directly; this alias may be removed. Use `SparkSessionSingleton` instead.
fix from sparkorm import SparkSessionSingleton
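For code that must run against both old and new SparkORM versions, a compatibility import can paper over the rename. This is a hedged sketch assuming only the name changed, as the note above suggests; it also degrades gracefully when sparkorm is absent:

```python
# Prefer the current name; fall back to the deprecated alias on old versions.
try:
    from sparkorm import SparkSessionSingleton
except ImportError:
    try:
        from sparkorm import SparkSession as SparkSessionSingleton  # deprecated alias
    except ImportError:
        SparkSessionSingleton = None  # sparkorm itself is not installed

print("SparkSessionSingleton available:", SparkSessionSingleton is not None)
```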

Example: define a schema model, create a DataFrame, wrap it in a SparkDataFrame, and display it.

from sparkorm import SparkSessionSingleton, SparkDataFrame, Structure
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

# Define schema model
class Employee(Structure):
    name: str
    age: int

# Get or create SparkSession
spark = SparkSessionSingleton().get_or_create()

# Create sample data
rows = [("Alice", 30), ("Bob", 25)]
schema = StructType([
    StructField("name", StringType(), True),
    StructField("age", IntegerType(), True)
])
df = spark.createDataFrame(rows, schema)

# Wrap with SparkDataFrame
sdf = SparkDataFrame(df, Employee)
sdf.show()