SparkORM


SparkORM is a Python library for schema management and basic Object Relational Mapping for PySpark SQL and DataFrames. Current version: 1.2.29 (stable, monthly releases).

pip install sparkorm
error ModuleNotFoundError: No module named 'pyspark'
cause PySpark is not installed.
fix
pip install pyspark
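Before retrying, you can confirm whether PySpark is importable at all. This check uses only the standard library and does not import the (heavy) package itself:

```python
import importlib.util

# find_spec returns a ModuleSpec if pyspark is installed, None otherwise,
# without executing pyspark's import machinery.
spec = importlib.util.find_spec("pyspark")
if spec is None:
    print("pyspark is missing - run: pip install pyspark")
else:
    print("pyspark found at", spec.origin)
```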
error AttributeError: module 'sparkorm' has no attribute 'SparkSession'
cause Deprecated import path: SparkSession was renamed to SparkSessionSingleton in later versions.
fix
from sparkorm import SparkSessionSingleton
gotcha SparkSessionSingleton caches the session globally; if you stop the session and then call get_or_create, you get a brand-new session rather than the stopped one revived. Any code still holding a reference to the old session is now pointing at a dead object.
fix Call SparkSessionSingleton().stop() deliberately, drop references to the stopped session, and re-fetch via get_or_create when needed.
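The caching behavior above can be illustrated with a minimal stand-in singleton. This is plain Python for illustration, not SparkORM's actual implementation: once the cached object is stopped, get_or_create hands out a fresh one, so a stale reference held elsewhere points at a dead session.

```python
class FakeSession:
    """Toy stand-in for a Spark session that can be stopped."""
    def __init__(self):
        self.stopped = False

    def stop(self):
        self.stopped = True


class SessionSingleton:
    """Illustrative sketch of SparkSessionSingleton-style global caching."""
    _cached = None

    def get_or_create(self):
        # Replace the cached session if none exists or it was stopped.
        if SessionSingleton._cached is None or SessionSingleton._cached.stopped:
            SessionSingleton._cached = FakeSession()
        return SessionSingleton._cached


first = SessionSingleton().get_or_create()
first.stop()
second = SessionSingleton().get_or_create()
# 'first' is stale; 'second' is a new session object.
print(first is second)  # False
```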
gotcha SparkORM requires an existing PySpark installation. Installing sparkorm alone will not bring in PySpark; you must install it separately or via an extra dependency if available.
fix pip install pyspark
deprecated Some older tutorials use `from sparkorm import SparkSession` directly; this alias may be removed. Use `SparkSessionSingleton` instead.
fix from sparkorm import SparkSessionSingleton
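For code that must run against both old and new SparkORM versions, a compatibility import can paper over the rename. This is a hedged sketch assuming only the name changed, as the note above suggests; it also degrades gracefully when sparkorm is absent:

```python
# Prefer the current name; fall back to the deprecated alias on old versions.
try:
    from sparkorm import SparkSessionSingleton
except ImportError:
    try:
        from sparkorm import SparkSession as SparkSessionSingleton  # deprecated alias
    except ImportError:
        SparkSessionSingleton = None  # sparkorm itself is not installed

print("SparkSessionSingleton available:", SparkSessionSingleton is not None)
```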

Example: define a schema model, create a DataFrame, wrap it in a SparkDataFrame, and display it.

from sparkorm import SparkSessionSingleton, SparkDataFrame, Structure
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

# Define schema model
class Employee(Structure):
    name: str
    age: int

# Get or create SparkSession
spark = SparkSessionSingleton().get_or_create()

# Create sample data
rows = [("Alice", 30), ("Bob", 25)]
schema = StructType([
    StructField("name", StringType(), True),
    StructField("age", IntegerType(), True)
])
df = spark.createDataFrame(rows, schema)

# Wrap with SparkDataFrame
sdf = SparkDataFrame(df, Employee)
sdf.show()