SparkORM
SparkORM is a Python library for schema management and basic Object Relational Mapping for PySpark SQL and DataFrames. Current version: 1.2.29 (stable, monthly releases).
Install
pip install sparkorm

Common errors
error: ModuleNotFoundError: No module named 'pyspark'
cause: PySpark is not installed.
fix:
pip install pyspark
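Before importing SparkORM models, you can confirm that PySpark is importable without triggering the error above. This is a stdlib-only sketch; `is_installed` is a hypothetical helper for illustration, not part of SparkORM:

```python
import importlib.util

def is_installed(module_name: str) -> bool:
    # find_spec returns None when the module cannot be located.
    return importlib.util.find_spec(module_name) is not None

# PySpark is a separate dependency; check before importing sparkorm models.
if not is_installed("pyspark"):
    print("pyspark is missing; run: pip install pyspark")
```

This avoids a hard crash at import time and lets you print an actionable message instead.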
error: AttributeError: module 'sparkorm' has no attribute 'SparkSession'
cause: Deprecated import path: SparkSession was renamed to SparkSessionSingleton in later versions.
fix:
from sparkorm import SparkSessionSingleton
Warnings
gotcha: SparkSessionSingleton caches the session globally; calling get_or_create after the session has been stopped creates a fresh one, so any references you still hold to the old session are stale.
fix: Call SparkSessionSingleton().stop() deliberately, and re-fetch the session via get_or_create afterwards rather than reusing an old reference.
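To make the caching behavior described above concrete, here is a toy singleton in plain Python. `ToySessionSingleton` and `ToySession` are invented names for illustration only; this is not SparkORM's implementation:

```python
class ToySession:
    def __init__(self) -> None:
        self.stopped = False

    def stop(self) -> None:
        self.stopped = True

class ToySessionSingleton:
    _cached = None  # global cache, as the gotcha describes

    @classmethod
    def get_or_create(cls) -> ToySession:
        # A stopped session is discarded and silently replaced.
        if cls._cached is None or cls._cached.stopped:
            cls._cached = ToySession()
        return cls._cached

first = ToySessionSingleton.get_or_create()
assert ToySessionSingleton.get_or_create() is first  # cached while alive
first.stop()
second = ToySessionSingleton.get_or_create()
assert second is not first  # stopping yielded a brand-new session
```

The takeaway: always re-fetch the session via get_or_create after stopping, instead of holding onto the old reference.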
gotcha: SparkORM requires an existing PySpark installation. Installing sparkorm alone will not bring in PySpark; you must install it separately or via an extra dependency if available.
fix: pip install pyspark
deprecated: Some older tutorials use `from sparkorm import SparkSession` directly; this alias may be removed. Use `SparkSessionSingleton` instead.
fix: from sparkorm import SparkSessionSingleton
Imports
- SparkSessionSingleton: from sparkorm import SparkSessionSingleton
- SparkDataFrame: from sparkorm import SparkDataFrame
- Structure: from sparkorm import Structure
Quickstart
from sparkorm import SparkSessionSingleton, SparkDataFrame, Structure
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

# Define the schema model
class Employee(Structure):
    name: str
    age: int

# Get or create the SparkSession
spark = SparkSessionSingleton().get_or_create()

# Create sample data
rows = [("Alice", 30), ("Bob", 25)]
schema = StructType([
    StructField("name", StringType(), True),
    StructField("age", IntegerType(), True),
])
df = spark.createDataFrame(rows, schema)

# Wrap the DataFrame with its schema model
sdf = SparkDataFrame(df, Employee)
sdf.show()