pysparkling
Version 0.6.2 · verified Sat May 09 · Python · no active maintenance
Pure Python implementation of the Spark RDD interface, providing a lightweight alternative to PySpark for local or small-scale distributed computing. Current version: 0.6.2. Release cadence is sporadic, with the last release in 2020.
pip install pysparkling

Common errors
error ModuleNotFoundError: No module named 'pyspark'
cause Trying to import pysparkling as pyspark.
fix
Install pysparkling with 'pip install pysparkling' and import as 'import pysparkling'.
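A guarded import makes the distinction explicit. This sketch assumes nothing beyond the package name: pysparkling installs under its own name, so `import pyspark` cannot find it.

```python
# pysparkling installs under its own name; 'import pyspark' will not
# resolve to it. A guarded import shows which package is present:
try:
    import pysparkling  # requires: pip install pysparkling
except ModuleNotFoundError:
    pysparkling = None  # the package is simply not installed

if pysparkling is not None:
    print(pysparkling.__name__)  # 'pysparkling'
```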
error AttributeError: module 'pysparkling' has no attribute 'SparkContext'
cause pysparkling uses Context instead of SparkContext.
fix
Use 'from pysparkling import Context' and create 'sc = Context()'.
error ImportError: cannot import name 'SparkConf'
cause pysparkling does not support SparkConf.
fix
Use pysparkling's Context() directly; it does not take a SparkConf object. Configuration, where available, is passed as keyword arguments to Context() instead.
Warnings
deprecated pysparkling is no longer actively maintained. It may not work with newer Python versions and no longer receives security updates.
fix Consider migrating to PySpark or Dask for production use.
gotcha pysparkling's API is similar but not identical to PySpark. Some methods may have different signatures or missing features.
fix Check the official documentation for exact API differences.
gotcha pysparkling does not support distributed execution across multiple machines; it runs in a single process with simulated parallelism.
fix Use PySpark or Dask for true distributed computing.
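The "simulated parallelism" point can be illustrated with a toy sketch. The names here (split_into_partitions, mapped) are hypothetical, not pysparkling internals: partitions are plain lists, and a "parallel" map just processes them one after another in a single process.

```python
from itertools import islice

def split_into_partitions(data, n):
    # hypothetical helper: chunk a dataset into n roughly equal partitions
    data = list(data)
    size = max(1, len(data) // n)
    it = iter(data)
    return [p for p in iter(lambda: list(islice(it, size)), [])]

def mapped(partitions, fn):
    # "parallel" map: each partition is processed in turn, in one process
    return [[fn(x) for x in part] for part in partitions]

parts = split_into_partitions(range(6), 3)   # [[0, 1], [2, 3], [4, 5]]
doubled = mapped(parts, lambda x: x * 2)     # [[0, 2], [4, 6], [8, 10]]
flat = [x for part in doubled for x in part]
print(flat)  # [0, 2, 4, 6, 8, 10]
```

The partition boundaries never cross a process boundary, which is why this model cannot scale past one machine.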
Imports
pysparkling
  wrong: import pyspark
  correct: import pysparkling
Context
  wrong: from pyspark import SparkContext
  correct: from pysparkling import Context
Quickstart
from pysparkling import Context

sc = Context()                         # pysparkling's stand-in for SparkContext
rdd = sc.parallelize([1, 2, 3, 4, 5])  # in-memory RDD
print(rdd.sum())  # 15
# no cluster to shut down: Context runs in-process, so no stop() call is needed
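Beyond sum(), pysparkling's RDDs cover the familiar PySpark transformations such as map, filter, and collect. A small pipeline sketch, guarded so it falls back to a plain-Python equivalent when pysparkling is not installed:

```python
# Square 0..9, keep the even squares. The pysparkling branch assumes
# the PySpark-style map/filter/collect methods; the fallback computes
# the same result in plain Python.
try:
    from pysparkling import Context

    sc = Context()
    result = (sc.parallelize(range(10))
                .map(lambda x: x * x)
                .filter(lambda x: x % 2 == 0)
                .collect())
except ModuleNotFoundError:
    result = [x * x for x in range(10) if (x * x) % 2 == 0]

print(result)  # [0, 4, 16, 36, 64]
```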