{"id":5636,"library":"graphframes-py","title":"GraphFrames: DataFrame-based Graphs","description":"GraphFrames is a graph processing library built on Apache Spark DataFrames. It lets users run graph queries and algorithms such as PageRank, connected components, and shortest paths directly through Spark's DataFrame API. The project is actively maintained, with frequent releases. The official Python package on PyPI is `graphframes-py`, and its latest version is 0.11.0.","status":"active","version":"0.11.0","language":"en","source_language":"en","source_url":"https://github.com/graphframes/graphframes","tags":["spark","graph","dataframe","analytics","graph-algorithms"],"install":[{"cmd":"pip install graphframes-py","lang":"bash","label":"Install Python package"},{"cmd":"spark-submit --packages io.graphframes:graphframes-spark3_2.12:0.11.0 your_app.py","lang":"bash","label":"Run with Spark (example for Spark 3.x, Scala 2.12)"}],"dependencies":[{"reason":"GraphFrames is built on Apache Spark DataFrames, so the Python API requires PySpark.","package":"pyspark"},{"reason":"The core GraphFrames library is a Scala/Java JAR that must be provided to the Spark runtime. The artifact must match your Spark and Scala versions.","package":"io.graphframes:graphframes-spark3_2.12:0.11.0","optional":false}],"imports":[{"note":"GraphFrames is an external library, not part of core PySpark.","wrong":"from pyspark.graphframes import GraphFrame","symbol":"GraphFrame","correct":"from graphframes import GraphFrame"}],"quickstart":{"code":"from pyspark.sql import SparkSession\nfrom graphframes import GraphFrame\n\n# Create a SparkSession with the GraphFrames package\n# IMPORTANT: Replace the package coordinates below with the build compatible\n# with your Spark and Scala installation. 
Check GraphFrames docs for details.\nspark = SparkSession.builder \\\n    .appName(\"GraphFrames Quickstart\") \\\n    .config(\"spark.jars.packages\", \"io.graphframes:graphframes-spark3_2.12:0.11.0\") \\\n    .getOrCreate()\n\n# Create a Vertex DataFrame\nv = spark.createDataFrame([\n  (\"a\", \"Alice\", 34),\n  (\"b\", \"Bob\", 36),\n  (\"c\", \"Charlie\", 30),\n  (\"d\", \"David\", 29),\n  (\"e\", \"Esther\", 32),\n  (\"f\", \"Fanny\", 36)\n], [\"id\", \"name\", \"age\"])\n\n# Create an Edge DataFrame\ne = spark.createDataFrame([\n  (\"a\", \"b\", \"friend\"),\n  (\"b\", \"c\", \"follow\"),\n  (\"c\", \"b\", \"follow\"),\n  (\"f\", \"c\", \"follow\"),\n  (\"e\", \"f\", \"follow\"),\n  (\"e\", \"d\", \"friend\"),\n  (\"d\", \"a\", \"friend\")\n], [\"src\", \"dst\", \"relationship\"])\n\n# Create a GraphFrame\ng = GraphFrame(v, e)\n\n# Run the PageRank algorithm (the Python method is pageRank, camelCase)\nresults = g.pageRank(resetProbability=0.15, maxIter=5)\nresults.vertices.select(\"id\", \"pagerank\").show()\n\nspark.stop()","lang":"python","description":"This quickstart shows how to initialize a SparkSession with the GraphFrames package, create vertex and edge DataFrames, construct a GraphFrame, and run PageRank. Ensure the `spark.jars.packages` setting names the GraphFrames build that matches your Spark and Scala environment."},"warnings":[{"fix":"Always use `pip install graphframes-py`. Do not install `graphframes`.","message":"The official PyPI package name for GraphFrames changed from `graphframes` to `graphframes-py` starting with v0.9.0. The old `graphframes` package on PyPI is severely outdated (v0.6) and should not be used.","severity":"breaking","affected_versions":"0.9.0 and later"},{"fix":"Update your Spark configurations to use `io.graphframes:...` instead of `graphframes:graphframes:...`.","message":"The Maven groupId for the GraphFrames JAR changed from `graphframes` to `io.graphframes` in v0.9.0. 
This affects how you specify the package with `--packages` in `spark-submit` or in `SparkSession` configurations.","severity":"breaking","affected_versions":"0.9.0 and later"},{"fix":"Ensure your Spark environment is configured to include the GraphFrames JAR. Refer to the 'install' and 'quickstart' examples for correct setup.","message":"GraphFrames runs on Spark and needs a Spark runtime (local mode or a cluster). Installing the Python package (`graphframes-py`) is not enough; you must also provide the GraphFrames JAR to your Spark runtime. This is typically done via `spark-submit --packages` or `SparkSession.builder.config(\"spark.jars.packages\", ...)`.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Always check the official GraphFrames documentation for the correct JAR artifact name (e.g., `io.graphframes:graphframes-spark3_2.12:0.11.0`) that matches your Spark and Scala versions.","message":"GraphFrames has strict compatibility requirements with specific versions of Spark and Scala. Using a GraphFrames JAR built for a different Spark or Scala version will lead to runtime errors.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Review the official GraphFrames documentation for v0.9.0+ when upgrading from older versions, especially if using advanced algorithms or custom Pregel implementations.","message":"Significant API updates occurred in v0.9.0, including changes to the Pregel API and to the internal implementations of Connected Components (CC), Community Detection using Label Propagation (CDLP), and Shortest Paths (SP). Some GraphX-free implementations were introduced.","severity":"breaking","affected_versions":"0.9.0 and later"}],"env_vars":null,"last_verified":"2026-04-10T00:00:00.000Z","next_check":"2026-07-09T00:00:00.000Z"}