{"id":2979,"library":"koalas","title":"Koalas: pandas API on Apache Spark","description":"Koalas provides a pandas-compatible API that runs on Apache Spark, allowing users familiar with pandas to work with large, distributed datasets. The current version is 1.8.2. Its development as a standalone library has ceased, as its functionality has been officially integrated into PySpark as 'pandas API on Spark' starting with Apache Spark 3.2. Maintenance releases are infrequent, primarily addressing critical bug fixes.","status":"deprecated","version":"1.8.2","language":"en","source_language":"en","source_url":"https://github.com/databricks/koalas","tags":["apache-spark","pandas","dataframe","big-data","distributed-computing"],"install":[{"cmd":"pip install koalas","lang":"bash","label":"Install Koalas"}],"dependencies":[{"reason":"Required for Koalas to function, as it runs on Apache Spark.","package":"pyspark","optional":false},{"reason":"Provides the API Koalas implements; ensures compatibility with pandas data structures and operations.","package":"pandas","optional":false}],"imports":[{"note":"The official package namespace is `databricks.koalas`.","wrong":"import koalas as ks","symbol":"Koalas DataFrame/Series","correct":"import databricks.koalas as ks"},{"note":"While Koalas mimics pandas, it is a separate library and should be imported under its own alias, typically `ks`.","wrong":"import pandas as ks","symbol":"Koalas DataFrame/Series (from pandas)","correct":"import databricks.koalas as ks\nimport pandas as pd"}],"quickstart":{"code":"import databricks.koalas as ks\nimport pandas as pd\n\n# Create a pandas DataFrame\npdf = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})\n\n# Convert to a Koalas DataFrame\nkdf = ks.DataFrame(pdf)\n\nprint(\"Koalas DataFrame head:\")\nprint(kdf.head())\n\nprint(\"Mean of column 'A':\", kdf['A'].mean())\n\n# Convert back to a pandas DataFrame\npdf_result = kdf.to_pandas()\nprint(\"\\nConverted back to pandas 
DataFrame:\")\nprint(pdf_result)","lang":"python","description":"This quickstart demonstrates how to create a Koalas DataFrame from a pandas DataFrame, perform a basic operation (calculate mean), and convert it back to a pandas DataFrame. Ensure you have PySpark configured in your environment for this to run against a Spark session."},"warnings":[{"fix":"For Apache Spark 3.2 and later, use `import pyspark.pandas as ps` instead of `import databricks.koalas as ks`. Migrate existing Koalas code to use `pyspark.pandas`.","message":"Koalas as a standalone library is deprecated. All its functionality has been officially integrated into PySpark as 'pandas API on Spark' starting with Apache Spark 3.2. Users are strongly advised to migrate to PySpark directly.","severity":"breaking","affected_versions":"All versions, especially 1.8.0 and above. Affects users of Apache Spark 3.2+."},{"fix":"Review existing plotting code. If you prefer Matplotlib, you might need to explicitly set the backend, e.g., `ks.set_option('plotting.backend', 'matplotlib')` or adapt to Plotly's capabilities.","message":"The default plotting backend for Koalas switched from Matplotlib to Plotly in version 1.7.0. This can change the visual output and require different plotting options.","severity":"breaking","affected_versions":"1.7.0 and later"},{"fix":"Ensure your code explicitly handles Series names if consistency is critical, or upgrade to Koalas 1.2.0+ for pandas-like unnamed Series behavior.","message":"Koalas historically had different behavior than pandas regarding unnamed Series. Prior to v1.2.0, Koalas would automatically name a Series '0' if no name was specified, unlike pandas which allows a truly unnamed Series. This was fixed in v1.2.0 to align with pandas.","severity":"gotcha","affected_versions":"Prior to 1.2.0"},{"fix":"Always check release notes for specific pandas version compatibility. 
If you use a recent pandas release, upgrade Koalas to the latest available version.","message":"Mismatches between Koalas and pandas versions can introduce subtle bugs. For example, Koalas 1.8.2 fixed an issue with the `_builtin_table` import in `groupby.apply` that affected pandas 1.3.0 and above.","severity":"gotcha","affected_versions":"Varies with the pandas version; specifically, pandas >=1.3.0 with Koalas <1.8.2."},{"fix":"Upgrade to Koalas 1.5.0 or later for improved Index operation support; on older versions, break complex index manipulations into simpler steps.","message":"Koalas versions before 1.5.0 had limited or inconsistent support for complex Index operations (e.g., chained arithmetic operations), sometimes raising `AssertionError`.","severity":"gotcha","affected_versions":"Prior to 1.5.0"}],"env_vars":null,"last_verified":"2026-04-11T00:00:00.000Z","next_check":"2026-07-10T00:00:00.000Z"}