Vega Datasets
A Python package providing convenient access to a collection of over 70 datasets used in Vega, Vega-Lite, and Altair examples and documentation. As of Altair 6.0.0, the `vega-datasets` package has been archived, and its functionality, including all datasets, has been integrated directly into the `altair.datasets` module. The last standalone release of `vega-datasets` was 0.9.0.
Common errors
- ModuleNotFoundError: No module named 'vega_datasets'
  cause: The `vega_datasets` package is not installed, or you are attempting to use the old import path (`from vega_datasets import data`) after migrating to Altair 6.0.0+ without installing the standalone `vega_datasets` package.
  fix: If working with older Altair versions, ensure `vega-datasets` is installed (`pip install vega-datasets`). If using Altair 6.0.0+, switch your import to `from altair.datasets import data` and ensure `altair` is installed (`pip install altair`).
- MaxRowsError: The number of rows in your dataset is greater than the maximum allowed (5000).
  cause: Altair's default data transformer limits the size of embedded datasets to prevent performance issues in visualizations.
  fix: To process larger datasets, enable the VegaFusion data transformer: `import altair as alt; alt.data_transformers.enable('vegafusion')`. This requires `vegafusion` to be installed (`pip install vegafusion`). For extremely large datasets, consider passing data by URL.
- Data not loading, chart appears empty or incomplete in browser or Jupyter notebook.
  cause: Browser ad-blockers, privacy extensions, or strict Content Security Policies (CSPs) might block requests to CDNs (e.g., `cdn.jsdelivr.net`) where some `vega-datasets` are hosted.
  fix: Temporarily disable ad-blockers or privacy extensions for the specific page/domain. Check your browser's developer console (Network tab) for blocked requests or `ERR_BLOCKED_BY_CLIENT` messages. Consider using datasets bundled with Altair or serving them locally.
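For the MaxRowsError entry above, the VegaFusion fix can be wrapped in a guard so that a script does not fail when either package is missing. This is a minimal sketch, not an official recipe: the helper name `enable_vegafusion_if_available` is hypothetical, and it assumes that having both `altair` and `vegafusion` importable is enough for the `enable` call quoted in the fix to succeed.

```python
import importlib.util


def enable_vegafusion_if_available() -> bool:
    """Enable Altair's 'vegafusion' data transformer when possible.

    Returns True if the transformer was enabled, False if `altair` or
    `vegafusion` is not installed. (Hypothetical helper for illustration.)
    """
    for module_name in ("altair", "vegafusion"):
        # find_spec returns None when a top-level module cannot be found.
        if importlib.util.find_spec(module_name) is None:
            return False

    import altair as alt

    # Same call as in the fix above.
    alt.data_transformers.enable("vegafusion")
    return True


if __name__ == "__main__":
    if enable_vegafusion_if_available():
        print("VegaFusion data transformer enabled.")
    else:
        print("altair and/or vegafusion missing; default row limit still applies.")
```

When the helper returns False, the default row limit remains in force, so large DataFrames will still raise `MaxRowsError`.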
Warnings
- breaking The `vega-datasets` package has been archived and its functionality migrated to `altair.datasets` in Altair 6.0.0+. The standalone package no longer receives updates, and `import vega_datasets` is discouraged for new projects.
- deprecated The standalone `vega-datasets` PyPI package (version 0.9.0 and earlier) is deprecated. While it remains installable, new datasets and updates are only provided via `altair.datasets`.
- gotcha Dataset names containing hyphens (e.g., 'sf-temps') must be accessed using underscores (e.g., `data.sf_temps()`) when using the Python interface.
- gotcha When visualizing large datasets with Altair, you might encounter a `MaxRowsError`. This is a default safeguard in Altair to encourage efficient data handling.
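The hyphen gotcha above matters most when dataset names arrive as strings, for example from a config file. The sketch below shows one way to translate such a name into the attribute the Python interface expects. Both helper names (`accessor_name`, `load_by_name`) are hypothetical and not part of `vega_datasets` or Altair, and `data` stands for whichever `data` object you imported.

```python
def accessor_name(dataset_name: str) -> str:
    """Map a dataset name such as 'sf-temps' to its Python attribute name."""
    return dataset_name.replace("-", "_")


def load_by_name(data, dataset_name: str):
    """Look up the accessor on `data` and call it.

    `data` is assumed to expose one callable attribute per dataset,
    as in `data.sf_temps()`.
    """
    return getattr(data, accessor_name(dataset_name))()
```

With either import from this page, `load_by_name(data, "sf-temps")` would then be equivalent to calling `data.sf_temps()`.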
Install
- pip install vega-datasets
- pip install "altair>=6.0.0"
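To see which of the two installs is present in an environment, the installed distribution versions can be read with the standard library. This is a rough sketch: the helper names are hypothetical, and the version check only compares the major version number, on the assumption (taken from the description above) that `altair.datasets` is available from Altair 6.0.0 onward.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version string of a distribution, or None."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def has_builtin_datasets(altair_version: Optional[str]) -> bool:
    """True when the given Altair version string has major version >= 6."""
    if altair_version is None:
        return False
    major = altair_version.split(".")[0]
    return major.isdigit() and int(major) >= 6


if __name__ == "__main__":
    altair_version = installed_version("altair")
    print("altair:", altair_version)
    print("vega-datasets:", installed_version("vega-datasets"))
    print("altair.datasets expected:", has_builtin_datasets(altair_version))
```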
Imports
- data
  from vega_datasets import data (standalone `vega-datasets` package, 0.9.0 and earlier)
  from altair.datasets import data (Altair 6.0.0+)
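Code that has to run against both setups can try the two import paths in order. The sketch below is one possible approach, not an official migration helper: the function name `load_data_accessor` is hypothetical, and it assumes each module exposes a `data` attribute, as the two imports above imply.

```python
import importlib


def load_data_accessor(candidates=("altair.datasets", "vega_datasets")):
    """Return (module_name, data_object) for the first usable candidate.

    Tries each module in order and raises ModuleNotFoundError if none
    can be imported or none exposes a `data` attribute.
    """
    for name in candidates:
        try:
            module = importlib.import_module(name)
            return name, getattr(module, "data")
        except (ImportError, AttributeError):
            # Not installed, too old, or no `data` attribute: try the next one.
            continue
    raise ModuleNotFoundError(
        "None of these modules provide `data`: " + ", ".join(candidates)
    )


if __name__ == "__main__":
    try:
        source, data = load_data_accessor()
        print("Using datasets from", source)
    except ModuleNotFoundError as exc:
        print(exc)
```

Preferring `altair.datasets` first matches the warning that the standalone package is archived; reverse the order if a project must pin the old package.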
Quickstart
# Old way (vega-datasets 0.9.0 and earlier)
# Requires: pip install vega-datasets
try:
    from vega_datasets import data

    cars_df_old = data.cars()
    print("Old import (vega-datasets 0.9.0):\n", cars_df_old.head())
    # Accessing metadata
    # print(data.cars.description)
except ImportError:
    print("vega-datasets not installed, skipping old import example.")

# New way (Altair 6.0.0+ with built-in datasets)
# Requires: pip install "altair>=6.0.0"
try:
    from altair.datasets import data as altair_data

    cars_df_new = altair_data.cars()
    print("\nNew import (Altair 6.0.0+):\n", cars_df_new.head())
    # Accessing metadata
    # print(altair_data.cars.description)
except ImportError:
    print("\nAltair 6.0.0+ not installed or older version. Cannot use new import example.")