Datashader
Datashader is a Python library for high-performance visualization of very large datasets. It aggregates (rasterizes) data onto a fixed-size grid using Numba-compiled code, optionally parallelized with Dask or run on NVIDIA GPUs via cuDF/CuPy, which makes it practical to render billions of data points. The current version is 0.19.0, and the project releases frequently, often in step with the broader HoloViz ecosystem.
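To make "aggregating data onto a grid" concrete, here is a simplified NumPy sketch of a mean aggregation, roughly the kind of result `canvas.points(data, 'x', 'y', agg=ds.mean('value'))` in the Quickstart produces. The function `mean_aggregate` is hypothetical and is not Datashader's implementation; the edge handling (points on the upper bound go into the last cell) and the use of the data extent as the default range are assumptions made for illustration.

```python
import numpy as np


def mean_aggregate(x, y, value, width, height, x_range=None, y_range=None):
    """Hypothetical sketch: bin points onto a height x width grid and average `value` per cell.

    Assumes finite coordinates and a non-degenerate range (max > min) on each axis.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    value = np.asarray(value, dtype=float)
    # Default to the data extent when no explicit range is given.
    x0, x1 = x_range if x_range is not None else (x.min(), x.max())
    y0, y1 = y_range if y_range is not None else (y.min(), y.max())
    # Map each coordinate to an integer column (x) and row (y) index.
    col = np.floor((x - x0) / (x1 - x0) * width).astype(int)
    row = np.floor((y - y0) / (y1 - y0) * height).astype(int)
    # Points exactly on the upper bound go into the last column/row.
    col = np.where(x == x1, width - 1, col)
    row = np.where(y == y1, height - 1, row)
    # Discard points that fall outside the grid.
    inside = (col >= 0) & (col < width) & (row >= 0) & (row < height)
    col, row, value = col[inside], row[inside], value[inside]
    # Accumulate per-cell sums and counts (unbuffered, so repeated indices add up).
    sums = np.zeros((height, width))
    counts = np.zeros((height, width))
    np.add.at(sums, (row, col), value)
    np.add.at(counts, (row, col), 1)
    # Cells without any points become NaN ("empty").
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(counts > 0, sums / counts, np.nan)


# Usage: four points on a 2 x 2 grid; two land in each diagonal cell.
agg = mean_aggregate([0.0, 0.0, 1.0, 1.0], [0.0, 0.0, 1.0, 1.0],
                     [1.0, 3.0, 5.0, 7.0], width=2, height=2)
```

The output grid has a fixed size regardless of how many points go in, which is why the later shading step stays cheap even for very large inputs.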
Warnings
- breaking The minimum supported Python version was raised to 3.10 in v0.17.0, so Python 3.9 is not supported by v0.17.0 or any later release, including v0.18.0.
- breaking The Datashader command-line interface (CLI) was removed in v0.19.0. Functionality previously available via the CLI must now be accessed programmatically.
- gotcha Since v0.17.0, `Pillow` (for image output) and `Dask` (for large datasets) are optional dependencies. They are no longer automatically installed with `pip install datashader`.
- gotcha Prior to v0.16.0, GeoPandas GeoDataFrames often had to be converted to SpatialPandas before use with Datashader. Version 0.16.0 added direct support for many GeoPandas geometry types (e.g., LineString, Polygon) in `Canvas` methods, simplifying geospatial workflows.
- gotcha Versions up to and including v0.18.1 contain a bug that can cause a segmentation fault during `quadmesh` reductions when array bounds are exceeded. It was fixed in v0.18.2.
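The warnings above can be checked before running any Datashader code. The sketch below uses only the standard library to report whether the interpreter meets the 3.10 minimum, whether the optional Pillow and Dask packages are importable, and whether the installed Datashader (if any) is at least 0.18.2. The helpers `version_tuple` and `check_environment` are hypothetical names; the version parser is deliberately simplified and ignores pre-release tags, so a tool such as `packaging.version` is the better choice for anything beyond a quick check.

```python
import importlib.metadata
import importlib.util
import sys


def version_tuple(text):
    """Hypothetical helper: parse the leading numeric parts of a version string.

    Simplified: "0.18.2rc1" is treated as (0, 18, 2); pre-release tags are ignored.
    """
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def check_environment():
    """Hypothetical helper: summarize the requirements listed in the warnings."""
    report = {
        # Minimum supported Python version is 3.10.
        "python_ok": sys.version_info >= (3, 10),
        # Optional dependencies; Pillow's import name is PIL.
        "pillow_installed": importlib.util.find_spec("PIL") is not None,
        "dask_installed": importlib.util.find_spec("dask") is not None,
    }
    try:
        ds_version = importlib.metadata.version("datashader")
    except importlib.metadata.PackageNotFoundError:
        ds_version = None
    report["datashader_version"] = ds_version
    # The quadmesh segfault fix is in 0.18.2 and later.
    report["quadmesh_fix"] = (
        ds_version is not None and version_tuple(ds_version) >= (0, 18, 2)
    )
    return report


if __name__ == "__main__":
    for key, val in check_environment().items():
        print(f"{key}: {val}")
```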
Install
- pip install datashader
- pip install "datashader[dask,geopandas,cudf]"
Imports
- datashader
import datashader as ds
- transfer_functions
import datashader.transfer_functions as tf
Quickstart
import datashader as ds
import datashader.transfer_functions as tf
import numpy as np
import pandas as pd
from PIL import Image  # Pillow is optional since v0.17.0: pip install pillow
# 1. Generate example data (seeded for reproducibility)
rng = np.random.default_rng(42)
num_points = 100_000
data = pd.DataFrame({
    'x': rng.normal(0, 1, num_points),
    'y': rng.normal(0, 1, num_points),
    'value': rng.random(num_points),  # Per-point value to average and color by
})
# 2. Create a Canvas that defines the aggregation grid (400 x 400 pixels)
canvas = ds.Canvas(plot_width=400, plot_height=400)
# 3. Aggregate: each pixel holds the mean of 'value' over the points it contains
#    (pixels that contain no points are NaN)
agg = canvas.points(data, 'x', 'y', agg=ds.mean('value'))
# 4. Shade the aggregate into an image, mapping values linearly from light to dark blue
img = tf.shade(agg, cmap=['lightblue', 'darkblue'], how='linear')
# Convert to a PIL image to confirm the output (requires Pillow)
pil_img = img.to_pil()
assert isinstance(pil_img, Image.Image)
# In a real application you would typically save or display the image,
# for example with a visualization library such as HoloViews or Bokeh.
# pil_img.save("datashader_quickstart.png")
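Step 4 turns the numeric aggregate into colors. As a rough illustration of what `how='linear'` with a two-color `cmap` means, the sketch below normalizes non-empty cells linearly to [0, 1] over the aggregate's min..max, interpolates each RGB channel between the two endpoint colors, and leaves NaN cells fully transparent. The function `shade_linear` is hypothetical and simplified: it does not reproduce Datashader's exact alpha handling or rounding, and the RGB tuples used below are the standard CSS values of `lightblue` (#ADD8E6) and `darkblue` (#00008B).

```python
import numpy as np


def shade_linear(agg, low_rgb, high_rgb):
    """Hypothetical sketch of linear two-color shading; returns an RGBA uint8 array."""
    agg = np.asarray(agg, dtype=float)
    valid = ~np.isnan(agg)
    out = np.zeros(agg.shape + (4,), dtype=np.uint8)
    if not valid.any():
        return out  # Nothing to shade: fully transparent image.
    lo, hi = agg[valid].min(), agg[valid].max()
    # Normalize valid cells to [0, 1]; in this sketch a constant aggregate maps to 0.
    t = np.zeros_like(agg)
    if hi > lo:
        t[valid] = (agg[valid] - lo) / (hi - lo)
    low = np.asarray(low_rgb, dtype=float)
    high = np.asarray(high_rgb, dtype=float)
    # Linearly interpolate each RGB channel between the endpoint colors.
    rgb = low + t[..., None] * (high - low)
    out[..., :3] = np.round(rgb).astype(np.uint8)
    # Non-empty cells are opaque; empty (NaN) cells are fully transparent.
    out[..., 3] = np.where(valid, 255, 0)
    out[~valid] = 0
    return out


# Usage: a 2 x 2 aggregate with two empty cells.
img = shade_linear(np.array([[0.0, np.nan], [np.nan, 1.0]]),
                   (173, 216, 230), (0, 0, 139))
```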