Pyogrio: Vectorized Spatial Vector File I/O
Pyogrio provides fast, bulk-oriented read and write access to GDAL/OGR vector data sources such as ESRI Shapefile, GeoPackage, GeoJSON, and FlatGeobuf. It gets its speed from compiled bindings to GDAL/OGR that read and write whole layers at once, which avoids converting each feature to Python objects one at a time. The library is actively maintained, with frequent releases that may include breaking changes between minor versions. The current stable version is 0.12.1.
Warnings
- breaking In version 0.12.0, `read_dataframe` started returning JSON fields as Python `dict`s or `list`s; previously they were returned as raw strings. Code that parsed these columns itself (for example with `json.loads`) needs to be updated.
- breaking As of version 0.12.0, Pyogrio dropped official support for GDAL versions 3.4 and 3.5. Users are now required to use GDAL 3.6 or newer.
- gotcha Pyogrio's binary wheels bundle a specific GDAL version, which might conflict with a system-installed GDAL or with the GDAL used by other geospatial libraries such as Fiona. Such conflicts can surface as errors like `DriverError` or `NullPointerError`.
- gotcha In `read_dataframe`, datetime columns with mixed time zone offsets are converted to UTC by default, which discards the original offsets unless you handle them explicitly.
- gotcha macOS binary wheels for Pyogrio are built for macOS 12 and newer. Users on older versions (macOS 11 or earlier) need to build Pyogrio from source, which requires a pre-installed GDAL development environment.
- gotcha Passing `use_arrow=True` to `read_dataframe` or `write_dataframe` can speed up I/O but requires the `pyarrow` library. Reading with Arrow additionally requires GDAL >= 3.6, and writing with Arrow requires GDAL >= 3.8.
- gotcha Pyogrio does not validate attribute values or geometry types before writing them to an output file. Invalid types can cause the write operation to crash with cryptic error messages.
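Code that must run against Pyogrio versions both before and after 0.12.0 can normalize JSON columns so that either representation ends up as Python objects. The helper below, `normalize_json_column`, is hypothetical (not part of Pyogrio) and assumes each JSON value is an object or array at the top level; a JSON value that is itself a string would already be a Python `str` under the new behaviour and would be parsed a second time.

```python
import json

import pandas as pd


def normalize_json_column(series: pd.Series) -> pd.Series:
    """Return parsed JSON values whether they arrive as strings or objects.

    Hypothetical helper: strings (pre-0.12.0 behaviour) are parsed with
    json.loads; dicts, lists and missing values pass through unchanged.
    """
    return series.map(lambda v: json.loads(v) if isinstance(v, str) else v)


old_style = pd.Series(['{"a": 1}', "[1, 2]"])   # raw strings, as before 0.12.0
new_style = pd.Series([{"a": 1}, [1, 2]])       # Python objects, as in 0.12.0+
print(normalize_json_column(old_style).tolist())  # [{'a': 1}, [1, 2]]
```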
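The time zone gotcha is easier to see with a concrete case. This pandas-only sketch does not call Pyogrio; it shows what converting mixed offsets to a single UTC column does to the data: the instants are preserved, but the original `+01:00` and `+05:00` offsets are gone. Keeping the offsets in a separate text column is one possible workaround, shown here as an illustration rather than a Pyogrio feature.

```python
import pandas as pd

# Two timestamps recorded with different UTC offsets.
raw = pd.Series(["2024-01-01T12:00:00+01:00", "2024-07-01T12:00:00+05:00"])

# Converting to one UTC column keeps each instant but drops the offsets:
# 12:00+01:00 becomes 11:00 UTC and 12:00+05:00 becomes 07:00 UTC.
as_utc = pd.to_datetime(raw, utc=True)
print(as_utc)

# One way to retain the offsets: store them in a separate text column.
offsets = raw.str[-6:]
print(offsets.tolist())  # ['+01:00', '+05:00']
```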
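Because Pyogrio skips validation, a quick pre-write check with GeoPandas can catch common problems before GDAL sees them. The function `find_write_problems` below is a hypothetical sketch, not a Pyogrio API; it flags object columns that mix Python types, layers with more than one geometry type, and invalid geometries. Which of these actually break a write depends on the output driver.

```python
import geopandas as gpd
from shapely.geometry import LineString, Point


def find_write_problems(gdf: gpd.GeoDataFrame) -> list[str]:
    """Return human-readable warnings for a GeoDataFrame before writing it.

    Hypothetical helper: Pyogrio itself does not run these checks.
    """
    problems = []
    geom_col = gdf.geometry.name
    # Object columns that mix Python types (e.g. str and list) have no
    # single OGR field type to map to.
    for col in gdf.columns:
        if col == geom_col or gdf[col].dtype != object:
            continue
        kinds = {type(v).__name__ for v in gdf[col].dropna()}
        if len(kinds) > 1:
            problems.append(f"column {col!r} mixes types: {sorted(kinds)}")
    # Some formats (e.g. ESRI Shapefile) allow only one geometry type per layer.
    geom_types = set(gdf.geom_type.dropna())
    if len(geom_types) > 1:
        problems.append(f"mixed geometry types: {sorted(geom_types)}")
    # Count invalid geometries (e.g. self-intersecting polygons).
    n_invalid = int((~gdf.is_valid).sum())
    if n_invalid:
        problems.append(f"{n_invalid} invalid geometries")
    return problems


gdf = gpd.GeoDataFrame(
    {"label": ["a", ["b"]]},  # a str and a list in the same column
    geometry=[Point(0, 0), LineString([(0, 0), (1, 1)])],
    crs="EPSG:4326",
)
for problem in find_write_problems(gdf):
    print(problem)
```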
Install
- pip install pyogrio
- conda install -c conda-forge pyogrio
Imports
- read_dataframe
from pyogrio import read_dataframe
- write_dataframe
from pyogrio import write_dataframe
- list_drivers
from pyogrio import list_drivers
- read_info
from pyogrio import read_info
Quickstart
import geopandas as gpd
from pyogrio import read_dataframe, write_dataframe
from shapely.geometry import Point
import os
# Create a dummy GeoDataFrame
data = {'name': ['Location A', 'Location B'], 'value': [10, 20]}
geometry = [Point(1, 1), Point(2, 2)]
gdf = gpd.GeoDataFrame(data, geometry=geometry, crs="EPSG:4326")
# Define output path
output_file = "my_geodata.gpkg"
# Write GeoDataFrame to a GeoPackage file
print(f"Writing data to {output_file}...")
write_dataframe(gdf, output_file, driver="GPKG", layer="my_points")
print("Write complete.")
# Read GeoDataFrame from the GeoPackage file
print(f"Reading data from {output_file}...")
read_gdf = read_dataframe(output_file, layer="my_points")
print("Read complete.")
print(read_gdf)
# Clean up the created file
os.remove(output_file)
print(f"Cleaned up {output_file}.")