sodapy: Python client for Socrata Open Data API
sodapy is a Python client for the Socrata Open Data API (SODA), enabling programmatic access to datasets from Socrata-powered platforms. While the library is functional, it has been unmaintained since August 31, 2022, with no new features or bug fixes planned. The current version is 2.2.0, and it is compatible with Python 3.5-3.10.
Warnings
- breaking The `sodapy` library has been officially unmaintained since August 31, 2022, and no new features or bug fixes will be released. Existing functionality still works, but weigh the lack of ongoing support before adopting it in a new project.
- gotcha Requests made without an application token (`app_token`) are subject to strict throttling by the Socrata API, which can slow down or cause failures in large data retrievals.
- gotcha The Socrata Open Data API (SODA), and therefore `sodapy`, is intended mainly for reading data and for writing rows directly to datasets. For writes that involve data transformations or that go through the Socrata Data Management Experience (e.g., datasets created in the web UI), use the Socrata Data Management API instead.
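- gotcha `sodapy` uses a default request timeout of 10 seconds. With large datasets or slow connections, requests can fail with a `requests.exceptions.ReadTimeout` exception before all data has been retrieved. Pass a larger `timeout` to the `Socrata` constructor, or set `client.timeout`, if needed.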
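- gotcha SODA APIs are paged: a request that does not set `limit` returns a default page of 1,000 rows, and a single request typically returns at most 50,000 records. Without pagination (the `limit` and `offset` parameters of `get()`), you may retrieve only a subset of the available data.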
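The timeout and paging gotchas can be handled together by requesting pages explicitly and retrying a page that times out. The following is a minimal sketch, assuming `client` is a `sodapy.Socrata` instance (or any object with the same `get()` method and a mutable `timeout` attribute). The helper name `fetch_all_rows` and its default page size, retry count, backoff, and timeout growth factor are hypothetical choices, not part of sodapy. Ordering by the `:id` system field keeps offsets stable from one page to the next. sodapy's own `get_all()` generator also pages automatically if you do not need custom retry behavior.

```python
import time

try:
    # sodapy sends requests through the `requests` library, which raises
    # requests.exceptions.Timeout (the parent of ReadTimeout) on slow responses.
    from requests.exceptions import Timeout as RequestTimeout
except ImportError:  # fallback so the sketch can be exercised without requests
    RequestTimeout = TimeoutError


def fetch_all_rows(client, dataset_identifier, page_size=50000,
                   max_retries=3, timeout_growth=2.0, sleep=time.sleep):
    """Hypothetical helper: page through a dataset using limit/offset.

    A page that times out is retried up to `max_retries` times, and
    client.timeout is multiplied by `timeout_growth` before each retry.
    """
    rows = []
    offset = 0
    while True:
        for attempt in range(max_retries + 1):
            try:
                page = client.get(dataset_identifier, limit=page_size,
                                  offset=offset, order=":id")
                break
            except RequestTimeout:
                if attempt == max_retries:
                    raise  # give up after the last retry
                client.timeout *= timeout_growth  # allow more time next try
                sleep(2 ** attempt)  # exponential backoff: 1 s, 2 s, 4 s, ...
        rows.extend(page)
        if len(page) < page_size:
            return rows  # a short (or empty) page means no more data
        offset += page_size
```

Inside the quickstart's `with` block, `fetch_all_rows(client, dataset_identifier, page_size=10000)` would collect every row as a list of dictionaries; for very large datasets, consider processing pages as they arrive instead of accumulating them in memory.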
Install
pip install sodapy
Imports
- Socrata
from sodapy import Socrata
Quickstart
import os
from sodapy import Socrata
# Get credentials from environment variables or provide directly
APP_TOKEN = os.environ.get('SOCRATA_APP_TOKEN', None) # Recommended for higher rate limits
USERNAME = os.environ.get('SOCRATA_USERNAME', None) # Only required for creating/modifying data
PASSWORD = os.environ.get('SOCRATA_PASSWORD', None) # Only required for creating/modifying data
# Example: Connect to a public dataset (e.g., NYC Open Data - 311 Service Requests)
# Replace 'data.cityofnewyork.us' with your Socrata domain
# Replace 'erm2-nwe9' with your dataset identifier
domain = 'data.cityofnewyork.us'
dataset_identifier = 'erm2-nwe9'
with Socrata(domain, APP_TOKEN, username=USERNAME, password=PASSWORD) as client:
    # Increase timeout for large datasets if needed
    # client.timeout = 50
    # Example: Retrieve the first 5 records
    print(f"Retrieving the first 5 records from {dataset_identifier} on {domain}...")
    results = client.get(dataset_identifier, limit=5)
    # Results are returned as a list of dictionaries
    for item in results:
        print(item)
    print(f"\nRetrieved {len(results)} records.")
    # Example: Retrieve metadata for the dataset
    print(f"\nRetrieving metadata for {dataset_identifier}...")
    metadata = client.get_metadata(dataset_identifier)
    print(f"Dataset Name: {metadata.get('name')}")
    # Guard against a missing or null description before slicing
    print(f"Description: {(metadata.get('description') or '')[:100]}...")
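The `USERNAME` and `PASSWORD` comments in the quickstart refer to write operations. sodapy exposes `upsert(dataset_identifier, payload)` for adding or updating rows; the minimal sketch below sends rows in fixed-size batches rather than in one large request. It assumes an authenticated client with edit rights on a dataset you own (not the public NYC dataset used above). The helper `upsert_in_batches` and its batch size of 1,000 are hypothetical choices, and the sketch simply collects whatever each upsert call returns.

```python
def upsert_in_batches(client, dataset_identifier, rows, batch_size=1000):
    """Hypothetical helper: upsert `rows` (a list of dicts) in chunks.

    `client` is assumed to be an authenticated sodapy.Socrata instance.
    Returns the list of responses, one per batch.
    """
    responses = []
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]  # at most batch_size rows
        responses.append(client.upsert(dataset_identifier, batch))
    return responses
```

Each row dictionary should be keyed by the dataset's column field names; a failed batch raises from `client.upsert`, leaving earlier batches already written.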