pybaseball

2.2.7 verified Fri May 01 auth: no python

pybaseball is a Python library for retrieving and analyzing baseball data from sources like Baseball Savant, FanGraphs, and Baseball-Reference. Version 2.2.7 (current) fixes FanGraphs leaderboard URLs and adds new features like PitchingBot/Stuff+ stat enums and strike zone plotting. Release cadence is irregular, with minor patches every few months.

pip install pybaseball
error ValueError: URL does not return valid JSON
cause Statcast API endpoint temporarily down or changed.
fix
Wait and retry, or check the pybaseball GitHub issue tracker for known API issues.
error AttributeError: module 'pybaseball' has no attribute 'statcast'
cause Importing from submodule instead of top level, or very old version (<2.0.0).
fix
Use from pybaseball import statcast and upgrade to latest version.
error HTTPError: 429 Client Error: Too Many Requests
cause Exceeding rate limits on Baseball Savant or FanGraphs.
fix
Add a delay between requests, for example time.sleep(1), or enable caching so repeated queries are served from local files after the first successful fetch.
error KeyError: 'events'
cause Statcast data column missing; schema changed.
fix
Check df.columns for available columns; the 'events' column may be named differently or absent for certain date ranges.
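One way to space out requests, as suggested for the 429 error, is to split a long date range into smaller windows and pause between them. The sketch below is an illustration only: date_chunks is a hypothetical helper, not part of pybaseball, and the window size and one-second pause are assumptions rather than documented limits.

```python
from datetime import date, timedelta
from typing import Iterator, Tuple


def date_chunks(start: date, end: date, days: int = 7) -> Iterator[Tuple[str, str]]:
    """Yield inclusive (start, end) ISO date pairs that cover start..end."""
    if days < 1:
        raise ValueError("days must be at least 1")
    cursor = start
    while cursor <= end:
        # Clamp the last window so it never runs past the requested end date.
        chunk_end = min(cursor + timedelta(days=days - 1), end)
        yield cursor.isoformat(), chunk_end.isoformat()
        cursor = chunk_end + timedelta(days=1)


for start_dt, end_dt in date_chunks(date(2024, 5, 1), date(2024, 5, 10), days=4):
    print(start_dt, end_dt)
    # In real use, fetch one window and then pause, for example:
    #   df = statcast(start_dt=start_dt, end_dt=end_dt)
    #   time.sleep(1)
```

The date strings use the same YYYY-MM-DD form that the statcast call in the quickstart takes.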
breaking Statcast data schema changes frequently: column names, data types, and null handling can change without notice. Always check the actual columns after fetching.
fix Inspect df.columns after fetching and handle missing/renamed columns gracefully.
breaking FanGraphs leaderboard URL changed in v2.2.6; older versions cannot retrieve FanGraphs data.
fix Upgrade to pybaseball>=2.2.6.
deprecated Python 3.6 support dropped in v2.2.5; 3.7 also dropped later.
fix Use Python 3.8+.
gotcha Caching is disabled by default, but enabling it can cause stale data if not cleared. Manual cache clearing is required.
fix Call cache.disable() when fresh data is required, or delete the cache files manually.
gotcha Web scraping can be unreliable: frequent HTTP errors (429, 503) and HTML structure changes may break scraping functions.
fix Wrap calls in retry logic; check GitHub issues for known outages.
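The retry advice above can be sketched as a small wrapper with exponential backoff. fetch_with_retry is a hypothetical helper, not a pybaseball function; the attempt count and delays are assumptions, and the broad except clause is for brevity only. In practice, catch the specific HTTP error types you actually see.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def fetch_with_retry(
    fetch: Callable[[], T],
    attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch() until it succeeds, doubling the wait after each failure."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fetch()
        except Exception:
            # Out of attempts: let the last error reach the caller.
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")
```

Assumed usage with the quickstart call: df = fetch_with_retry(lambda: statcast(start_dt='2024-05-01', end_dt='2024-05-02')). The sleep parameter exists only so the wait can be replaced in tests.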

Retrieve Statcast data for two days in May 2024. Cache is disabled by default, but if enabled, use cache.disable() to ensure fresh data.

from pybaseball import cache, statcast

# Bypass the cache so the query returns fresh data
cache.disable()

# Fetch Statcast data for May 1 and May 2, 2024
df = statcast(start_dt='2024-05-01', end_dt='2024-05-02')
print(df.head())
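
Because the Statcast schema can change without notice, it helps to check for the columns you depend on before using the result. A minimal sketch follows; missing_columns is a hypothetical helper, not part of pybaseball, and the column names in the sample list are placeholders for df.columns.

```python
from typing import Iterable, List


def missing_columns(columns: Iterable[str], required: Iterable[str]) -> List[str]:
    """Return the required column names that are absent, in the order requested."""
    present = set(columns)
    return [name for name in required if name not in present]


# Placeholder for df.columns; with a real fetch, pass df.columns instead.
columns = ["pitch_type", "release_speed", "description"]

missing = missing_columns(columns, ["pitch_type", "events"])
if missing:
    print(f"Missing expected columns: {missing}")
```

Checking up front turns a later KeyError: 'events' into an explicit message at the point where the data is loaded.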