StatsCan Data Reader for Python
Stats-can is a Python library for reading Statistics Canada data into pandas DataFrames. It wraps the StatsCan API so that users can fetch specific data cubes by ID and filter them by dimensions such as geography, time, and characteristics. The current version is 3.2.3. Major releases have refactored the API for robustness, while minor releases address bug fixes and performance improvements.
Common errors
- AttributeError: 'Statscan' object has no attribute 'get_cansim_table'
  Cause: Code written for the pre-3.0 API calls a method on a v3.x `Statscan` object. `get_cansim_table` and similar direct functions no longer exist in v3.x.
  Fix: Update the code to use the v3.x `Statscan` class methods, such as `sc.get_data("CANSIM_ID")` or `sc.get_table_data("table_id")`, and check the latest documentation on GitHub.
- ModuleNotFoundError: No module named 'stats_can'
  Cause: Incorrect import statement. The Python module is named `statscan`, not `stats_can` (the latter mirrors the PyPI package name `stats-can`).
  Fix: Change `import stats_can` or `from stats_can import ...` to `from statscan import Statscan`.
- ValueError: No data found for specified cube id and filters
  Cause: The Statistics Canada API returned no data for the given cube ID and/or filters. The cube ID may be invalid, the requested combination of dimensions may not exist, or no data may be available for the requested time period.
  Fix: Double-check the cube ID on the Statistics Canada website and review the dimensions and filter values available for that cube. Fetch the data without filters first to confirm that the base cube is accessible, then add filters one at a time.
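The "fetch without filters, then add filters one at a time" advice for the `ValueError` can be automated. The sketch below assumes a hypothetical `fetch(cube_id, filters)` callable that you supply (for example, a small wrapper around the `get_data` call from the Quickstart). It should return something with a length, such as a DataFrame, or raise `ValueError` when nothing comes back. `find_empty_filter` and `CUBE_ITSELF` are hypothetical names, not part of stats-can.

```python
from typing import Any, Callable, Dict, Mapping, Optional, Sequence

# Hypothetical fetch signature: fetch(cube_id, filters) -> sized result; may raise ValueError.
Fetch = Callable[[str, Mapping[str, Sequence[Any]]], Any]

# Returned when even the unfiltered cube yields no data (so the cube ID is the suspect).
CUBE_ITSELF = "<cube id>"


def _is_empty(fetch: Fetch, cube_id: str, filters: Mapping[str, Sequence[Any]]) -> bool:
    """True if the request raises ValueError or returns zero rows."""
    try:
        result = fetch(cube_id, dict(filters))
    except ValueError:
        return True
    return len(result) == 0


def find_empty_filter(
    fetch: Fetch, cube_id: str, filters: Mapping[str, Sequence[Any]]
) -> Optional[str]:
    """Apply filters one dimension at a time and report the first one that
    empties the result. Returns CUBE_ITSELF if the unfiltered cube is empty,
    or None if the full filter set still returns data."""
    if _is_empty(fetch, cube_id, {}):
        return CUBE_ITSELF
    applied: Dict[str, Sequence[Any]] = {}
    for dimension, values in filters.items():
        applied[dimension] = values
        if _is_empty(fetch, cube_id, applied):
            return dimension
    return None
```

The result names the dimension whose values to re-check against the cube's metadata, which is usually faster than guessing.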
Warnings
- Breaking: Version 3.0.0 introduced a complete API rewrite, and all methods from earlier versions (v2.x) were removed. Code must now instantiate the `Statscan` class and call its methods (e.g., `get_data`, `get_table_data`).
- Gotcha: Repeatedly fetching table metadata can be slow and may hit API rate limits. The library caches table metadata by default, but you can set `tables_dir` and `cache_dir` explicitly for persistent caching and better performance.
- Gotcha: Data availability from Statistics Canada changes over time. A given cube ID, dimension combination, or date range may not exist or may return no rows, producing an empty DataFrame or a `ValueError`.
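The library's own `tables_dir`/`cache_dir` caching is the first thing to configure. If your code also pulls metadata through its own calls, a small on-disk cache with an expiry keeps repeated runs from re-hitting the API. This is a sketch under stated assumptions. `cached_metadata` is a hypothetical helper, and `fetch_metadata` is whatever function you use to download metadata as a JSON-serializable dict. It does not use or replace the library's built-in cache settings.

```python
import json
import time
from pathlib import Path
from typing import Any, Callable, Dict


def cached_metadata(
    table_id: str,
    fetch_metadata: Callable[[str], Dict[str, Any]],
    cache_dir: Path,
    max_age_seconds: float = 24 * 3600,
) -> Dict[str, Any]:
    """Return metadata for table_id, reusing <cache_dir>/<table_id>.json while it is
    younger than max_age_seconds; otherwise call fetch_metadata and rewrite the file."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    cache_file = cache_dir / f"{table_id}.json"
    if cache_file.exists() and time.time() - cache_file.stat().st_mtime < max_age_seconds:
        return json.loads(cache_file.read_text(encoding="utf-8"))
    metadata = fetch_metadata(table_id)  # Network call happens only on a cache miss.
    cache_file.write_text(json.dumps(metadata), encoding="utf-8")
    return metadata
```

Using the file's modification time as the freshness marker avoids a separate index file. Because availability changes over time (last warning above), a finite `max_age_seconds` is safer than caching forever.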
Install
pip install stats-can
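Since v3.0.0 removed the older API, it can be worth confirming which major version pip actually installed before following the examples below. A minimal sketch using the standard library's `importlib.metadata`; `parse_major` and `installed_major_version` are hypothetical helper names, and the distribution name `stats-can` is taken from the install command above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def parse_major(version_string: str) -> int:
    """Extract the major component from a version string such as '3.2.3'."""
    return int(version_string.split(".")[0])


def installed_major_version(dist_name: str = "stats-can") -> Optional[int]:
    """Return the installed major version of dist_name, or None if it is not installed."""
    try:
        return parse_major(version(dist_name))
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    major = installed_major_version()
    if major is None:
        print("stats-can is not installed: pip install stats-can")
    elif major < 3:
        print(f"stats-can {major}.x has the pre-3.0 API: pip install -U stats-can")
    else:
        print(f"stats-can {major}.x detected")
```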
Imports
- Statscan
import stats_can
from statscan import Statscan
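The import name and the PyPI distribution name (`stats-can`) differ, so when an import fails it helps to ask Python which candidate module name the environment actually provides. A minimal sketch using the standard library's `importlib.util.find_spec`, which returns `None` for a top-level module that cannot be found. `first_importable` is a hypothetical helper.

```python
import importlib.util
from typing import Optional, Sequence


def first_importable(candidates: Sequence[str]) -> Optional[str]:
    """Return the first top-level module name that can be imported, else None."""
    for name in candidates:
        if importlib.util.find_spec(name) is not None:
            return name
    return None


# Prints the module name to use in the import statement, or None if neither is installed.
print(first_importable(["statscan", "stats_can"]))
```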
Quickstart
from statscan import Statscan
import pandas as pd
# Initialize the Statscan client
sc = Statscan()
# Fetch data for a specific table ID (e.g., '18-10-0004-01', the monthly Consumer Price Index)
df = sc.get_data("18-10-0004-01")
# Print the first few rows of the DataFrame
print(df.head())
# You can also pass filters, e.g., for specific geographies or reference dates
# df_filtered = sc.get_data(
#     "18-10-0004-01",
#     filters={
#         "GEO": ["Canada", "Ontario"],
#         "REF_DATE": ["2023-01", "2023-02"],
#     },
# )
# print(df_filtered.tail())
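If the `filters` argument is unavailable or behaves differently in your installed version, you can do the same subsetting after download with plain pandas. This sketch assumes the DataFrame has columns named like the filter keys (`GEO`, `REF_DATE`). `apply_filters` is a hypothetical helper, and the sample frame uses made-up values rather than real StatCan data.

```python
from typing import Any, Mapping, Sequence

import pandas as pd


def apply_filters(df: pd.DataFrame, filters: Mapping[str, Sequence[Any]]) -> pd.DataFrame:
    """Keep rows whose value in every filtered column is one of the allowed values."""
    mask = pd.Series(True, index=df.index)
    for column, allowed in filters.items():
        mask &= df[column].isin(allowed)  # AND across dimensions, OR within a list
    return df[mask]


# Illustrative stand-in data, not real StatCan figures
sample = pd.DataFrame({
    "REF_DATE": ["2023-01", "2023-02", "2023-03", "2023-01"],
    "GEO": ["Canada", "Ontario", "Canada", "Quebec"],
    "VALUE": [1.0, 2.0, 3.0, 4.0],
})
print(apply_filters(sample, {"GEO": ["Canada", "Ontario"], "REF_DATE": ["2023-01", "2023-02"]}))
```

Filtering locally downloads the whole table first. It trades bandwidth for predictability, so pair it with the caching advice in the Warnings section for large cubes.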