Datazets
Datazets is a Python package for importing a collection of well-known example datasets commonly used in machine learning, data analysis, and teaching. It provides a simple API for loading these datasets without downloading them manually. The current version is 1.1.3, released on June 21, 2025; releases appear to be published regularly.
Common errors
- ModuleNotFoundError: No module named 'datazets'
  cause: The 'datazets' package is not installed in the current Python environment.
  fix: Run `pip install datazets` to install the library.
- ValueError: Selected data is not found!
  cause: The dataset name passed to `dz.get()` does not correspond to a dataset available in the library.
  fix: Check the spelling of the dataset name. Consult the `datazets` documentation or GitHub README for a list of valid dataset names (e.g., `dz.get('titanic')`, not `dz.get('tianic')`).
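Both errors above can be caught in one place. The following is a minimal sketch, not part of the library: `safe_get` and its `loader` parameter are hypothetical names introduced here, and the code assumes that `dz.get()` raises `ValueError` for an unknown name, as described above.

```python
import importlib


def safe_get(name, loader=None):
    """Return the dataset called `name`, or None if it cannot be loaded.

    `loader` is a hypothetical hook for testing; by default the function
    looks up `datazets.get` at call time.
    """
    if loader is None:
        try:
            # Import lazily so a missing package is reported, not raised.
            loader = importlib.import_module("datazets").get
        except ModuleNotFoundError as err:
            print(f"Import failed ({err}); try: pip install datazets")
            return None
    try:
        return loader(name)
    except ValueError as err:
        # Assumed behaviour: an unknown dataset name raises ValueError.
        print(f"Could not load {name!r}: {err}")
        return None
```

Under these assumptions, `safe_get('tianic')` would print the error message and return `None` instead of stopping the script, which can be convenient in notebooks that load several datasets in a loop.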
Warnings
- gotcha Datazets provides raw example datasets, which may contain missing values, inconsistent formats, or require additional preprocessing steps (e.g., handling categorical variables, scaling numerical features) before being suitable for machine learning models or advanced analysis. Users should not assume the data is 'production-ready' out-of-the-box.
- gotcha The `dz.get()` function requires an exact string name for the desired dataset. If an incorrect or non-existent dataset name is provided, the function will raise an error. The library does not provide a built-in method to list all available dataset names directly via its API, requiring users to consult documentation or the source code for a comprehensive list.
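Because the name must match exactly, a small spell-check against a hand-maintained list can catch typos before calling `dz.get()`. This is a sketch using only the standard library; `KNOWN_NAMES` and `suggest_name` are hypothetical, and only 'titanic' is confirmed by this page. The other entries are placeholders to replace with names taken from the documentation.

```python
import difflib

# Hand-maintained list (an assumption, not provided by datazets).
# Only 'titanic' appears on this page; verify the rest before use.
KNOWN_NAMES = ["titanic", "iris", "sprinkler"]


def suggest_name(name, known=KNOWN_NAMES, cutoff=0.6):
    """Return the closest known dataset name, or None if nothing is similar."""
    matches = difflib.get_close_matches(name.lower(), known, n=1, cutoff=cutoff)
    return matches[0] if matches else None


print(suggest_name("tianic"))  # prints: titanic
```

The `cutoff` value controls how forgiving the match is; a higher value rejects more distant misspellings.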
Install
pip install datazets
Imports
- datazets: `import datazets as dz`
Quickstart
import datazets as dz
# Load a well-known dataset, e.g., 'titanic'
df = dz.get('titanic')
print(f"Dataset 'titanic' loaded with shape: {df.shape}")
print(df.head())
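Since the datasets are raw (see Warnings), a quick look at missing values is a sensible next step after loading. The sketch below assumes the loaded object is a pandas DataFrame, which the `df.shape` and `df.head()` calls above suggest. `missing_report` is a hypothetical helper, and the small `sample` frame holds made-up values standing in for a real dataset.

```python
import pandas as pd


def missing_report(df: pd.DataFrame) -> pd.DataFrame:
    """Summarise dtype and missing values per column, worst column first."""
    report = pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "n_missing": df.isna().sum(),
        "frac_missing": df.isna().mean(),
    })
    return report.sort_values("n_missing", ascending=False)


# Stand-in data with made-up values; replace with df = dz.get('titanic').
sample = pd.DataFrame({
    "age": [22.0, None, 35.0, None],
    "sex": ["male", "female", None, "female"],
    "fare": [7.25, 71.28, 8.05, 53.10],
})
report = missing_report(sample)
print(report)
```

Columns with a high `frac_missing` are candidates for imputation or removal before modelling; which one is appropriate depends on the analysis.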