{"id":7139,"library":"datazets","title":"Datazets","description":"Datazets is a Python package for easily importing a collection of well-known example datasets commonly used for machine learning, data analysis, and teaching. It provides a simple API for accessing these datasets without having to download or preprocess them manually. The current version is 1.1.3, released on June 21, 2025, and the project appears to have an active release cadence.","status":"active","version":"1.1.3","language":"en","source_language":"en","source_url":"https://github.com/erdogant/datazets","tags":["data","datasets","example-data","machine-learning","data-analysis"],"install":[{"cmd":"pip install datazets","lang":"bash","label":"Install latest version"}],"dependencies":[],"imports":[{"symbol":"datazets","correct":"import datazets as dz"}],"quickstart":{"code":"import datazets as dz\n\n# Load a well-known dataset, e.g., 'titanic'\ndf = dz.get('titanic')\n\nprint(f\"Dataset 'titanic' loaded with shape: {df.shape}\")\nprint(df.head())","lang":"python","description":"This quickstart shows how to import the `datazets` library and load the 'titanic' dataset into a pandas DataFrame with the `dz.get()` function."},"warnings":[{"fix":"Inspect each loaded dataset for quality issues with pandas methods such as `df.info()`, `df.isnull().sum()`, and `df.describe()`, then apply appropriate cleaning and preprocessing (for example, with pandas or scikit-learn).","message":"Datazets provides raw example datasets. They may contain missing values or inconsistent formats, and they may require additional preprocessing (e.g., encoding categorical variables, scaling numerical features) before they are suitable for machine learning models or advanced analysis. Do not assume the data is production-ready out of the box.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Refer to the official Datazets GitHub repository or documentation for a list of available dataset names (e.g., 'titanic', 'iris', 'boston'). Make sure the dataset name is spelled correctly and matches one of the supported identifiers.","message":"The `dz.get()` function requires the exact string name of the desired dataset. If an incorrect or non-existent name is passed, the function raises an error. The library does not provide a built-in API method that lists all available dataset names, so users must consult the documentation or the source code for a complete list.","severity":"gotcha","affected_versions":"All versions"}],"env_vars":null,"last_verified":"2026-04-16T00:00:00.000Z","next_check":"2026-07-15T00:00:00.000Z","problems":[{"fix":"Run `pip install datazets` to install the library.","cause":"The 'datazets' package is not installed in the current Python environment.","error":"ModuleNotFoundError: No module named 'datazets'"},{"fix":"Check the spelling of the dataset name. Consult the `datazets` documentation or GitHub README for a list of valid dataset names (e.g., `dz.get('titanic')`, not `dz.get('tianic')`).","cause":"The dataset name passed to `dz.get()` does not correspond to an available dataset in the library.","error":"ValueError: Selected data is not found!"}]}