{"id":5992,"library":"matminer","title":"Matminer","description":"Matminer is a Python library of data-mining tools for materials science. It provides dataset loading, featurization of materials (compositions, structures), and integration with machine learning workflows. As of version 0.10.0, it is actively developed, with regular updates that add features and maintain compatibility with its dependencies.","status":"active","version":"0.10.0","language":"en","source_language":"en","source_url":"https://github.com/hackingmaterials/matminer","tags":["materials science","data mining","featurization","pymatgen","materials informatics"],"install":[{"cmd":"pip install matminer","lang":"bash","label":"Latest stable release"}],"dependencies":[{"reason":"Core dependency for representing materials structures and compositions.","package":"pymatgen"},{"reason":"Used extensively for data manipulation, particularly DataFrames.","package":"pandas"},{"reason":"Fundamental library for numerical operations.","package":"numpy"}],"imports":[{"symbol":"load_dataset","correct":"from matminer.datasets import load_dataset"},{"symbol":"StrToComposition","correct":"from matminer.featurizers.conversions import StrToComposition"},{"symbol":"ElementProperty","correct":"from matminer.featurizers.composition import ElementProperty"},{"symbol":"SiteStatsFingerprint","correct":"from matminer.featurizers.structure import SiteStatsFingerprint"}],"quickstart":{"code":"from matminer.datasets import load_dataset\nfrom matminer.featurizers.conversions import StrToComposition\nfrom matminer.featurizers.composition import ElementProperty\n\n# 1. Load a sample dataset (downloaded and cached on first use)\ndf = load_dataset('elastic_tensor_2015')\nprint(f\"Initial DataFrame shape: {df.shape}\")\nprint(df.head())\n\n# 2. 
Parse the 'formula' strings into pymatgen Composition objects\n# Note: composition featurizers expect Composition objects; this adds a 'composition' column\nstr_to_comp = StrToComposition()\ndf = str_to_comp.featurize_dataframe(df, 'formula')\n\n# 3. Apply a composition featurizer\n# (ElementProperty calculates statistics of elemental properties for each composition)\nep_featurizer = ElementProperty(\n    data_source='pymatgen',\n    features=['atomic_radius', 'X'],  # 'X' is electronegativity in pymatgen\n    stats=['mean', 'std_dev']\n)\ndf = ep_featurizer.featurize_dataframe(df, 'composition', ignore_errors=True)\n\nprint(f\"\\nDataFrame after featurization shape: {df.shape}\")\nprint(df[['formula', 'composition'] + ep_featurizer.feature_labels()].head())","lang":"python","description":"This quickstart loads a built-in matminer dataset, parses its formula strings into pymatgen Composition objects, and applies a composition-based featurizer that adds elemental-property statistics to the DataFrame as new columns."},"warnings":[{"fix":"If you require the old behavior (no imputation by default), explicitly set `impute_nan=False` when initializing your featurizers, e.g., `MyFeaturizer(impute_nan=False)`.","message":"In `matminer v0.10.0`, the `impute_nan` parameter for many featurizers changed its default value from `False` to `True`. Feature values that are missing from the data source (NaNs) are now replaced by default with the average of that feature over the available elements, which can change the inputs to previously trained models unless `impute_nan=False` is set explicitly.","severity":"breaking","affected_versions":">=0.10.0"},{"fix":"If encountering dependency issues, consider installing matminer in a fresh environment. Refer to the `requirements/*.txt` files on the GitHub repository for pinned dependency versions known to be compatible with your matminer version. 
Always check release notes for specific compatibility updates.","message":"Matminer has strict dependency requirements, particularly for `pandas`, `numpy`, and `pymatgen`. Newer versions of these upstream libraries can be incompatible, leading to import errors or unexpected behavior. This has been noted in particular around `pandas v2` and specific `pymatgen` versions.","severity":"gotcha","affected_versions":"All versions, but particularly relevant for v0.9.x to v0.10.x"},{"fix":"For very large datasets, avoid setting `n_jobs` to a high value. Consider processing the data in chunks, using a Dask client for more robust distributed computing, or setting `n_jobs=1` to run in a single process if memory is a constraint.","message":"Parallel featurization configured with `BaseFeaturizer.set_n_jobs()` can cause Out-of-Memory (OOM) errors, especially on large datasets. Python's multiprocessing adds per-process overhead and can duplicate data across worker processes unless the workload is managed carefully (e.g., with a Dask client).","severity":"gotcha","affected_versions":"All versions using multiprocessing, noted in >=0.9.0"},{"fix":"Review your usage of `ChemEnvSiteFingerprint.from_preset()` and confirm that the Chemical Environments you intend to use are still available. If you were using any of the removed 'not-implemented' CEs, adapt your code to use the currently supported ones or implement custom CEs if needed.","message":"In `matminer v0.10.0`, some 'not-implemented' Chemical Environments (CEs) were removed from `ChemEnvSiteFingerprint.from_preset()`. Although these CEs were non-functional, their removal might affect code that expected a specific set of presets or relied on the method's previous behavior.","severity":"deprecated","affected_versions":">=0.10.0"}],"env_vars":null,"last_verified":"2026-04-14T00:00:00.000Z","next_check":"2026-07-13T00:00:00.000Z","problems":[]}