Matminer
Matminer is a Python library of data-mining tools for materials science. It covers dataset loading, featurization of materials (compositions, structures), and integration with machine learning workflows. As of version 0.10.0, it is actively developed, with regular releases that add features and maintain compatibility with its dependencies.
Warnings
- breaking In `matminer v0.10.0`, the default value of the `impute_nan` parameter on many featurizers changed from `False` to `True`. Missing elemental features (NaNs) are now imputed by default, replaced by the average of that feature over the elements available in the data source. This can change the behavior of previously trained models; pass `impute_nan=False` explicitly to keep the old behavior.
- gotcha Matminer has strict dependency requirements, particularly for `pandas`, `numpy`, and `pymatgen`. Incompatibilities can arise with newer versions of these upstream libraries, leading to import errors or unexpected behavior. This was particularly noted around `pandas v2` and specific `pymatgen` versions.
- gotcha When using `BaseFeaturizer.set_n_jobs()` for parallel processing, especially with large datasets, there is a risk of Out-of-Memory (OOM) errors. This is due to Python's multiprocessing overhead and potential data duplication across processes if not managed carefully (e.g., with a Dask client).
- deprecated In `matminer v0.10.0`, the `ChemEnvSiteFingerprint.from_preset()` method had some 'not-implemented' Chemical Environments (CEs) removed. While these were technically non-functional, their removal might affect code that expected a specific set of presets or relied on the method's previous behavior.
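To make the `impute_nan` warning concrete, here is a minimal pure-Python sketch of the idea, not matminer's implementation. It assumes the fill value is the average over the elements the data source does cover. The element names `A`, `B`, `C`, their values, and the helpers `impute_missing` and `mean_property` are hypothetical and exist only for this illustration.

```python
import math

# Hypothetical elemental property table; the values are made up for illustration.
# NaN marks an element the data source has no value for.
table = {"A": 1.0, "B": 3.0, "C": math.nan}


def impute_missing(values):
    """Replace NaN entries with the mean of the non-NaN entries."""
    known = [v for v in values.values() if not math.isnan(v)]
    fill = sum(known) / len(known)
    return {k: (fill if math.isnan(v) else v) for k, v in values.items()}


def mean_property(values, elements):
    """Unweighted mean of a property over the elements of a compound."""
    return sum(values[e] for e in elements) / len(elements)


compound = ["A", "C"]
raw = mean_property(table, compound)                      # NaN propagates into the feature
imputed = mean_property(impute_missing(table), compound)  # finite after imputation
print(raw, imputed)
```

Without imputation the feature is NaN and has to be handled downstream; with imputation it is silently finite, which is why a changed default can shift model inputs without raising any error.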
Install
pip install matminer
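Because of the strict dependency requirements noted in the warnings, it can help to record which versions are actually installed before debugging an import error. The following is a small sketch using only the standard library; the helper name `installed_versions` is an illustrative choice, not part of matminer.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each package name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


report = installed_versions(["matminer", "pymatgen", "pandas", "numpy"])
for name, ver in report.items():
    print(f"{name}: {ver or 'not installed'}")
```

Comparing this report against the version constraints in the matminer release you installed is usually the quickest way to spot a mismatched `pandas` or `pymatgen`.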
Imports
- load_dataset
from matminer.datasets import load_dataset
- StrToComposition
from matminer.featurizers.conversions import StrToComposition
- ElementProperty
from matminer.featurizers.composition import ElementProperty
- SiteStatsFingerprint
from matminer.featurizers.structure import SiteStatsFingerprint
Quickstart
from matminer.datasets import load_dataset
from matminer.featurizers.conversions import StrToComposition
from matminer.featurizers.composition import ElementProperty
# 1. Load a sample dataset (downloaded and cached on first use)
df = load_dataset("elastic_tensor_2015")
print(f"Initial DataFrame shape: {df.shape}")
print(df.head())
# 2. Convert the 'formula' strings to pymatgen Composition objects
# Note: composition featurizers need Composition objects; this adds a 'composition' column
str_to_comp = StrToComposition()
df = str_to_comp.featurize_dataframe(df, 'formula')
# 3. Apply a composition featurizer
# (ElementProperty calculates statistics of elemental properties for each composition)
ep_featurizer = ElementProperty(
    data_source='pymatgen',
    features=['atomic_radius', 'X'],  # 'X' is the pymatgen name for electronegativity
    stats=['mean', 'std_dev']
)
df = ep_featurizer.featurize_dataframe(df, 'composition', ignore_errors=True)
print(f"\nDataFrame after featurization shape: {df.shape}")
# feature_labels() returns the names of the columns the featurizer added
print(df[['formula', 'composition'] + ep_featurizer.feature_labels()].head())
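To show what a "mean" or "std_dev" statistic of an elemental property means for a single composition, here is a hand-worked sketch for Fe2O3 using Pauling electronegativities. It assumes the statistics are weighted by atomic fraction and that the standard deviation is the weighted population form; treat it as an illustration of the idea rather than matminer's exact implementation. The helpers `weighted_mean` and `weighted_std_dev` are hypothetical names.

```python
import math

# Pauling electronegativities for the two elements in Fe2O3.
electronegativity = {"Fe": 1.83, "O": 3.44}
# Atomic fractions of Fe2O3: 2 of 5 atoms are Fe, 3 of 5 are O (they sum to 1).
fractions = {"Fe": 2 / 5, "O": 3 / 5}


def weighted_mean(values, weights):
    """Fraction-weighted mean; assumes the weights sum to 1."""
    return sum(weights[el] * values[el] for el in weights)


def weighted_std_dev(values, weights):
    """Fraction-weighted population standard deviation; assumes the weights sum to 1."""
    mu = weighted_mean(values, weights)
    variance = sum(weights[el] * (values[el] - mu) ** 2 for el in weights)
    return math.sqrt(variance)


mean_x = weighted_mean(electronegativity, fractions)
std_x = weighted_std_dev(electronegativity, fractions)
print(f"mean X = {mean_x:.3f}, std_dev X = {std_x:.3f}")
```

Each feature column produced by a statistics-based featurizer holds one such number per row, so a composition with a single element would have a standard deviation of zero under this definition.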