Matminer

0.10.0 · active · verified Tue Apr 14

Matminer is a Python library of data-mining tools for materials science. It provides dataset loading, featurization of materials (compositions and structures), and integration with machine learning workflows. Version 0.10.0 is actively developed, with regular updates that add features and maintain compatibility with its dependencies.

Warnings

Install

Imports

Quickstart

This quickstart loads a sample dataset through matminer, converts the formula strings into pymatgen Composition objects, and applies a composition-based featurizer that adds elemental-property statistics to the DataFrame.

from matminer.datasets import load_dataset
from matminer.featurizers.conversions import StrToComposition
from matminer.featurizers.composition import ElementProperty

# 1. Load a sample dataset (downloaded and cached on first use).
# Assumes the "elastic_tensor_2015" dataset, which provides a 'formula' column
# of strings and a 'structure' column of pymatgen Structure objects.
df = load_dataset("elastic_tensor_2015")
print(f"Initial DataFrame shape: {df.shape}")
print(df.head())

# 2. Convert the 'formula' strings to pymatgen Composition objects.
# Composition featurizers need this; the result lands in a new 'composition' column.
str_to_comp = StrToComposition()
df = str_to_comp.featurize_dataframe(df, 'formula')

# 3. Apply a composition featurizer.
# ElementProperty computes statistics of elemental properties for each composition.
# With the "pymatgen" data source, feature names are pymatgen Element attributes
# ("X" is electronegativity).
ep_featurizer = ElementProperty(
    data_source='pymatgen',
    features=['atomic_radius', 'X'],
    stats=['mean', 'std_dev']
)
df = ep_featurizer.featurize_dataframe(df, 'composition', ignore_errors=True)

print(f"\nDataFrame after featurization shape: {df.shape}")
# feature_labels() returns the names of the columns the featurizer added.
print(df[['formula', 'composition'] + ep_featurizer.feature_labels()].head())
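To make the composition statistics more concrete, here is a minimal standalone sketch of what a fraction-weighted mean and standard deviation of an elemental property look like for a single composition. It does not use matminer. The helper names (`atomic_fractions`, `weighted_mean`, `weighted_std_dev`, `property_stats`) are hypothetical, and the weighting shown (atomic fractions, population standard deviation) is an assumption about the general idea, not a statement of matminer's exact definitions.

```python
import math

# Pauling electronegativities for the two elements used in this example.
ELECTRONEGATIVITY = {"Fe": 1.83, "O": 3.44}


def atomic_fractions(amounts):
    """Normalize element amounts, e.g. {"Fe": 2, "O": 3}, to atomic fractions."""
    total = sum(amounts.values())
    return {el: n / total for el, n in amounts.items()}


def weighted_mean(values, weights):
    """Weighted mean of an elemental property."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def weighted_std_dev(values, weights):
    """Weighted population standard deviation (an assumed definition)."""
    mean = weighted_mean(values, weights)
    variance = sum(w * (v - mean) ** 2 for v, w in zip(values, weights)) / sum(weights)
    return math.sqrt(variance)


def property_stats(amounts, table):
    """Return mean and std_dev of one elemental property for one composition."""
    fractions = atomic_fractions(amounts)
    elements = list(fractions)
    values = [table[el] for el in elements]
    weights = [fractions[el] for el in elements]
    return {
        "mean": weighted_mean(values, weights),
        "std_dev": weighted_std_dev(values, weights),
    }


# Fe2O3: atomic fractions are 0.4 (Fe) and 0.6 (O).
stats = property_stats({"Fe": 2, "O": 3}, ELECTRONEGATIVITY)
print(f"mean X = {stats['mean']:.3f}, std_dev X = {stats['std_dev']:.3f}")
# prints: mean X = 2.796, std_dev X = 0.789
```

A featurizer applies this kind of calculation to every row and every requested property, which is why each feature/statistic pair becomes its own DataFrame column.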
