DScribe: Machine Learning Descriptors for Atomistic Systems
DScribe is a Python package (current version 2.1.2) that generates fixed-size numerical fingerprints, known as descriptors, from atomic structures. These descriptors serve as input for machine learning, visualization, and similarity analysis in materials science. The library is actively developed, with regular releases that add new descriptors and derivative functionality. [1, 2, 5]
Common errors
- fatal error: pybind11/pybind11.h: No such file or directory
  cause: DScribe depends on C++ extensions that are compiled during installation and require the `pybind11` headers. Although `pybind11` is declared as a dependency, `pip` may not find or install it before the build starts. [4]
  fix: Install `pybind11` explicitly before installing `dscribe`: `pip install pybind11`
- fatal error: Python.h: No such file or directory
  cause: The C/C++ extensions for DScribe require the Python development headers (e.g., `Python.h`), which are missing on the system. [4]
  fix: Install the Python development package for your specific Python version. For example, on Ubuntu: `sudo apt install python3.x-dev` (replace `x` with your minor Python version).
- Installation errors on macOS related to C++ compilation
  cause: Compiling C++ extensions on macOS requires the Xcode Command Line Tools, which may not be installed. [4]
  fix: Install the Xcode Command Line Tools: `xcode-select --install`
- TypeError: create() got an unexpected keyword argument 'positions'
  cause: This error occurs in DScribe 2.0.0 and newer when the outdated `positions` argument is passed to a local descriptor. [1]
  fix: Update your code to use the `centers` argument instead of `positions` (e.g., `descriptor.create(system, centers=...)`).
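The three build-related errors above all come down to a missing prerequisite, so it can help to check for them before running `pip install dscribe`. The following is a minimal sketch using only the Python standard library. The function name `check_build_prerequisites` and the list of compiler names are illustrative assumptions, not part of DScribe, and a passing check does not guarantee that the build succeeds.

```python
import importlib.util
import shutil
import sysconfig
from pathlib import Path


def check_build_prerequisites():
    """Report whether common requirements for building C++ extensions look present."""
    include_dir = sysconfig.get_paths()["include"]
    return {
        # Python development headers (Python.h) for the running interpreter
        "python_headers": (Path(include_dir) / "Python.h").is_file(),
        # pybind11 importable in the current environment
        "pybind11": importlib.util.find_spec("pybind11") is not None,
        # Any C++ compiler on PATH (common names only, not an exhaustive list)
        "cxx_compiler": any(
            shutil.which(name) is not None for name in ("c++", "g++", "clang++")
        ),
    }


if __name__ == "__main__":
    for name, found in check_build_prerequisites().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

Run it with the same interpreter that will install `dscribe`, since the header location and the installed packages are specific to each environment.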
Warnings
- breaking In DScribe 2.0.0, the `positions` argument for local descriptors was renamed to `centers`. Code using `positions` will break. [1]
- breaking In DScribe 2.0.0, CoulombMatrix, EwaldSumMatrix, SineMatrix, MBTR, and LMBTR no longer support 'unflattened' outputs. Global descriptors now produce a flattened 1D output per system, and local descriptors (LMBTR among them) produce a 2D output with one flattened row per center. [1]
- breaking In DScribe 2.0.0, several SOAP descriptor parameters were renamed: `rcut` -> `r_cut`, `nmax` -> `n_max`, `lmax` -> `l_max`. Similarly for EwaldSumMatrix: `rcut` -> `r_cut`, `gcut` -> `g_cut`. [1]
- breaking In DScribe 2.1.0, the `crossover` parameter in SOAP was removed. Its functionality is now controlled by the `compression` parameter, which takes a dictionary with a `mode` key (e.g., `compression={"mode": "crossover"}`). [1, 3]
- gotcha DScribe relies on the Atomic Simulation Environment (ASE) for handling atomic structures. Ensure your atomic structures are correctly represented as `ase.Atoms` objects. [5]
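When migrating older scripts, the keyword renames listed above can be applied mechanically. The sketch below is a hypothetical helper, not something DScribe provides: the names `RENAMED_KEYWORDS` and `modernize_kwargs` are assumptions made for illustration. It covers only the SOAP renames and `positions` -> `centers`, and deliberately leaves `crossover` alone because that change is not a plain rename.

```python
# Old -> new keyword names, taken from the 2.0.0 renames listed above.
RENAMED_KEYWORDS = {
    "rcut": "r_cut",
    "nmax": "n_max",
    "lmax": "l_max",
    "positions": "centers",
}


def modernize_kwargs(kwargs):
    """Return a copy of kwargs with pre-2.0.0 names replaced by current ones."""
    updated = {}
    for name, value in kwargs.items():
        new_name = RENAMED_KEYWORDS.get(name, name)
        if new_name in updated:
            # Refuse to guess when both spellings of one argument are present.
            raise ValueError(f"both old and new spelling given for {new_name!r}")
        updated[new_name] = value
    return updated


legacy_soap_args = {"species": ["H", "O"], "rcut": 5.0, "nmax": 8, "lmax": 6}
print(modernize_kwargs(legacy_soap_args))
# {'species': ['H', 'O'], 'r_cut': 5.0, 'n_max': 8, 'l_max': 6}
```

The result can then be passed on as `SOAP(**modernize_kwargs(legacy_soap_args))`. Treat this as a stopgap while updating call sites, not as a permanent compatibility layer.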
Install
- pip install dscribe
- conda install -c conda-forge dscribe
- git clone https://github.com/SINGROUP/dscribe.git
  cd dscribe
  git submodule update --init
  pip install .
Imports
- SOAP
from dscribe.descriptors import SOAP
- CoulombMatrix
from dscribe.descriptors import CoulombMatrix
- ACSF
from dscribe.descriptors import ACSF
- MBTR
from dscribe.descriptors import MBTR
Quickstart
import numpy as np
from ase.build import molecule
from dscribe.descriptors import SOAP, CoulombMatrix
# Define atomic structures
samples = [molecule("H2O"), molecule("NO2"), molecule("CO2")]
# Setup CoulombMatrix descriptor
cm_desc = CoulombMatrix(n_atoms_max=3, permutation="sorted_l2")
# Setup SOAP descriptor (2.x parameter names; `compression` takes a dict in 2.1.0+)
soap_desc = SOAP(species=["C", "H", "O", "N"], r_cut=5, n_max=8, l_max=6, compression={"mode": "crossover"})
# Create descriptors for a single system
water = samples[0]
coulomb_matrix_h2o = cm_desc.create(water)
soap_h2o = soap_desc.create(water, centers=[0])
print("Coulomb Matrix for H2O:\n", coulomb_matrix_h2o)
print("SOAP for Oxygen in H2O:\n", soap_h2o)
# Create descriptors for multiple systems (can be parallelized)
coulomb_matrices_all = cm_desc.create(samples, n_jobs=2)
oxygen_indices = [np.where(x.get_atomic_numbers() == 8)[0] for x in samples]
oxygen_soap_all = soap_desc.create(samples, centers=oxygen_indices, n_jobs=2)
print("Coulomb Matrices for all samples shape:", coulomb_matrices_all.shape)
# The samples contain different numbers of oxygen atoms, so report one shape per system
print("SOAP for Oxygen in all samples shapes:", [np.shape(x) for x in oxygen_soap_all])
# Descriptors also allow calculating derivatives
der, des = soap_desc.derivatives(samples[0], return_descriptor=True)
print("SOAP derivatives shape:", der.shape)
print("SOAP descriptor from derivatives shape:", des.shape)