Gemmi
Gemmi is a C++ library with comprehensive Python bindings designed for structural biology, particularly macromolecular crystallography. It provides tools for working with various file formats like mmCIF, PDB, MTZ, MRC/CCP4, and general CIF/STAR files, handling macromolecular models, refinement restraints, reflection data, and crystallographic symmetry. Currently at version 0.7.5, it is an actively developed open-source project maintained by CCP4 and Global Phasing Ltd.
Warnings
- breaking: Gemmi 0.7+ migrated its Python bindings from pybind11 to nanobind. This may break code that interacted with the C++ library at a lower level or that built custom extensions against pybind11. Expect internal API changes related to the binding mechanism.
- breaking: The behavior of `gemmi.Op` parsing, specifically `parse_triplet("h,k,l")` versus `parse_triplet("x,y,z")`, changed in Gemmi 0.7 to align with `cctbx` and `Pointless` conventions. This affects how crystallographic symmetry operations are interpreted and stored.
- breaking: Several deprecated functions were removed in Gemmi 0.7+. Notable examples are `UnitCell.fractionalization_matrix` and `UnitCell.orthogonalization_matrix`, which are now accessed via `frac.mat` and `orth.mat` respectively. `count_hydrogen_sites()` was also removed; `has_hydrogen()` is its replacement.
- gotcha: The Python API documentation at `gemmi.readthedocs.io/en/latest/` has not been updated since Gemmi 0.6.7 because of the migration to nanobind. It may not reflect API changes or functions available in versions 0.7 and newer.
- gotcha: There are two distinct packages: `gemmi` (the Python extension module) and `gemmi-program` (the command-line executable). Users sometimes install one while expecting the functionality of the other.
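Code that has to run against both older and newer releases can hide the removals listed above behind small helpers. The following is only a sketch: the helper names are hypothetical, it assumes the removed names were plain attributes or methods on cell-like and model-like objects, and it relies on duck typing rather than on any version check.

```python
# Hypothetical compatibility helpers (not part of Gemmi).
# They prefer the 0.7+ accessors and fall back to the removed names.

def fractionalization_matrix(cell):
    """Return the fractionalization matrix of a UnitCell-like object."""
    frac = getattr(cell, "frac", None)
    if frac is not None:
        return frac.mat  # accessor named in the warning above
    return cell.fractionalization_matrix  # removed in 0.7+ (assumed attribute)


def orthogonalization_matrix(cell):
    """Return the orthogonalization matrix of a UnitCell-like object."""
    orth = getattr(cell, "orth", None)
    if orth is not None:
        return orth.mat
    return cell.orthogonalization_matrix  # removed in 0.7+ (assumed attribute)


def model_has_hydrogen(model):
    """Return True if a model-like object reports any hydrogen atoms."""
    if hasattr(model, "has_hydrogen"):
        return model.has_hydrogen()
    # Fallback for releases that only offer the removed counter (assumed method)
    return model.count_hydrogen_sites() > 0
```

With Gemmi installed, these would be called as, for example, `fractionalization_matrix(gemmi.UnitCell(10, 20, 30, 90, 90, 90))`.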
Install
pip install gemmi
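Because the Python module and the command-line program are distributed separately, it can help to check which of the two is actually present. The sketch below uses only the standard library. It assumes the executable installed by `gemmi-program` is named `gemmi`; an executable of that name on PATH may also come from another source, so treat the result as a hint rather than proof.

```python
import importlib.util
import shutil


def gemmi_install_status():
    """Report which Gemmi components appear to be available."""
    return {
        # True when `import gemmi` would find a module (package `gemmi`)
        "python_module": importlib.util.find_spec("gemmi") is not None,
        # True when an executable named `gemmi` is on PATH
        # (assumed to be what the `gemmi-program` package provides)
        "command_line_program": shutil.which("gemmi") is not None,
    }


status = gemmi_install_status()
for component, found in status.items():
    print(f"{component}: {'found' if found else 'missing'}")
```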
Imports
- gemmi
import gemmi
- Structure
import gemmi
structure = gemmi.read_pdb('file.pdb')
Quickstart
import gemmi
import os
# Create a small PDB file for demonstration. PDB is a fixed-column format,
# so the spacing inside each ATOM record matters.
pdb_content = """\
ATOM      1  N   ALA A   1      29.186  15.021  19.530  1.00 19.34           N
ATOM      2  CA  ALA A   1      28.710  16.299  19.988  1.00 18.23           C
ATOM      3  C   ALA A   1      27.241  16.353  20.306  1.00 17.51           C
ATOM      4  O   ALA A   1      26.702  17.433  20.370  1.00 18.00           O
ATOM      5  CB  ALA A   1      29.417  17.309  19.066  1.00 20.00           C
TER
"""
with open("test.pdb", "w") as f:
    f.write(pdb_content)
# Load the PDB file as a gemmi.Structure (format is detected from the extension)
st = gemmi.read_structure("test.pdb")
# Structure, Model, Chain and Residue are containers: index them and use len()
model = st[0]
chain = model[0]
residue = chain[0]
print(f"File contains {len(st)} model(s).")
print(f"First model has {len(model)} chain(s).")
print(f"First chain has {len(chain)} residue(s).")
print(f"First residue is {residue.name} {residue.seqid.num} and has {len(residue)} atoms.")
# Clean up the dummy file
os.remove("test.pdb")