mmCIF Core Access Library
The `mmcif` Python library (version 1.1.0 at the time of writing) provides an API for reading, manipulating, and exporting structural biology data in the macromolecular Crystallographic Information File (mmCIF) and BinaryCIF formats. Developed by the RCSB PDB, it combines a native Python implementation with pybind11 wrappers around a C++ core library for accelerated I/O. The library is actively maintained.
Warnings
- gotcha Multiple Python libraries exist for mmCIF parsing (e.g., `PDBeCIF`, `BioPython.PDB.MMCIFParser`, `mmcif-pdbx`, `python-modelcif`). Ensure you are explicitly using the `rcsb/py-mmcif` library (installed as `mmcif` via pip) as its API may differ from others.
- breaking The version numbers used in the `rcsb/py-mmcif` GitHub repository (e.g., `0.x.y`) may not correspond directly to the PyPI `mmcif` package version. Past updates have also changed BinaryCIF handling for compatibility (e.g., storing data items as lists instead of tuples), which can break code that relies on a specific container type.
- deprecated The related `mmcif-pdbx` library notes that its versions after `0.*` break API compatibility by renaming methods to conform to PEP 8. `rcsb/py-mmcif` is the canonical library, but users migrating from `mmcif-pdbx` may encounter API differences.
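Because several similarly named packages can coexist in one environment, it can help to check which distributions are actually installed before debugging API mismatches. The following is a minimal sketch using only the standard library. The distribution names passed to `report` are assumptions about how these projects are published on PyPI, and `installed_version` and `report` are hypothetical helper names, not part of any of these libraries.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string for a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def report(dist_names):
    """Map each distribution name to its installed version (or None)."""
    return {name: installed_version(name) for name in dist_names}


if __name__ == "__main__":
    # Assumed PyPI distribution names; adjust them to match your environment.
    candidates = ["mmcif", "mmcif-pdbx", "pdbecif", "biopython", "modelcif"]
    for name, version in report(candidates).items():
        status = version if version is not None else "not installed"
        print(f"{name}: {status}")
```

If more than one of these is present, check that your import statements resolve to the package you intend to use.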
Install
- mmcif
pip install mmcif
- rcsb.utils.io (provides the `MarshalUtil` helper imported below)
pip install rcsb.utils.io
Imports
- MarshalUtil
from rcsb.utils.io.MarshalUtil import MarshalUtil
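`MarshalUtil` is imported from the `rcsb.utils.io` namespace, which is assumed here to come from a separately installed `rcsb.utils.io` distribution rather than from `mmcif` itself. A minimal sketch of a guarded import that reports a missing dependency instead of failing with a traceback (the function name `load_marshal_util` is hypothetical):

```python
def load_marshal_util():
    """Try to import MarshalUtil.

    Returns (MarshalUtil class, None) on success,
    or (None, error message) if the import fails.
    """
    try:
        from rcsb.utils.io.MarshalUtil import MarshalUtil
    except ImportError as exc:
        return None, str(exc)
    return MarshalUtil, None


marshal_cls, import_error = load_marshal_util()
if marshal_cls is None:
    # Assumption: the helper is shipped in the `rcsb.utils.io` distribution.
    print(f"MarshalUtil is unavailable ({import_error}); try: pip install rcsb.utils.io")
else:
    print("MarshalUtil imported successfully")
```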
Quickstart
from rcsb.utils.io.MarshalUtil import MarshalUtil

# Create a MarshalUtil instance for I/O operations
mU = MarshalUtil()

# Define a public mmCIF file URL (e.g., from RCSB PDB)
mmcif_url = "https://files.rcsb.org/download/1ema.cif"

# Load data from the URL; the locator can be a local path or a URL.
# doImport() with fmt="mmcif" returns a list of data containers.
dataContainerList = mU.doImport(mmcif_url, fmt="mmcif")

if dataContainerList:
    # An mmCIF file can contain multiple data blocks; typically, we access the first one
    dataContainer = dataContainerList[0]
    print(f"Data block ID: {dataContainer.getName()}")

    # Access a specific data category, e.g., '_entity'
    entity_category = dataContainer.getObj("entity")
    if entity_category:
        print(f"\nFound {entity_category.getRowCount()} entities:")
        for i in range(entity_category.getRowCount()):
            pdbx_description = entity_category.getValue("pdbx_description", i)
            type_val = entity_category.getValue("type", i)
            print(f"  - Entity {i+1}: Type='{type_val}', Description='{pdbx_description}'")
    else:
        print("'_entity' category not found in the mmCIF file.")
else:
    print(f"Failed to load data from {mmcif_url}. Please check the URL and network connection.")
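The row-by-row loop in the quickstart can be generalized into a small helper that turns a whole category into a list of dictionaries. The sketch below is duck-typed: it assumes only that the category object exposes `getAttributeList()`, `getRowCount()`, and `getValue(attributeName, rowIndex)`. `category_to_dicts` and `StubCategory` are hypothetical names introduced for illustration, and the stub's rows are made-up values, not data from a real entry. The stub lets the example run offline without the library installed.

```python
def category_to_dicts(category):
    """Convert a category-like object into a list of {attribute: value} dicts."""
    attrs = list(category.getAttributeList())
    return [
        {attr: category.getValue(attr, i) for attr in attrs}
        for i in range(category.getRowCount())
    ]


class StubCategory:
    """Hypothetical stand-in exposing the three methods the helper relies on."""

    def __init__(self, attrs, rows):
        self._attrs = list(attrs)
        # Normalize rows to lists so tuple or list input behaves the same way.
        self._rows = [list(row) for row in rows]

    def getAttributeList(self):
        return list(self._attrs)

    def getRowCount(self):
        return len(self._rows)

    def getValue(self, attributeName, rowIndex):
        return self._rows[rowIndex][self._attrs.index(attributeName)]


# Made-up illustrative rows, not taken from an actual mmCIF file.
demo = StubCategory(
    ["id", "type", "pdbx_description"],
    [("1", "polymer", "example protein"), ("2", "water", "water")],
)
records = category_to_dicts(demo)
polymers = [r for r in records if r["type"] == "polymer"]
print(polymers)
```

With a real container, the same helper could be called on the object returned by `dataContainer.getObj("entity")`, provided that object is not `None`.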