MMTF-Python
MMTF-Python is a Python library for decoding, encoding, and working with the Macromolecular Transmission Format (MMTF), a binary encoding designed for efficient storage and transmission of biological structures. The current version is 1.1.3, released in July 2022. The library is stable, but the RCSB PDB no longer serves MMTF data by default and encourages a switch to BinaryCIF (BCIF). The project is therefore considered to be in maintenance mode.
Common errors
-
ModuleNotFoundError: No module named 'mmtf'
cause The `mmtf-python` library has not been installed, or the Python environment in which it is installed is not active.
fix Install the library using `pip install mmtf-python` or `conda install -c conda-forge mmtf-python`.
-
TypeError: 'MMTFDecoder' object is not callable
cause Calling an `MMTFDecoder` instance as if it were a function rather than reading its attributes. Calling the `mmtf` module itself, when a function such as `fetch` was intended, produces the related `TypeError: 'module' object is not callable`.
fix Use the high-level `mmtf.fetch()` function to retrieve a structure by PDB ID, or import and instantiate `MMTFDecoder` from `mmtf.decoder` if manual decoding is needed. Example: `from mmtf import fetch; data = fetch('1ABC')`.
-
AttributeError: 'DecodedData' object has no attribute 'some_invalid_attribute'
cause Accessing an attribute that does not exist, or whose name is misspelled, on the `DecodedData` object returned by `fetch`.
fix Consult the `mmtf-python` documentation or the MMTF specification for the attribute names available on the decoded structure. Common attributes include `structure_id`, `num_chains`, `bio_assembly`, and `group_list`.
-
KeyError: 'some_missing_key'
cause Accessing a key in a dictionary-like structure (e.g., an element of `group_list`) that is not present in the data of the specific MMTF file.
fix Use `.get(key, default_value)` when a key might be absent, so that a missing key yields the default instead of raising `KeyError`. Example: `group_name = decoded_data.group_list[0].get('groupName', 'Unknown')`.
-
ImportError: cannot import name 'fetch' from 'mmtf'
cause This usually indicates that the `fetch` function is not exposed at the top level of the `mmtf` package, or that something in the local environment conflicts with the package name.
fix Ensure you are using `from mmtf import fetch`. If the error persists, check for a local file named `mmtf.py` that might be shadowing the installed library, and verify the installed version against the documentation.
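The shadowing check described in the last fix can be sketched with the standard library alone. The helper name `locate_module` is hypothetical and not part of `mmtf-python`; it only reports where Python would import a top-level module from, which makes a stray local `mmtf.py` easy to spot.

```python
import importlib.util
from typing import Optional


def locate_module(name: str) -> Optional[str]:
    """Return the file a top-level module would be imported from.

    Returns None when the module cannot be found. (Namespace packages
    also report None, because they have no single origin file.)
    """
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


if __name__ == "__main__":
    origin = locate_module("mmtf")
    if origin is None:
        print("mmtf is not importable from this environment")
    else:
        # A path inside the current project directory, rather than
        # site-packages, suggests a local file is shadowing the library.
        print(f"mmtf resolves to: {origin}")
```

If the printed path points at a file in the working directory, renaming that file is usually enough to resolve both the `ModuleNotFoundError` and the `ImportError` cases above.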
Warnings
- breaking RCSB PDB ceased serving MMTF data by default as of July 2, 2024. Users relying on direct downloads from RCSB PDB in MMTF format will need to switch to BinaryCIF (BCIF) or find alternative MMTF data sources.
- gotcha The `mmtf-python` library has not had major feature updates since its 1.1.3 release in July 2022. While functional, it may encounter compatibility issues with very recent Python versions (e.g., Python 3.10+) or newer versions of its dependencies.
- gotcha When migrating code from Python 2 to Python 3, be aware of syntax changes, particularly `print` becoming a function (`print()`). Incorrect usage may lead to `SyntaxError` or `TypeError`.
- gotcha Older versions (prior to v1.0.10) had issues with leaking open file handles, which could lead to resource exhaustion in long-running applications or when processing many files.
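To tell whether an environment is affected by the file-handle issue, the installed version can be compared against 1.0.10. The following is a minimal sketch using only `importlib.metadata` from the standard library; the helper names are hypothetical, and the parser assumes a purely numeric version string such as `1.1.3`.

```python
from importlib import metadata
from typing import Optional, Tuple


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '1.0.10' into (1, 0, 10). Assumes numeric components only."""
    return tuple(int(part) for part in text.split("."))


def has_file_handle_fix(version_text: str) -> bool:
    """True if the version is 1.0.10 or later, per the warning above."""
    return parse_version(version_text) >= (1, 0, 10)


if __name__ == "__main__":
    version = installed_version("mmtf-python")
    if version is None:
        print("mmtf-python is not installed")
    else:
        print(f"mmtf-python {version}, fix present: {has_file_handle_fix(version)}")
```

Comparing integer tuples rather than raw strings matters here: as strings, `"1.0.9"` sorts after `"1.0.10"`, which would give the wrong answer.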
Install
-
pip install mmtf-python
-
conda install -c conda-forge mmtf-python
Imports
- fetch
from mmtf import fetch
- MMTFDecoder
from mmtf import MMTFDecoder
from mmtf.decoder import MMTFDecoder
Quickstart
from mmtf import fetch
# Get the data for a PDB structure (e.g., 4CUP)
decoded_data = fetch("4CUP")
print(f"PDB Code: {decoded_data.structure_id} has {decoded_data.num_chains} chains")
# Show the charge information for the first group
if decoded_data.group_list and decoded_data.group_list[0]:
    group_name = decoded_data.group_list[0].get("groupName", "N/A")
    charges = decoded_data.group_list[0].get("formalChargeList", [])
    print(f"Group name: {group_name} has the following atomic charges: {','.join(map(str, charges))}")
# Show how many bioassemblies it has
print(f"PDB Code: {decoded_data.structure_id} has {len(decoded_data.bio_assembly)} bioassemblies")
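Because fetching from RCSB PDB may no longer work, the defensive dictionary access used in the quickstart can be tried offline. The sketch below assumes each `group_list` entry behaves like a plain dictionary with the `groupName` and `formalChargeList` keys shown above; `summarize_group` is a hypothetical helper, and the sample entries are made up for illustration rather than taken from a real structure.

```python
from typing import Any, Dict, List


def summarize_group(group: Dict[str, Any]) -> str:
    """Describe one group entry, tolerating missing keys via .get()."""
    name = group.get("groupName", "N/A")
    charges: List[int] = group.get("formalChargeList", [])
    return f"{name}: {len(charges)} atoms, net formal charge {sum(charges)}"


if __name__ == "__main__":
    # Hypothetical entries standing in for decoded_data.group_list.
    sample_groups = [
        {"groupName": "GLY", "formalChargeList": [0, 0, 0, 0]},
        {"formalChargeList": [1]},  # name missing
        {},                         # both keys missing
    ]
    for group in sample_groups:
        print(summarize_group(group))
```

With a real decoded structure, the same helper could be applied to each element of `decoded_data.group_list`.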