pyhmmer: Python Interface to HMMER3
pyhmmer provides high-performance Cython bindings and a Pythonic interface to the HMMER3 C library for sequence analysis with profile Hidden Markov Models (HMMs). It can be used to search protein and nucleic acid sequence databases, identify remote homologs, and build profile HMMs. The current stable version is 0.12.0, and new releases appear regularly, indicating active development.
Warnings
- gotcha pyhmmer wraps the HMMER3 C library. Prebuilt wheels are published for common platforms, but when `pip install pyhmmer` cannot find a wheel matching your platform it compiles pyhmmer and the HMMER3 C code from the source distribution. That build fails if the required tools (a C compiler such as gcc or clang, plus the Python development headers) are missing or misconfigured. Installing from bioconda avoids the local build.
- breaking As a 0.x series library, pyhmmer's API is still evolving and can change between minor versions (e.g., 0.10.x to 0.11.x, or 0.11.x to 0.12.x). Classes and methods may be renamed or moved to another module, function signatures may change, and the attributes of result objects may be restructured, sometimes without a deprecation period. Pin the minor series you tested against (for example `pyhmmer~=0.12.0`) and read the changelog before upgrading.
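- gotcha Searches over large HMM databases or large sequence collections can use a lot of memory: target sequences loaded into an in-memory block, accumulated hits, and per-thread search state all stay resident while a search runs. Running out of RAM can crash the process or slow it down through swapping. pyhmmer frees the underlying C memory when the owning Python objects are garbage-collected, so holding on to large sequence blocks, lists of HMMs, or result objects longer than needed, or loading more data per batch than necessary, increases memory pressure.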
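- gotcha Every pyhmmer `Alphabet` has a type (amino, DNA, or RNA), created with `Alphabet.amino()`, `Alphabet.dna()`, or `Alphabet.rna()`. The HMM and the sequences being searched must use the same alphabet; pyhmmer checks this and raises an error (an `AlphabetMismatch` from `pyhmmer.errors`) instead of searching across alphabets. Mind the types as well: sequence names are `bytes`, while the letters of a `TextSequence` are a `str`. `Sequence` itself is an abstract base class; create a `TextSequence` and convert it with `.digitize(alphabet)` to get the `DigitalSequence` that searches and model building expect.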
Install
- pip install pyhmmer
- conda install -c bioconda pyhmmer
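Because minor releases can change the API, one option is to install a pinned series (for example `pip install "pyhmmer~=0.12.0"`) and check the installed version at startup. The sketch below uses only the standard library; `TESTED_MINOR` and `same_minor_series` are hypothetical names, and the comparison is deliberately coarse (first two version components only).

```python
from importlib.metadata import PackageNotFoundError, version

# Assumption: the pyhmmer minor series this code was written and tested against.
TESTED_MINOR = (0, 12)


def same_minor_series(installed, tested=TESTED_MINOR):
    """Return True if a version string such as "0.12.3" belongs to `tested`.

    Only the first two dot-separated components are compared, so this is a
    coarse check, not a full PEP 440 version parser.
    """
    major, minor = (int(part) for part in installed.split(".")[:2])
    return (major, minor) == tuple(tested)


if __name__ == "__main__":
    try:
        installed = version("pyhmmer")
    except PackageNotFoundError:
        print("pyhmmer is not installed")
    else:
        if same_minor_series(installed):
            print(f"pyhmmer {installed} matches the tested series")
        else:
            print(f"warning: pyhmmer {installed} is outside the tested "
                  f"{TESTED_MINOR[0]}.{TESTED_MINOR[1]}.x series")
```

Reading the version through `importlib.metadata` avoids importing pyhmmer itself, so the check still works when the installed build is broken.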
Imports
- Alphabet
from pyhmmer.easel import Alphabet
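- Sequence (abstract base class; create TextSequence or DigitalSequence objects)
from pyhmmer.easel import TextSequence, DigitalSequence, DigitalSequenceBlock
- HMM
from pyhmmer.plan7 import HMM
- Pipeline
from pyhmmer.plan7 import Pipeline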
- HMMFile
from pyhmmer.plan7 import HMMFile
Quickstart
import pyhmmer
from pyhmmer.easel import Alphabet, DigitalSequenceBlock, TextSequence
from pyhmmer.plan7 import Background, Builder, Pipeline
# 1. Define the alphabet shared by the HMM and the target sequences
alphabet = Alphabet.amino()
# 2. Build a simple HMM from a single seed sequence.
# For a real application you would typically load an HMM from a .hmm file:
#     with pyhmmer.plan7.HMMFile("your_file.hmm") as hmm_file:
#         hmm = hmm_file.read()
# Names are bytes, TextSequence letters are a str, and the builder needs a
# digital sequence, so the text sequence is digitized with the alphabet.
seed_sequence = TextSequence(name=b"seed_seq", sequence="AGILRVAG").digitize(alphabet)
background = Background(alphabet)
builder = Builder(alphabet)
hmm, _profile, _optimized_profile = builder.build(seed_sequence, background)
hmm.name = b"my_simple_hmm"
# 3. Create the target sequences and collect them in a digital sequence block
target_sequences = DigitalSequenceBlock(alphabet, [
    TextSequence(name=b"target1", sequence="AGILRVAGGPPPL").digitize(alphabet),
    TextSequence(name=b"target2", sequence="GPPPLGGAGILRV").digitize(alphabet),
    TextSequence(name=b"target3", sequence="XXXXXAGILRVXXXX").digitize(alphabet),  # X = unknown residue
])
# 4. Initialize the HMMER pipeline
# The pipeline holds the filter and reporting thresholds used during the search
pipeline = Pipeline(alphabet)
# 5. Run the search: search the HMM against the target sequences
# This method returns a pyhmmer.plan7.TopHits object
hits = pipeline.search_hmm(hmm, target_sequences)
# 6. Process and print the results
found_hits = False
for hit in hits:
    found_hits = True
    print("\n--- Hit Found ---")
    print(f"Query HMM: {hmm.name.decode()}")
    print(f"Target Sequence: {hit.name.decode()}")
    print(f"  E-value: {hit.evalue:.2e}, Bit Score: {hit.score:.2f}")
    for dom in hit.domains:
        ali = dom.alignment
        print(f"  Domain: Query {ali.hmm_from}-{ali.hmm_to} (HMM positions)")
        print(f"          Target {ali.target_from}-{ali.target_to} (Sequence positions)")
if not found_hits:
    print("No hits found for the HMM against the target sequences.")