pyhmmer: Python Interface to HMMER3

0.12.0 · active · verified Fri Apr 10

pyhmmer provides high-performance Cython bindings and a Pythonic interface to HMMER3 for sequence analysis with profile hidden Markov models (HMMs). It can search protein and nucleic acid sequence databases, identify remote homologs, and build profile HMMs. The current stable version is 0.12.0, and several minor versions are released per year.

Warnings

Install

Imports

Quickstart

This quickstart builds a small profile HMM from a single seed sequence with `pyhmmer.plan7.Builder`, digitizes a few target sequences, and runs a search with `pyhmmer.plan7.Pipeline`. It then iterates over the resulting `TopHits` to display each hit and its domains. Residues are passed to `TextSequence` as `str`. Whether names are `str` or `bytes` depends on the installed pyhmmer release, so the helper functions below accept either.

from pyhmmer.easel import Alphabet, DigitalSequenceBlock, TextSequence
from pyhmmer.plan7 import Background, Builder, Pipeline


def make_sequence(name, residues):
    # Names are `str` or `bytes` depending on the pyhmmer release:
    # try `str` first, then fall back to `bytes`.
    try:
        return TextSequence(name=name, sequence=residues)
    except TypeError:
        return TextSequence(name=name.encode(), sequence=residues)


def as_text(value):
    # Decode names that come back as `bytes` so they print cleanly.
    return value.decode() if isinstance(value, bytes) else value


# 1. Define the alphabet and the background (null) model
alphabet = Alphabet.amino()
background = Background(alphabet)

# 2. Build a simple HMM from a seed sequence
# For a real application, you would typically load an HMM from a file:
#   with pyhmmer.plan7.HMMFile("your_file.hmm") as f: hmm = f.read()
# The builder names the HMM after the seed sequence.
seed_sequence = make_sequence("my_simple_hmm", "AGILRVAG")
builder = Builder(alphabet)
hmm, _profile, _optimized = builder.build(seed_sequence.digitize(alphabet), background)

# 3. Create target sequences to search against (digitized, in a block)
target_sequences = DigitalSequenceBlock(alphabet, [
    make_sequence("target1", "AGILRVAGGPPPL").digitize(alphabet),
    make_sequence("target2", "GPPPLGGAGILRV").digitize(alphabet),
    make_sequence("target3", "XXXXXAGILRVXXXX").digitize(alphabet),  # X = unknown residue
])

# 4. Initialize the HMMER pipeline
# The pipeline holds the search configuration and filter state
pipeline = Pipeline(alphabet, background=background)

# 5. Run the search: search the HMM against the target sequences
# This method returns a pyhmmer.plan7.TopHits object
hits = pipeline.search_hmm(hmm, target_sequences)

# 6. Process and print the results
found_hits = False
for hit in hits:
    found_hits = True
    print("\n--- Hit Found ---")
    print(f"Query HMM: {as_text(hmm.name)}")
    print(f"Target Sequence: {as_text(hit.name)}")
    print(f"  E-value: {hit.evalue:.2e}, Bit Score: {hit.score:.2f}")
    for dom in hit.domains:
        ali = dom.alignment  # coordinates live on the domain's alignment
        print(f"    Domain: Query {ali.hmm_from}-{ali.hmm_to} (HMM positions)")
        print(f"            Target {ali.target_from}-{ali.target_to} (Sequence positions)")

if not found_hits:
    print("No significant hits found for the HMM against target sequences.")
