{"id":2693,"library":"pyhmmer","title":"pyhmmer: Python Interface to HMMER3","description":"pyhmmer provides high-performance Cython bindings and a Pythonic interface to the HMMER3 C library, enabling powerful sequence analysis using Hidden Markov Models. It is used for searching protein and nucleic acid sequence databases, identifying remote homologs, and building profile HMMs. The current stable version is 0.12.0, with a release cadence of several minor versions per year, indicating active development.","status":"active","version":"0.12.0","language":"en","source_language":"en","source_url":"https://github.com/althonos/pyhmmer","tags":["bioinformatics","HMM","sequence-analysis","HMMER","genomics","proteomics"],"install":[{"cmd":"pip install pyhmmer","lang":"bash","label":"Standard installation"},{"cmd":"conda install -c bioconda pyhmmer","lang":"bash","label":"Conda installation (Bioconda; avoids a local source build)"}],"dependencies":[],"imports":[{"symbol":"Alphabet","correct":"from pyhmmer.easel import Alphabet"},{"symbol":"TextSequence","correct":"from pyhmmer.easel import TextSequence"},{"symbol":"HMM","correct":"from pyhmmer.plan7 import HMM"},{"symbol":"Pipeline","correct":"from pyhmmer.plan7 import Pipeline"},{"symbol":"HMMFile","correct":"from pyhmmer.plan7 import HMMFile"}],"quickstart":{"code":"import pyhmmer\nfrom pyhmmer.easel import Alphabet, DigitalSequenceBlock, TextSequence\nfrom pyhmmer.plan7 import Background, Builder, Pipeline\n\n# 1. Define the alphabet and background model shared by sequences and HMMs\nalphabet = Alphabet.amino()\nbackground = Background(alphabet)\n\n# 2. Build a simple HMM from a seed sequence (or load from .hmm file)\n# For a real application, you would typically load an HMM from a file\n# using `pyhmmer.plan7.HMMFile('your_file.hmm').read()`\n# Names are bytes; TextSequence residues are str and are digitized before use\nseed_sequence = TextSequence(name=b\"seed_seq\", sequence=\"AGILRVAG\")\nbuilder = Builder(alphabet)\nhmm, _, _ = builder.build(seed_sequence.digitize(alphabet), background)\nhmm.name = b\"my_simple_hmm\"\n\n# 3. Create target sequences to search against\ntarget_sequences = DigitalSequenceBlock(alphabet, [\n    TextSequence(name=b\"target1\", sequence=\"AGILRVAGGPPPL\").digitize(alphabet),\n    TextSequence(name=b\"target2\", sequence=\"GPPPLGGAGILRV\").digitize(alphabet),\n    TextSequence(name=b\"target3\", sequence=\"XXXXXAGILRVXXXX\").digitize(alphabet),  # X = unknown residue\n])\n\n# 4. Initialize the HMMER pipeline\n# The pipeline holds the search configuration and is reused across queries\npipeline = Pipeline(alphabet, background=background)\n\n# 5. Run the search: search the HMM against the target sequences\n# This method returns a pyhmmer.plan7.TopHits object\nhits = pipeline.search_hmm(hmm, target_sequences)\n\n# 6. Process and print the results\nfound_hits = False\nfor hit in hits:\n    found_hits = True\n    print(\"\\n--- Hit Found ---\")\n    print(f\"Query HMM: {hmm.name.decode()}\")\n    print(f\"Target Sequence: {hit.name.decode()}\")\n    print(f\"  E-value: {hit.evalue:.2e}, Bit Score: {hit.score:.2f}\")\n    for dom in hit.domains:\n        ali = dom.alignment\n        print(f\"    Domain: Query {ali.hmm_from}-{ali.hmm_to} (HMM positions)\")\n        print(f\"            Target {ali.target_from}-{ali.target_to} (Sequence positions)\")\n\nif not found_hits:\n    print(\"No hits found for the HMM against target sequences.\")","lang":"python","description":"This quickstart demonstrates how to build a simple HMM from a single sequence with `pyhmmer.plan7.Builder`, define target sequences, and perform a basic HMMER search using `pyhmmer.plan7.Pipeline`. It then iterates through the resulting `TopHits` to display hits and their associated domains. Note that sequence names must be `bytes`, while `TextSequence` residues are given as `str` and digitized before searching."},"warnings":[{"fix":"Ensure you have a C compiler and build essentials installed (e.g., `build-essential` on Debian/Ubuntu, Xcode Command Line Tools on macOS). If a source build is impractical on Linux or macOS, consider installing the prebuilt `pyhmmer` package from Bioconda (`conda install -c bioconda pyhmmer`). 
pyhmmer compiles against the HMMER3 and Easel sources bundled with the package, so a separately installed HMMER3 is not needed for the build.","message":"pyhmmer is a wrapper around the HMMER3 C library, whose sources are bundled with the package. `pip install pyhmmer` uses a prebuilt wheel where one is available for your platform; otherwise it compiles the bundled sources, which can fail if the necessary build tools (a C compiler such as gcc or clang, plus Python development headers) are not installed or correctly configured on your system.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Always review the changelog and migration guides (if available) when upgrading `pyhmmer` across minor versions. Test your code thoroughly after any upgrade. Pin your `pyhmmer` version in `requirements.txt` to avoid unexpected breakage in production environments.","message":"As a 0.x series library, `pyhmmer`'s API is still evolving and is subject to changes between minor versions (e.g., 0.10.x to 0.11.x, or 0.11.x to 0.12.x). This can include renamed classes and methods, changed function signatures, or modified result objects, potentially breaking existing code without a deprecation warning.","severity":"breaking","affected_versions":"<0.12.0"},{"fix":"Monitor memory usage for your specific workloads. For very large datasets, consider processing sequences in batches rather than loading everything into memory at once. Ensure `pyhmmer.plan7.Pipeline` and other resource-heavy objects are properly scoped (e.g., within functions or with explicit `del` if not garbage collected quickly enough) to allow for memory cleanup.","message":"HMMER operations, especially with large HMM databases or extensive sequence queries, can be highly memory-intensive due to the underlying C library. Insufficient RAM can lead to crashes or degraded performance. 
While `pyhmmer` manages C memory, keeping many large objects alive at once or using large batch sizes can exacerbate memory pressure.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Explicitly define and pass the correct `Alphabet` (e.g., `Alphabet.amino()`, `Alphabet.dna()`, `Alphabet.rna()`) to all relevant `pyhmmer` objects, ensuring consistency. When constructing a `pyhmmer.easel.TextSequence`, pass the name as `bytes` and the residues as `str`, then call `digitize(alphabet)` before searching.","message":"The `pyhmmer` library operates with specific `Alphabet` types (amino, DNA, RNA). Mismatching the alphabet between an HMM and the sequences being searched leads to a runtime error (typically `pyhmmer.errors.AlphabetMismatch`). Additionally, sequence names must be provided as `bytes`, not strings, while the residues of a `TextSequence` are given as `str`.","severity":"gotcha","affected_versions":"All versions"}],"env_vars":null,"last_verified":"2026-04-10T00:00:00.000Z","next_check":"2026-07-09T00:00:00.000Z"}