{"id":3755,"library":"pyfaidx","title":"pyfaidx: Efficient FASTA Access","description":"pyfaidx is a Python library that provides efficient, pythonic random access to subsequences within FASTA files, compatible with samtools index format (.fai). It allows for fast retrieval and in-place modification without loading the entire file into memory. The current version is 0.9.0.4, with frequent minor updates and bug fixes.","status":"active","version":"0.9.0.4","language":"en","source_language":"en","source_url":"https://github.com/mdshw5/pyfaidx/","tags":["bioinformatics","fasta","genomics","sequence","data-access"],"install":[{"cmd":"pip install pyfaidx","lang":"bash","label":"Install latest version"}],"dependencies":[{"reason":"Used for version parsing and compatibility checks.","package":"packaging","optional":false}],"imports":[{"symbol":"Fasta","correct":"from pyfaidx import Fasta"}],"quickstart":{"code":"import os\nfrom pyfaidx import Fasta\n\n# Create a dummy FASTA file for demonstration\nfasta_content = (\n    \">chr1 description of chromosome 1\\n\"\n    \"ATGCGTACGTACGTACGTAGCTAGCTAGCTACGTAGCTACGTAGCTAGCTACGTACGT\\n\"\n    \"CGTAGCTACGTAGCTACGTAGCTACGTAGCTAGCTACGTAGCTACGTACGTAGCTACG\\n\"\n    \">chr2 description of chromosome 2\\n\"\n    \"GATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGA\\n\"\n    \"TTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATTACAGATT\\n\"\n)\n\nfile_path = \"example.fasta\"\nwith open(file_path, \"w\") as f:\n    f.write(fasta_content)\n\n# Open the FASTA file, an index (.fai) will be created if it doesn't exist\ngenes = Fasta(file_path)\n\n# Access a sequence by its header name (case-sensitive)\nchr1_sequence = genes['chr1']\nprint(f\"Chromosome 1 full length: {len(chr1_sequence)}\")\nprint(f\"First 10 bases of chr1: {chr1_sequence[:10]}\") # Pythonic 0-based slicing\n\n# Retrieve a subsequence using 1-based coordinates (like samtools faidx)\n# pyfaidx object slicing is 0-based, but Sequence object attributes are 1-based.\n# 
To fetch the 1-based, inclusive range 21-30, slice the FastaRecord with the equivalent 0-based, half-open slice.\n# In general, the 1-based range `start-end` maps to `genes['chr1'][start - 1:end]`,\n# so 21-30 becomes the Python slice `[20:30]`.\nsub_sequence_1_based = genes['chr1'][20:30]  # bases 21-30 (1-based, inclusive)\nprint(f\"chr1 1-based coord 21-30: {sub_sequence_1_based.seq}\")\nprint(f\"  .start (1-based): {sub_sequence_1_based.start}\")\nprint(f\"  .end (1-based): {sub_sequence_1_based.end}\")\n\n# Get the reverse complement of a sequence\nrc_sequence = genes['chr2'][::-1].complement\nprint(f\"First 20 bases of the chr2 reverse complement: {rc_sequence.seq[:20]}\")\n\n# Close the file handle, then remove the dummy file and its index\ngenes.close()\nos.remove(file_path)\nos.remove(file_path + \".fai\")\n","lang":"python","description":"This quickstart demonstrates how to initialize a `Fasta` object, access sequences by their header names, retrieve subsequences by slicing, and compute a reverse complement. It also shows that the `.start` and `.end` attributes are 1-based by default, while slicing stays 0-based and half-open, as in native Python."},"warnings":[{"fix":"Upgrade to pyfaidx version 0.9.0.3 or higher (`pip install --upgrade pyfaidx`).","message":"The BGZF indexing strategy introduced in v0.9.0 contained a bug that persisted through v0.9.0.2. If you use pyfaidx v0.9.0, v0.9.0.1, or v0.9.0.2, upgrade to v0.9.0.3 or later to avoid indexing problems with BGZF-compressed FASTA files.","severity":"breaking","affected_versions":"0.9.0 - 0.9.0.2"},{"fix":"Be mindful of the coordinate system. For Pythonic 0-based slicing, use `fasta_obj['name'][start_0_based:end_0_based]`. 
If you need to work with 1-based, inclusive coordinates for display or for interoperability with tools like samtools, convert them before slicing: the 1-based range `start-end` corresponds to the Python slice `[start - 1:end]`.","message":"pyfaidx reports 1-based, inclusive (closed) coordinates in the `.start` and `.end` attributes of `Sequence` objects, mirroring samtools faidx. Slicing (`sequence[start:end]`), however, stays 0-based and half-open, as in native Python. Mixing the two conventions is a common source of off-by-one errors. Initializing `Fasta(..., one_based_attributes=False)` changes how the `.start`/`.end` attributes are reported, but it does not affect slicing behavior.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Ensure your FASTA files are properly formatted with consistent line lengths. Tools like `seqtk` or `bbtools reformat` can help standardize FASTA formats if you encounter this issue.","message":"pyfaidx requires every line of a sequence, except the last, to have the same length in order to build an index and retrieve subsequences correctly. Inconsistent line lengths can cause errors during indexing or incorrect sequence retrieval.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Initialize `Fasta('your.fasta', read_long_names=True)` if full FASTA headers are required. For compressed files, consider decompressing or processing headers separately if full names are critical.","message":"By default, pyfaidx truncates sequence descriptions when indexing to keep names concise. To access the full FASTA header (including the description) for each sequence, initialize the `Fasta` object with `read_long_names=True`. This option only works with uncompressed FASTA files.","severity":"gotcha","affected_versions":"All versions"}],"env_vars":null,"last_verified":"2026-04-11T00:00:00.000Z","next_check":"2026-07-10T00:00:00.000Z"}