Biopython
Biopython is a comprehensive collection of freely available Python tools for computational molecular biology and bioinformatics. It covers common tasks such as parsing bioinformatics file formats (e.g., FASTA, GenBank, BLAST output), querying online biological databases (e.g., NCBI Entrez), manipulating sequences and alignments, and analyzing macromolecular structures. Currently at version 1.87, it is actively maintained with several releases per year.
Warnings
- breaking The `Bio.Alphabet` module was removed in Biopython 1.78 (September 2020). `Seq` objects no longer accept an `alphabet` argument; where the molecule type matters (for example, when writing GenBank files), it is stored as a string in `SeqRecord.annotations['molecule_type']`.
- breaking The `Bio.Fasta` module was deprecated in Biopython 1.51 (August 2009) and removed in Biopython 1.55 (August 2010).
- breaking Support for Python 2.7 was dropped in Biopython 1.77 (2020), in line with Python 2.7's end-of-life.
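- deprecated The command-line tool wrappers built on `Bio.Application` (such as those in `Bio.Align.Applications`) were deprecated in Biopython 1.78. They are no longer recommended due to potential security and compatibility issues; build the command line yourself and run it with the standard library `subprocess` module instead.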
- gotcha FASTA file parsing became stricter in Biopython 1.85. As of 1.87, non-empty lines before the first `>` are no longer treated as comments and raise an error instead. This can break parsing of some non-standard FASTA files.
- deprecated The `.strand`, `.ref`, and `.ref_db` attributes of `SeqFeature` objects were temporarily removed in Biopython 1.82 without deprecation, then restored with deprecation warnings in 1.83. They are aliases for `.location.strand`, `.location.ref`, and `.location.ref_db` respectively.
- deprecated The `setup.py` script for project metadata and build configuration was deprecated in Biopython 1.87 in favor of a `pyproject.toml`-based setup.
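Two of the migrations above can be sketched in a few lines: storing the molecule type as an annotation instead of passing an alphabet, and reading the strand from a feature's location rather than from the feature itself. This is a minimal illustration, not the only way to migrate; the record ID, description, sequence, and feature coordinates are made up for the example.

```python
from Bio.Seq import Seq
from Bio.SeqFeature import FeatureLocation, SeqFeature
from Bio.SeqRecord import SeqRecord

# Seq takes only the sequence data; no alphabet argument is passed.
record = SeqRecord(
    Seq("ATGACGTACGTA"),
    id="example1",  # hypothetical identifier
    description="illustrative record",
    # The molecule type lives in the annotations as a plain string.
    annotations={"molecule_type": "DNA"},
)

# A feature on the reverse strand covering the first six bases.
feature = SeqFeature(FeatureLocation(0, 6, strand=-1), type="CDS")
record.features.append(feature)

molecule_type = record.annotations["molecule_type"]
# Read the strand from the location, not from the deprecated feature.strand alias.
strand = feature.location.strand
# extract() reverse-complements the slice because the strand is -1.
extracted = feature.extract(record.seq)

print(molecule_type)
print(strand)
print(extracted)
```

Formats that need to know the molecule type, such as GenBank output, read it from the annotation set here.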
Install
-
pip install biopython
Imports
- Seq
from Bio.Seq import Seq
- SeqIO
from Bio import SeqIO
- Align
from Bio import Align
- PDB
from Bio import PDB
- Entrez
from Bio import Entrez
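- Alphabet (removed in 1.78)
No direct equivalent; store the molecule type in `SeqRecord.annotations['molecule_type']`
- Fasta (removed in 1.55)
from Bio import SeqIO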
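As a brief illustration of one of the imports listed above, the following sketch scores a global pairwise alignment with `Bio.Align.PairwiseAligner`. The scoring values and the two sequences are assumptions chosen for the example, not recommended defaults.

```python
from Bio import Align

aligner = Align.PairwiseAligner()
aligner.mode = "global"
# Illustrative scoring values, set explicitly so the result does not
# depend on the library's default parameters.
aligner.match_score = 2.0
aligner.mismatch_score = -1.0
aligner.open_gap_score = -2.0
aligner.extend_gap_score = -0.5

# Two made-up sequences that differ at a single position.
target = "ACGTACGT"
query = "ACGTTCGT"

# score() returns the best alignment score without building the alignments.
score = aligner.score(target, query)
print(score)
```

Calling `aligner.align(target, query)` instead returns the alignments themselves, which can be iterated over or printed.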
Quickstart
import os
from Bio.Seq import Seq
from Bio import SeqIO
# 1. Working with a basic sequence
# The length is a multiple of three, so translate() sees only complete codons
my_dna = Seq("ATGACGTACGTA")
print(f"Original DNA: {my_dna}")
print(f"Complement: {my_dna.complement()}")
print(f"Reverse Complement: {my_dna.reverse_complement()}")
print(f"Translated protein: {my_dna.translate()}")
# 2. Parsing a FASTA file
# Create a dummy FASTA file for demonstration
fasta_content = (
    ">seq1 description for sequence 1\n"
    "ATGCGTACGTAGCTAGCTAGCATGCAGCTAGCATGCGATGC\n"
    ">seq2 description for sequence 2\n"
    "GATCGATCGATCGATCGATCGATCGATCGATCGATCGA"
)
with open("example.fasta", "w") as f:
    f.write(fasta_content)
print("\n--- Parsing example.fasta ---")
for seq_record in SeqIO.parse("example.fasta", "fasta"):
    print(f"ID: {seq_record.id}")
    print(f"Description: {seq_record.description}")
    print(f"Sequence: {seq_record.seq}")
    print(f"Length: {len(seq_record.seq)}")
# Clean up the dummy file
os.remove("example.fasta")
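When the FASTA data is already in memory, the temporary file can be skipped. The sketch below parses from a `StringIO` handle and indexes the records by ID with `SeqIO.to_dict`; the record names and sequences are invented for the example.

```python
from io import StringIO

from Bio import SeqIO

# Made-up FASTA text; every record starts with a '>' header line.
fasta_text = (
    ">seq1 first illustrative record\n"
    "ATGCGTACGTAG\n"
    ">seq2 second illustrative record\n"
    "GATCGATCGA\n"
)

# SeqIO.parse accepts any text handle, not only a file name.
records = SeqIO.to_dict(SeqIO.parse(StringIO(fasta_text), "fasta"))

# Map each record ID to its sequence length.
lengths = {record_id: len(record.seq) for record_id, record in records.items()}
print(lengths)
```

`SeqIO.to_dict` loads every record into memory and raises an error on duplicate IDs, so it suits small inputs like this one.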