Biotite
Biotite is a comprehensive Python library (current version 1.6.0) for computational molecular biology, offering tools for sequence analysis, structural bioinformatics, and access to biological databases. It stores data in NumPy arrays for efficient, vectorized operations and provides interfaces to external bioinformatics software (in `biotite.application`, e.g. BLAST, MAFFT, and DSSP), so it serves equally well for quick scripts and for full software packages. The library is actively developed and released regularly.
Warnings
- breaking As of Biotite v1.6.0, the `biotraj` package is a mandatory dependency for the trajectory file interfaces in `biotite.structure.io`, and `mdtraj` is no longer required for this purpose. Projects that rely on `mdtraj` through Biotite's interfaces may need adjustment.
- gotcha Biotite internally stores most sequence and structure data as NumPy `ndarray` objects. This enables high performance and intuitive NumPy-like indexing, but users accustomed to other bioinformatics libraries (e.g., Biopython) may need to adapt to this NumPy-centric data model.
- gotcha Biotite is organized into several subpackages (e.g., `biotite.sequence`, `biotite.structure`, `biotite.database`). Specific functionalities reside within these submodules, requiring explicit imports from the relevant subpackage rather than a single top-level `import biotite`.
- deprecated In `biotite.sequence.graphics`, the default color scheme for visualizing sequence alignments changed from `rainbow` to `flower` in v1.6.0. The `flower` scheme is considered to represent amino acid similarity more effectively.
Install
- pip install biotite
- conda install -c conda-forge biotite
Imports
- ProteinSequence
from biotite.sequence import ProteinSequence
- entrez
from biotite.database import entrez
- FastaFile
from biotite.sequence.io.fasta import FastaFile
- align_optimal
from biotite.sequence.align import align_optimal
Quickstart
import biotite.sequence.align as align
import biotite.sequence.io.fasta as fasta
import biotite.database.entrez as entrez
import os
# Download a FASTA file containing the sequences of avidin and streptavidin.
# For simplicity, the file is written to the working directory and removed
# at the end; a path from Python's 'tempfile' module would avoid clutter.
file_name = "sequences.fasta"
uids = ["CAC34569", "ACL82594"]  # NCBI protein UIDs of avidin and streptavidin
entrez.fetch_single_file(
uids=uids,
file_name=file_name,
db_name="protein",
ret_type="fasta"
)
# Parse the downloaded FASTA file and create 'ProteinSequence' objects
fasta_file = fasta.FastaFile.read(file_name)
avidin_seq, streptavidin_seq = fasta.get_sequences(fasta_file).values()
# Align sequences using the BLOSUM62 matrix with affine gap penalty
matrix = align.SubstitutionMatrix.std_protein_matrix()
alignments = align.align_optimal(
avidin_seq, streptavidin_seq, matrix,
gap_penalty=(-10, -1),
terminal_penalty=False
)
print(f"Number of alignments: {len(alignments)}")
if alignments:
    print("First optimal alignment:")
    print(alignments[0])
# Clean up the downloaded file
os.remove(file_name)