pybedtools
pybedtools wraps and extends BEDTools, exposing genomic interval operations (also known as 'genome algebra') and feature-level manipulation from within Python. The current version is 0.12.0; releases generally follow the release cadence of BEDTools.
Common errors
-
ImportError: cannot import name scripts
cause This typically occurs when using an Anaconda environment where another package or a user-created script is named 'scripts', shadowing the `pybedtools.scripts` module.
fix Change your import statement to `import pybedtools.scripts` instead of `from pybedtools import scripts`. Alternatively, ensure no other module or script on your Python path is named 'scripts.py'.
-
IOError: [Errno 24] Too many open files
cause Too many `BedTool` objects were created, exhausting the operating system's file handle limit. This often happens in loops.
fix Refactor your code to minimize the number of `BedTool` objects that exist at the same time. Chain methods, or use `BedTool.filter()` or `BedTool.each()`, to process data without creating excessive temporary files. Use `pybedtools.cleanup()` if necessary to force deletion of temporary files.
-
pybedtools.helpers.MalformedBedLineError: Malformed BED line:
cause The input BED file contains lines that do not conform to the BED format specification (e.g., the start coordinate is greater than the end, the number of fields is wrong, or fields are not tab-delimited).
fix Inspect the problematic lines in your input file. Ensure `start <= end` and that all fields are tab-delimited. Use `BedTool.remove_invalid()` to attempt to clean the file, or correct the lines manually.
-
command not found: bedtools
cause The underlying BEDTools executable is not found in the system's PATH. pybedtools acts as a wrapper and requires BEDTools to be installed separately.
fix Install BEDTools on your system (e.g., `conda install -c bioconda bedtools`). Verify that the `bedtools` command is accessible from your terminal by running `bedtools --version`.
-
g++: error: unrecognized command line option '-std=c++11'
cause This (or a similar compilation error) can occur during `pip install pybedtools` if a suitable C/C++ compiler is missing or outdated, particularly when building the Cython components.
fix Ensure you have a C/C++ compiler installed (e.g., `build-essential` on Linux, Xcode on macOS). Alternatively, install via Conda (`conda install -c bioconda pybedtools`), which often handles compiler dependencies more robustly than `pip`.
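For the `MalformedBedLineError` case above, it can help to locate the offending lines before handing the file to pybedtools. The following is a minimal sketch in plain Python, not pybedtools' own validation: `find_malformed_bed_lines` is a hypothetical helper, and it checks only the conditions named above (tab-delimited fields, at least three of them, integer coordinates, `start <= end`).

```python
def find_malformed_bed_lines(lines):
    """Return a list of (line_number, reason) for lines that look malformed.

    Hypothetical helper for illustration; it only covers the basic checks
    described in the text, not the full BED specification.
    """
    problems = []
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        # Skip blank lines, comments, and track/browser header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        if len(fields) < 3:
            problems.append((number, "fewer than 3 tab-delimited fields"))
            continue
        try:
            start, end = int(fields[1]), int(fields[2])
        except ValueError:
            problems.append((number, "start or end is not an integer"))
            continue
        if start > end:
            problems.append((number, "start is greater than end"))
    return problems


sample = [
    "chr1\t10\t20\tok\n",
    "chr1\t50\t40\tbad_order\n",
    "chr1 60 70 spaces\n",
]
report = find_malformed_bed_lines(sample)
for number, reason in report:
    print(f"line {number}: {reason}")
```

To check a real file, pass an open file handle instead of the sample list, since iterating a text file yields its lines.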
Warnings
- breaking Python 3.8 support was removed in pybedtools v0.11.0. Python 3.6 and 3.7 support was dropped in v0.9.1.
- gotcha Repeatedly creating `BedTool` objects, especially within loops, can exhaust the operating system's file handle limit and raise a 'Too many open files' error.
- gotcha pybedtools adheres to 0-based (BED) and 1-based (GFF) coordinate systems in the raw string output, but internally converts all `Interval` object start/stop attributes to 0-based for consistency.
- deprecated The `samtools` dependency was removed and replaced by `pysam` for BAM file handling.
- gotcha pybedtools relies on the underlying BEDTools executables. If BEDTools issues a warning (e.g., about malformed lines), pybedtools might raise an error and fail to create a `BedTool` object, even if BEDTools itself would produce output.
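The coordinate gotcha above comes down to a single subtraction. The sketch below is plain Python rather than pybedtools code, and both function names are hypothetical. It assumes the usual conventions: GFF ranges are 1-based and inclusive, BED ranges are 0-based and half-open.

```python
def gff_to_zero_based(start, end):
    """Convert a 1-based inclusive GFF range to a 0-based half-open (start, stop)."""
    if start < 1 or end < start:
        raise ValueError(f"invalid GFF range: {start}-{end}")
    # Only the start shifts; the inclusive end equals the exclusive stop.
    return start - 1, end


def zero_based_to_gff(start, stop):
    """Convert a 0-based half-open range back to 1-based inclusive GFF coordinates."""
    return start + 1, stop


gff_start, gff_end = 11, 20  # as written in a GFF file
start, stop = gff_to_zero_based(gff_start, gff_end)
length = stop - start  # in 0-based half-open form, length is simply stop - start
print(start, stop, length)
```

Going by the note above, an `Interval` read from a GFF line with columns 11 and 20 would be expected to report the converted values for its start/stop attributes, while its raw string output keeps the GFF numbers.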
Install
- pip install pybedtools
- conda install -c bioconda pybedtools
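Because pybedtools calls the separately installed BEDTools executable, it can be worth confirming after installation that Python can see the binary. This is a minimal sketch using only the standard library; the helper names are hypothetical, and it assumes the executable is called `bedtools`.

```python
import shutil
import subprocess


def find_bedtools(executable="bedtools"):
    """Return the resolved path of the executable, or None if it is not on PATH."""
    return shutil.which(executable)


def bedtools_version(path):
    """Run `<path> --version` and return whatever it prints, stripped."""
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=False
    )
    return (result.stdout or result.stderr).strip()


path = find_bedtools()
if path is None:
    print("bedtools not found on PATH; try: conda install -c bioconda bedtools")
else:
    print(f"bedtools found at {path}: {bedtools_version(path)}")
```

If the binary lives outside PATH, pybedtools provides `pybedtools.helpers.set_bedtools_path()` for pointing it at the containing directory; check the documentation of your installed version for the exact usage.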
Imports
- BedTool
from pybedtools import BedTool
- scripts
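from pybedtools import scripts  # can raise ImportError if another module named 'scripts' shadows it
import pybedtools.scripts  # form recommended under Common errors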
Quickstart
import pybedtools
import os
# Create dummy files for demonstration
# In a real scenario, these would be your actual genomic files
with open('snps.bed', 'w') as f:
    f.write('chr1\t10\t20\tSNP1\n')
    f.write('chr1\t30\t40\tSNP2\n')
with open('exons.bed', 'w') as f:
    f.write('chr1\t15\t25\tEXON1\n')
    f.write('chr1\t35\t45\tEXON2\n')
# Create BedTool objects from files
snps = pybedtools.BedTool('snps.bed')
exons = pybedtools.BedTool('exons.bed')
# Perform an intersection and save the results
# This example saves a new BED file of intersections
# between snps.bed and exons.bed
intersected_bed = snps.intersect(exons)
output_filename = 'snps_in_exons.bed'
intersected_bed.saveas(output_filename, trackline="track name='SNPs in exons' color=128,0,0")
print(f"Intersection results saved to {output_filename}:")
with open(output_filename, 'r') as f:
    print(f.read())
# Clean up dummy files
os.remove('snps.bed')
os.remove('exons.bed')
os.remove(output_filename)
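To make the quickstart result easier to check by eye, the overlap arithmetic can be reproduced in plain Python. This sketch does not call pybedtools or BEDTools; `overlap` is a hypothetical helper, and it assumes the default `intersect` behavior of reporting the overlapping portion of each feature in the first file.

```python
def overlap(a, b):
    """Return the part of feature `a` that overlaps `b`, or None.

    Features are (chrom, start, end, name) tuples in 0-based half-open
    BED coordinates; the result keeps the name of `a`.
    """
    if a[0] != b[0]:
        return None
    start, end = max(a[1], b[1]), min(a[2], b[2])
    if start >= end:  # disjoint or merely adjacent
        return None
    return (a[0], start, end, a[3])


# Same features as the dummy files in the quickstart.
snps = [("chr1", 10, 20, "SNP1"), ("chr1", 30, 40, "SNP2")]
exons = [("chr1", 15, 25, "EXON1"), ("chr1", 35, 45, "EXON2")]

hits = []
for snp in snps:
    for exon in exons:
        hit = overlap(snp, exon)
        if hit is not None:
            hits.append(hit)

for chrom, start, end, name in hits:
    print(f"{chrom}\t{start}\t{end}\t{name}")
```

Under that assumption, the saved file should contain the track line followed by two records, chr1:15-20 (SNP1) and chr1:35-40 (SNP2).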