Bgen

1.9.9 verified Fri May 01 auth: no python

Python package for loading and manipulating data from BGEN files. BGEN is a binary file format for storing genotype data. The current version is 1.9.9 and requires Python >=3.8. The package is actively maintained, with periodic releases.

pip install bgen
error ModuleNotFoundError: No module named 'bgen'
cause Package not installed or installed in a different environment.
fix
Run 'python -m pip install bgen' with the same interpreter that runs your script, so the package is installed into the environment you actually use.
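To see which interpreter is running and whether it can find the package, a minimal standard-library check might look like the sketch below. The helper name `describe_module` is illustrative and is not part of bgen.

```python
import importlib.util
import sys


def describe_module(name):
    """Return (interpreter path, module location or None) for a top-level module."""
    spec = importlib.util.find_spec(name)
    location = spec.origin if spec is not None else None
    return sys.executable, location


interpreter, location = describe_module("bgen")
print("Interpreter:", interpreter)
print("bgen found at:", location if location else "not found by this interpreter")
```

If the printed interpreter is not the one you installed into, run pip through that interpreter instead.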
error AttributeError: 'bgen.BGENFile' object has no attribute 'read'
cause Code written against an older API. BGENFile has no read method; variants are accessed by iterating over the file object or by indexing it.
fix
Iterate over the opened BGENFile object ('for variant in bgen_file: ...'), or index it ('bgen_file[0]' returns the first variant).
error KeyError: 'rsid'
cause Variant metadata was requested with dictionary-style keys (variant['rsid']), but a variant is an object with attributes, not a dict.
fix
Use dot notation: variant.rsid, variant.chromosome, etc.
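When downstream code really does need key-based access, one option is to copy the attributes into a plain dict. The sketch below uses a hypothetical stand-in class (`FakeVariant`) and a hypothetical helper (`variant_fields`); neither is part of bgen, and the attribute names are the ones listed in the fix above.

```python
from dataclasses import dataclass


@dataclass
class FakeVariant:
    """Hypothetical stand-in for a variant object; not the bgen class."""
    rsid: str
    chromosome: str
    position: int


def variant_fields(variant, names=("rsid", "chromosome", "position")):
    """Copy the named attributes of a variant-like object into a plain dict."""
    return {name: getattr(variant, name) for name in names}


variant = FakeVariant(rsid="rs123", chromosome="01", position=1001)
print(variant.rsid)                # attribute access works directly
record = variant_fields(variant)   # dict copy for code that expects keys
print(record["rsid"], record["position"])
```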
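gotcha BGENFile indexing is 0-based, so index 0 is the first variant. The index is the variant's order in the file and is unrelated to variant.position, which is a genomic coordinate.
fix Do not treat the index as a position. Identify variants by .rsid, or by .chromosome and .position.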
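The difference between a container index and a genomic position can be shown with plain Python data. The records below are hypothetical stand-ins for variants in file order, not output from bgen.

```python
# Hypothetical (rsid, position) records standing in for variants, in file order.
variants = [("rs1", 10583), ("rs2", 10611), ("rs3", 13302)]

first_index = 0                      # container index: counts records from 0
rsid, position = variants[first_index]
print(first_index, rsid, position)   # the index says nothing about the coordinate

# Look a variant up by identifier instead of assuming where it sits in the file.
index_by_rsid = {name: i for i, (name, _pos) in enumerate(variants)}
print(index_by_rsid["rs3"])
```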
deprecated Functions like 'bgen_open' are deprecated in favor of direct class instantiation.
fix Use 'from bgen import BGENFile; bgen = BGENFile(filename)' instead.
gotcha Memory usage can be high when calling .genotype() on a variant from a file with many samples, because it loads the data for every sample into memory.
fix Use the 'subsample' or 'probs' arguments to load only a subset, or iterate over the file one variant at a time instead of keeping every array.
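The one-variant-at-a-time pattern can be sketched without the library. Here `fake_variants` is a hypothetical generator standing in for iteration over a BGEN file, and `mean_per_variant` is an illustrative helper. The point is that only one small summary per variant is kept, never the full per-sample data for all variants.

```python
from itertools import islice


def fake_variants(n_variants, n_samples):
    """Hypothetical stand-in for file iteration: yields one variant's values at a time."""
    for v in range(n_variants):
        yield [((v + s) % 3) / 2 for s in range(n_samples)]


def mean_per_variant(variants, limit=None):
    """Consume variants lazily, keeping one float per variant rather than every array."""
    means = []
    for values in islice(variants, limit):
        means.append(sum(values) / len(values))
    return means


print(mean_per_variant(fake_variants(1000, 4), limit=3))  # [0.375, 0.5, 0.625]
```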

Open a BGEN file and read basic information and variant genotypes.

from bgen import BGENFile

# Replace with the path to your BGEN file
bgen_file = BGENFile('example.bgen')

# Get number of samples
print('Number of samples:', bgen_file.nsamples)

# Get number of variants
print('Number of variants:', bgen_file.nvariants)

# Iterate over the first 5 variants only
for i, variant in enumerate(bgen_file):
    if i >= 5:
        break
    print('Variant:', variant.rsid, variant.chromosome, variant.position)
    # Load genotype probabilities for every sample at this variant;
    # the printed shape shows how the returned array is laid out
    probs = variant.genotype()
    print('Genotype probs shape:', probs.shape)
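A common next step is to convert genotype probabilities into an expected allele dosage. The sketch below assumes a biallelic, diploid, unphased variant whose per-sample probabilities are ordered (P(AA), P(AB), P(BB)), as in the BGEN specification. Whether the array returned by the library has this layout is an assumption to check against the printed shape. The function name `alt_dosage` and the example values are illustrative.

```python
def alt_dosage(probs):
    """Expected count of the second allele per sample from (P(AA), P(AB), P(BB)) rows."""
    return [p_ab + 2.0 * p_bb for _p_aa, p_ab, p_bb in probs]


# Hypothetical probabilities for three samples at one variant.
example_probs = [
    (1.0, 0.0, 0.0),    # confidently homozygous for the first allele
    (0.0, 1.0, 0.0),    # confidently heterozygous
    (0.25, 0.5, 0.25),  # uncertain call
]
print(alt_dosage(example_probs))  # [0.0, 1.0, 1.0]
```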