cyvcf2: Fast VCF Parsing with Cython + HTSlib

0.32.1 · active · verified Thu Apr 16

cyvcf2 is a Python library that wraps HTSlib with Cython for fast parsing, querying, and limited modification of VCF (Variant Call Format) and BCF files. It exposes a Python-friendly interface for iterating over variants, reading variant attributes, and manipulating INFO and FORMAT fields. Because parsing is delegated to HTSlib, it is suited to processing large genomic datasets. [1, 3, 4, 8]

Common errors

Warnings

Install

Imports

Quickstart

This quickstart opens a VCF file, iterates over its variants, and reads common attributes: chromosome, position, reference and alternate alleles, INFO fields, and genotype information. It first writes a small dummy VCF so the example can run immediately.

import os
from cyvcf2 import VCF

# Create a dummy VCF file for demonstration if it doesn't exist
vcf_path = 'example.vcf'
if not os.path.exists(vcf_path):
    with open(vcf_path, 'w') as f:
        f.write('##fileformat=VCFv4.2\n')
        # Contigs are declared with a ##contig header line (##CHROM is not a valid VCF header key)
        f.write('##contig=<ID=1,length=10000>\n')
        f.write('##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">\n')
        f.write('##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">\n')
        f.write('#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSAMPLE1\tSAMPLE2\n')
        f.write('1\t100\trs1\tA\tT\t50\tPASS\tDP=100\tGT\t0/1\t1/1\n')
        f.write('1\t200\trs2\tC\tG,T\t90\tPASS\tDP=150\tGT\t0/0\t0/1\n')

try:
    vcf = VCF(vcf_path)
    for variant in vcf:
        print(f"CHROM: {variant.CHROM}, POS: {variant.POS}, REF: {variant.REF}, ALT: {variant.ALT}")
        print(f"  ID: {variant.ID}, QUAL: {variant.QUAL}, FILTER: {variant.FILTER}")
        print(f"  INFO DP: {variant.INFO.get('DP')}")
        # gt_types: 0=HOM_REF, 1=HET, 2=UNKNOWN, 3=HOM_ALT
        print(f"  Genotypes (types): {variant.gt_types}")
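        # Depth arrays need a per-sample depth FORMAT field such as AD;
        # the dummy VCF declares only GT, so no real depths are reported here
        print(f"  Reference depths: {variant.gt_ref_depths}")
        print(f"  Alternate depths: {variant.gt_alt_depths}")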
    vcf.close()
except Exception as e:
    print(f"Error processing VCF: {e}")
    print(f"Please ensure '{vcf_path}' is a valid VCF file (and indexed if doing region queries).")
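
The numeric codes printed for `variant.gt_types` are easier to read once mapped to names. The sketch below assumes the default encoding listed in the quickstart comment (0=HOM_REF, 1=HET, 2=UNKNOWN, 3=HOM_ALT). `GT_LABELS` and `label_gt_types` are hypothetical helper names, not part of cyvcf2, and the mapping would need adjusting if the file were opened with a non-default genotype encoding (for example, cyvcf2's `gts012` option).

```python
from typing import Iterable, List

# Assumed default genotype-type codes, as listed in the quickstart comment.
GT_LABELS = {0: "HOM_REF", 1: "HET", 2: "UNKNOWN", 3: "HOM_ALT"}


def label_gt_types(gt_types: Iterable[int]) -> List[str]:
    """Map numeric genotype-type codes to readable labels (hypothetical helper)."""
    labels = []
    for code in gt_types:
        code = int(code)  # also accepts NumPy integer scalars
        labels.append(GT_LABELS.get(code, f"UNEXPECTED({code})"))
    return labels


# Codes expected for the first dummy record (SAMPLE1 is 0/1, SAMPLE2 is 1/1).
print(label_gt_types([1, 3]))  # prints ['HET', 'HOM_ALT']
```

Inside the quickstart loop, this could be called as `label_gt_types(variant.gt_types)`.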
