{"id":8056,"library":"cyvcf2","title":"cyvcf2: Fast VCF Parsing with Cython + HTSlib","description":"cyvcf2 is a Python library providing a fast Cython wrapper for HTSlib, specifically designed for efficient parsing, querying, and limited modification of VCF (Variant Call Format) and BCF files. It offers a Python-friendly interface to access genetic variation data, supporting quick iteration through variants, extraction of diverse variant attributes, and manipulation of INFO and FORMAT fields. The library is highly optimized for performance, making it suitable for processing large genomic datasets. [1, 3, 4, 8]","status":"active","version":"0.32.1","language":"en","source_language":"en","source_url":"https://github.com/brentp/cyvcf2/","tags":["genomics","bioinformatics","VCF","BCF","variant calling","hts","cython"],"install":[{"cmd":"pip install cyvcf2","lang":"bash","label":"PyPI"},{"cmd":"conda install -c bioconda cyvcf2","lang":"bash","label":"Bioconda"}],"dependencies":[{"reason":"Required for efficient handling of genotype and depth arrays returned by variant objects.","package":"numpy","optional":false}],"imports":[{"note":"The primary class for reading VCF/BCF files.","symbol":"VCF","correct":"from cyvcf2 import VCF"},{"note":"Needed for writing or modifying VCF/BCF files.","symbol":"Writer","correct":"from cyvcf2 import VCF, Writer"}],"quickstart":{"code":"import os\nfrom cyvcf2 import VCF\n\n# Create a dummy VCF file for demonstration if it doesn't exist\nvcf_path = 'example.vcf'\nif not os.path.exists(vcf_path):\n    with open(vcf_path, 'w') as f:\n        f.write('##fileformat=VCFv4.2\\n')\n        # Contigs are declared with a ##contig header line in VCF\n        f.write('##contig=<ID=1,length=10000>\\n')\n        f.write('##INFO=<ID=DP,Number=1,Type=Integer,Description=\"Total Depth\">\\n')\n        f.write('##FORMAT=<ID=GT,Number=1,Type=String,Description=\"Genotype\">\\n')\n        f.write('#CHROM\\tPOS\\tID\\tREF\\tALT\\tQUAL\\tFILTER\\tINFO\\tFORMAT\\tSAMPLE1\\tSAMPLE2\\n')\n        
f.write('1\\t100\\trs1\\tA\\tT\\t50\\tPASS\\tDP=100\\tGT\\t0/1\\t1/1\\n')\n        f.write('1\\t200\\trs2\\tC\\tG,T\\t90\\tPASS\\tDP=150\\tGT\\t0/0\\t0/1\\n')\n\ntry:\n    vcf = VCF(vcf_path)\n    for variant in vcf:\n        print(f\"CHROM: {variant.CHROM}, POS: {variant.POS}, REF: {variant.REF}, ALT: {variant.ALT}\")\n        print(f\"  ID: {variant.ID}, QUAL: {variant.QUAL}, FILTER: {variant.FILTER}\")\n        print(f\"  INFO DP: {variant.INFO.get('DP')}\")\n        # gt_types: 0=HOM_REF, 1=HET, 2=UNKNOWN, 3=HOM_ALT\n        print(f\"  Genotypes (types): {variant.gt_types}\")\n        # The depth arrays need per-sample depth data (e.g. FORMAT/AD); the dummy file\n        # created above has none, so expect placeholder values rather than real depths\n        print(f\"  Reference depths: {variant.gt_ref_depths}\")\n        print(f\"  Alternate depths: {variant.gt_alt_depths}\")\n    vcf.close()\nexcept Exception as e:\n    print(f\"Error processing VCF: {e}\")\n    print(\"Please ensure 'example.vcf' is a valid VCF file and indexed if doing region queries.\")","lang":"python","description":"This quickstart demonstrates how to open a VCF file, iterate through variants, and access common variant attributes like chromosome, position, reference, alternate alleles, INFO fields, and genotype information. It also includes creating a simple dummy VCF for immediate execution."},"warnings":[{"fix":"Ensure your locally installed htslib (if not using pre-built wheels) matches the requirement for your cyvcf2 version. For pip installations, pre-built wheels usually handle this, but source builds or specific environments might need manual intervention.","message":"HTSlib version compatibility changed significantly. cyvcf2 versions < 0.20.0 require htslib < 1.10, while cyvcf2 versions >= 0.20.0 require htslib >= 1.10. Installing with an incompatible htslib version will lead to build or runtime errors. 
[3]","severity":"breaking","affected_versions":"< 0.20.0, >= 0.20.0"},{"fix":"To persist the data, create a copy of the array using `numpy.array()`: `my_copy = numpy.array(variant.gt_ref_depths)`.","message":"NumPy arrays returned by `variant.gt_types`, `variant.gt_ref_depths`, etc., are views into the underlying C data structure. These arrays become invalid (containing 'nonsense' data) once the `variant` object goes out of scope. [3]","severity":"gotcha","affected_versions":"All versions"},{"fix":"When writing VCFs with cyvcf2, avoid non-ASCII characters in string-typed FORMAT fields and avoid string-typed FORMAT fields with `Number` greater than 1. If either is a strict requirement, post-process the output with another tool.","message":"cyvcf2 does not support writing VCFs with UTF-8 encoded, non-ASCII characters in string-typed FORMAT fields, nor does it support writing string-typed FORMAT fields with `Number` greater than 1. [1, 15]","severity":"gotcha","affected_versions":"All versions"},{"fix":"To treat partially missing genotypes as UNKNOWN, pass `strict_gt=True` when constructing the reader (e.g., `VCF(path, strict_gt=True)`); with this flag, any `.` allele in a genotype is reported as UNKNOWN in `gt_types`.","message":"By default, cyvcf2 classifies partially missing genotypes (e.g., `0/.`, `./1`) as heterozygous (HET). Some other tools interpret these genotypes as UNKNOWN instead, so results can differ between tools. [1]","severity":"gotcha","affected_versions":"All versions"}],"env_vars":null,"last_verified":"2026-04-16T00:00:00.000Z","next_check":"2026-07-15T00:00:00.000Z","problems":[{"fix":"Ensure your VCF files are properly encoded. If working with older Python versions, ensure locale settings are correct. Check for non-standard characters in your VCF. 
For some older versions, the `v.INFO` keys might be bytes in Python 3, requiring explicit decoding (e.g., `key.decode('utf-8')`).","cause":"This often occurs when VCF files contain non-ASCII characters in fields that cyvcf2 tries to interpret as ASCII, especially with older Python 3 environments or specific system locales. It can also happen with corrupted or malformed VCF entries. [12]","error":"UnicodeDecodeError: 'ascii' codec can't decode byte 0x81 in position 1: ordinal not in range(128)"},{"fix":"Verify installation with `pip list | grep cyvcf2`. Check for any local files or directories named `cyvcf2.py` or `cyvcf2` that might shadow the installed package. If compiling from source, ensure HTSlib and its development headers are available on your system.","cause":"This typically means cyvcf2 was not installed correctly, or there's a naming conflict with another `cyvcf2.py` file or directory in your Python path, preventing the actual library from being loaded. It can also occur if the installation failed due to missing C dependencies (like htslib).","error":"ImportError: cannot import name 'VCF' from 'cyvcf2'"},{"fix":"Install the required Visual C++ Build Tools (e.g., 'Desktop development with C++' workload in Visual Studio Installer). Consider using `conda` for easier installation on Windows, as `bioconda` often provides pre-compiled binaries that bypass local compilation challenges. Alternatively, use a Linux environment or WSL.","cause":"Installation on Windows, especially for Python versions 3.7 and above, often requires specific Visual C++ Build Tools (MSVC v14.0 or newer) and can encounter compatibility issues with Cython-generated code due to changes in Python's internal APIs. [14]","error":"Can't install from source / compile errors (e.g., on Windows with Python 3.7+)"},{"fix":"Add the bioconda channel to your conda configuration: `conda config --add channels bioconda` and then `conda install cyvcf2`. 
Ensure `conda-forge` is also enabled: `conda config --add channels conda-forge`.","cause":"The default conda channels do not contain `cyvcf2`. It is primarily hosted on the `bioconda` channel. [16]","error":"PackagesNotFoundError: The following packages are not available from current channels: - cyvcf2 (when using `conda install`)"}]}