Hail
0.2.138 verified Sat May 09 auth: no python
Hail is an open-source, general-purpose, Python-based data analysis tool with additional data types and methods for working with genomic data. Current version 0.2.138, with monthly releases. Requires Python >=3.10.
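Before installing, it can help to confirm that the active interpreter meets the stated Python >=3.10 requirement and to see whether `hail` is already importable in that environment. The following is a minimal sketch using only the standard library; the helper names `python_ok` and `hail_installed` are illustrative and not part of Hail.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 10)  # minimum version stated in the description above


def python_ok(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the given interpreter version meets the minimum."""
    return tuple(version_info[:2]) >= minimum


def hail_installed():
    """Return True if the 'hail' package can be found in this environment."""
    return importlib.util.find_spec("hail") is not None


# Report which interpreter is running and whether it is ready for Hail.
print(f"interpreter: {sys.executable}")
print(f"python >= 3.10: {python_ok()}")
print(f"hail importable: {hail_installed()}")
```

Running this with the same interpreter that will run the analysis shows whether `pip install hail` targeted the right environment.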
pip install hail
Common errors
error ModuleNotFoundError: No module named 'hail'
cause Hail is not installed, or it is installed in a different Python environment from the one running the code.
fix Run `pip install hail` in the correct Python environment (Python >=3.10).
error ValueError: Hail is not initialized. Call hl.init() before using Hail methods.
cause Forgot to call `hl.init()` after import.
fix Add `hl.init()` after `import hail as hl`.
error java.lang.OutOfMemoryError: Java heap space
cause Default Spark memory settings are insufficient for large datasets.
fix Configure memory via `hl.init(spark_conf={'spark.executor.memory': '16g', 'spark.driver.memory': '16g'})`.
Warnings
breaking Hail removed support for Python 3.9 and below. Python >=3.10 is required as of version 0.2.130.
fix Upgrade to Python 3.10 or later.
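gotcha `hl.import_vcf` is the VCF import function and is not deprecated. `hl.import_vcf_bgen` and `hail.VariantDataset.from_vcf` are not part of the Hail API; BGEN files are imported with `hl.import_bgen`, and the VariantDataset (VDS) functionality lives under `hl.vds`.
fix Keep using `hl.import_vcf` for VCF input, and use `hl.import_bgen` for BGEN input.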
gotcha Hail uses lazy evaluation: transformations on a MatrixTable only build up a computation, and nothing runs until the result is written to disk, displayed, or persisted with `.persist()`.
fix Use `mt = mt.persist()` or `mt.write('output.mt')` to trigger execution.
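The lazy-evaluation gotcha above can be sketched as a small helper that forces a pipeline to run. This is a sketch, not a Hail facility: `materialize` is a hypothetical name, and it relies only on `MatrixTable.persist`, `MatrixTable.write`, and `hl.read_matrix_table`. The path argument is whatever output location suits the project.

```python
def materialize(mt, path=None):
    """Force evaluation of a lazily built MatrixTable pipeline.

    Hypothetical helper, not part of Hail.
    With no path, the result is cached via persist().
    With a path, the result is written to disk and read back.
    """
    if path is None:
        # persist() computes the pipeline and returns the cached MatrixTable.
        return mt.persist()

    # write() executes the pipeline and stores the result at `path`.
    mt.write(path, overwrite=True)

    # Imported here so the helper can be defined without Hail installed.
    import hail as hl

    # Reading back means later steps start from the stored result.
    return hl.read_matrix_table(path)


# Usage (requires Hail and a working Java/Spark setup):
# import hail as hl
# hl.init()
# mt = hl.balding_nichols_model(n_populations=3, n_samples=100, n_variants=10)
# mt = materialize(mt)                # cache the computed result
# mt = materialize(mt, "output.mt")   # or write to disk and reload
```

Writing and reloading costs disk I/O but avoids recomputing the upstream steps each time the table is reused. Persisting is convenient for short interactive sessions.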
Imports
- hl
  wrong: from hail import *
  correct: import hail as hl
- init
  wrong: hail.init()
  correct: hl.init()
Quickstart
import hail as hl
hl.init()
mt = hl.balding_nichols_model(n_populations=3, n_samples=100, n_variants=10)
mt.show()
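For datasets larger than the simulated one in the Quickstart, the memory settings listed under the Java heap space error can be folded into initialization. The sketch below assumes the same `spark_conf` keys shown there; the helper names `spark_memory_conf` and `init_with_memory` are illustrative and not part of Hail, and the `16g` defaults are simply the values from the fix text, not a recommendation.

```python
def spark_memory_conf(executor="16g", driver="16g"):
    """Build the spark_conf dict passed to hl.init()."""
    return {
        "spark.executor.memory": executor,
        "spark.driver.memory": driver,
    }


def init_with_memory(executor="16g", driver="16g"):
    """Initialize Hail with explicit Spark memory settings.

    Hypothetical helper, not part of Hail.
    """
    # Imported here so the conf builder is usable without Hail installed.
    import hail as hl

    hl.init(spark_conf=spark_memory_conf(executor, driver))
    return hl


# Usage (requires Hail and a working Java/Spark setup):
# hl = init_with_memory(executor="8g", driver="8g")
# mt = hl.balding_nichols_model(n_populations=3, n_samples=100, n_variants=10)
# mt.show()
```

Memory settings need to be supplied at initialization, so this call replaces the bare `hl.init()` rather than following it.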