yallHap: Modern Y-chromosome haplogroup inference with probabilistic scoring and ancient DNA support
Alaina Hardie
Publication Details
Abstract
The human Y chromosome enables detailed reconstruction of paternal lineages through haplogroup classification. Existing tools for this purpose typically rely on outdated phylogenies, lack ancient DNA handling, or provide limited confidence metrics. Here I present yallHap, a Y-chromosome haplogroup classifier that integrates the YFull phylogenetic tree (185,780 SNPs) with probabilistic scoring, built-in ancient DNA damage filtering, and parallel processing for population-scale studies. Validation on 1,231 high-coverage gnomAD samples achieved 99.9% accuracy (95% CI: 99.5–100%) on GRCh38, and 1,233 samples from 1000 Genomes Phase 3 achieved 99.8% accuracy (95% CI: 99.3–100%). For ancient DNA with moderate variant density (4–10%), Bayesian ancient mode achieves a +19.3 percentage-point (pp) improvement over heuristic mode (+12 to +24 pp at 1% increments; see Supplementary Table S3), reaching 60–86% accuracy. On full AADR ancient DNA validation (7,333 samples spanning ∼45,000 years), this translates to 90.7% overall accuracy (95% CI: 90.0–91.3%) versus 88.3% for heuristic transversions-only mode. At variant densities ≥10%, both modes reach 97–99% accuracy. yallHap supports multiple reference genomes (GRCh37, GRCh38, T2T-CHM13v2.0), provides detailed quality metrics including optional ISOGG nomenclature output, and offers multi-threaded batch processing for large-scale studies. The tool is designed for integration into modern bioinformatics pipelines, with example wrappers for nf-core/eager [16,17] and Snakemake [18] workflows. The software is open source, available at https://github.com/trianglegrrl/yallHap, and distributed via pip, Bioconda, and Docker.
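The accuracy figures above are reported with 95% confidence intervals. As a rough illustration of how such an interval can be derived from a count of correct calls, here is a minimal Python sketch using the Wilson score interval. The abstract does not say which interval method the author used, so the choice of Wilson is an assumption, and the function name `wilson_interval` and the count of 1,230 correct calls out of 1,231 are hypothetical values chosen only to be consistent with a rounded accuracy of 99.9%.

```python
import math


def wilson_interval(correct: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%).

    Illustrative helper only; not taken from yallHap.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= correct <= total:
        raise ValueError("correct must be between 0 and total")

    p = correct / total
    z2 = z * z
    denom = 1 + z2 / total
    center = (p + z2 / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z2 / (4 * total * total)) / denom
    # Clamp to [0, 1] to guard against floating-point overshoot.
    return max(0.0, center - half_width), min(1.0, center + half_width)


# Hypothetical count: 1,230 correct out of 1,231 samples (assumed, not from the paper).
low, high = wilson_interval(1230, 1231)
print(f"accuracy = {1230 / 1231:.1%}, 95% CI = {low:.1%} to {high:.1%}")
```

The Wilson interval is used here because it behaves sensibly when the observed proportion is close to 100%, where the simpler normal approximation can produce bounds above 1. An exact (Clopper-Pearson) interval would be an equally plausible choice and may give slightly different bounds.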
Analysis
Important Disclaimer: This review has been performed semi-automatically and is provided for informational purposes only. While we strive for accuracy, this analysis may contain errors, omissions, or misinterpretations of the original research. DNA Genics disclaims all liability for any inaccuracies, errors, or consequences arising from the use of this information. Users should independently verify all information and consult original research publications before making any decisions based on this content. This analysis is not intended as a substitute for professional scientific review or medical advice.