The missing data problem in population genomics and statistical methods to address them.
Sethuraman Arun
Abstract
The "Missing Data" problem is prevalent across all statistical inference, owing to the "absence of some part of a familiar data structure" (Efron 1994). Population genomic datasets are riddled with missing data (Fig. 1), broadly classified as data missing at random (e.g. due to degradation or sequencing errors), data missing "on purpose" (e.g. due to sequencing strategies like genotyping by sequencing), and data missing due to unknown evolutionary history (e.g. introgression from ancestral ghost populations). Editors and scientific contributors to both of the GSA's journals, Genetics and G3, have continually highlighted statistical issues and pitfalls with inference in the presence of missing data (McIntyre 2025), particularly in an age of biobank-scale population genomic datasets. Here I highlight studies, including those recently published in Genetics and G3, that systematically assess the effects of missing data and address them for inference across a variety of population genomics questions.