Pastrami: a fast and efficient algorithm for fine-scale genetic ancestry inference.
Conley AB, Rishishwar L, et al.
Publication Details
Comprehensive information about this research publication
Abstract
Summary of the research findings
Genomics research increasingly relies on large population biobanks that include many thousands of participants. However, current genetic ancestry inference methods are computationally inefficient and prohibitively slow when applied to such large cohorts. The aim of this work was to develop a fast and efficient algorithm for fine-scale genetic ancestry inference on biobank-size cohorts. The Pastrami algorithm that we developed performs supervised genetic ancestry inference by comparing haplotypes between query and global reference samples, creating query and reference haplotype copying vectors, and relating them via non-negative least squares regression to estimate ancestry fractions. We used Pastrami for ancestry inference on genomic data sets from Africa, the Americas, and the United Kingdom, comparing its accuracy and runtime performance to the most widely used haplotype-based ancestry inference methods. Pastrami ancestry estimates are highly similar to estimates from the ChromoPainter and RFMix programs. The total CPU time required by Pastrami increases linearly with the number of samples, and it achieves ∼45× faster runtime than ChromoPainter. When run on 488 377 UK Biobank and 3433 reference samples, Pastrami used 2340 CPU hours compared to ∼105 000 CPU hours for ChromoPainter. The Pastrami program and documentation are made freely available on GitHub: https://github.com/healthdisparities/pastrami.
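The regression step described in the abstract can be illustrated with a small sketch. This is not the Pastrami implementation: the copying-vector values, the population names (`pop_a`, `pop_b`, `pop_c`), the function names, the coordinate-descent solver, and the final rescaling of the coefficients so that they sum to 1 are all assumptions made for illustration. The sketch treats each reference population's copying vector as a column of a matrix, fits a query copying vector as a non-negative combination of those columns, and reads the normalised coefficients as ancestry fractions.

```python
def nnls_coordinate_descent(A, b, iters=500):
    """Minimise ||A x - b||^2 subject to x >= 0 by cyclic coordinate descent.

    A is a list of rows, b is a list. This is a simple illustrative solver,
    not the one used by Pastrami.
    """
    n_rows, n_cols = len(A), len(A[0])
    x = [0.0] * n_cols
    residual = [-value for value in b]  # residual = A x - b, starting at x = 0
    col_sq = [sum(A[i][j] ** 2 for i in range(n_rows)) for j in range(n_cols)]
    for _ in range(iters):
        for j in range(n_cols):
            if col_sq[j] == 0.0:
                continue  # an all-zero column cannot explain anything
            grad = sum(A[i][j] * residual[i] for i in range(n_rows))
            new_xj = max(0.0, x[j] - grad / col_sq[j])  # clip at zero
            delta = new_xj - x[j]
            if delta != 0.0:
                for i in range(n_rows):
                    residual[i] += delta * A[i][j]
                x[j] = new_xj
    return x


def ancestry_fractions(reference_vectors, query_vector):
    """Fit the query vector as a non-negative mix of reference vectors.

    Rescaling the coefficients to sum to 1 is an assumption of this sketch.
    """
    names = list(reference_vectors)
    A = [[reference_vectors[name][i] for name in names]
         for i in range(len(query_vector))]
    coeffs = nnls_coordinate_descent(A, query_vector)
    total = sum(coeffs)
    if total == 0.0:
        return {name: 0.0 for name in names}
    return {name: c / total for name, c in zip(names, coeffs)}


# Hypothetical copying vectors: each entry is the share of haplotypes copied
# from one of four made-up donor groups.
reference = {
    "pop_a": [0.7, 0.2, 0.1, 0.0],
    "pop_b": [0.1, 0.6, 0.2, 0.1],
    "pop_c": [0.0, 0.1, 0.3, 0.6],
}
# Toy query constructed as an equal mix of pop_a and pop_c.
query = [0.5 * a + 0.5 * c for a, c in zip(reference["pop_a"], reference["pop_c"])]

fractions = ancestry_fractions(reference, query)
for name, value in fractions.items():
    print(f"{name}: {value:.3f}")
```

Because the toy query is built as an equal mix of `pop_a` and `pop_c`, the fit should assign about half of the ancestry to each and none to `pop_b`. With real data the query is rarely an exact combination of the references, so the fit leaves a non-zero residual.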
Analysis
Comprehensive review of ancestry and genetic findings
Important Disclaimer: This review has been performed semi-automatically and is provided for informational purposes only. While we strive for accuracy, this analysis may contain errors, omissions, or misinterpretations of the original research. DNA Genics disclaims all liability for any inaccuracies, errors, or consequences arising from the use of this information. Users should independently verify all information and consult original research publications before making any decisions based on this content. This analysis is not intended as a substitute for professional scientific review or medical advice.