283T Poster - Population Genetics
Thursday June 09, 9:15 PM - 10:00 PM

Statistical inference in population genomics


Authors:
Parul Johri 1; Charles Aquadro 2; Mark Beaumont 3; Brian Charlesworth 4; Laurent Excoffier 5; Adam Eyre-Walker 6; Peter Keightley 4; Michael Lynch 1; Gil McVean 7; Bret Payseur 8; Susanne Pfeifer 1; Wolfgang Stephan 9; Jeffrey Jensen 1

Affiliations:
1) School of Life Sciences, Arizona State University, Tempe, AZ, US; 2) Department of Molecular Biology and Genetics, Cornell University, Ithaca, US; 3) School of Biological Sciences, University of Bristol, Bristol, UK; 4) Institute of Evolutionary Biology, School of Biological Sciences, University of Edinburgh, Edinburgh, UK; 5) Institute of Ecology and Evolution, University of Berne, Berne, CH; 6) School of Life Sciences, University of Sussex, Brighton, UK; 7) Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, UK; 8) Laboratory of Genetics, University of Wisconsin-Madison, Madison, US; 9) Leibniz-Institute for Evolution and Biodiversity Science, Berlin, DE

Keywords:
Theory & Method Development

The field of population genomics has grown rapidly with the recent advent of affordable, large-scale sequencing technologies. For most of the 20th century, the development of theoretical and statistical population-genetic insights outpaced the generation of data to which they could be applied; today, by contrast, genomic data are being produced at a far greater rate than they can be meaningfully analyzed and interpreted. With this wealth of data has come a tendency to focus on fitting specific (and often rather idiosyncratic) models to data, at the expense of a careful exploration of the range of possible underlying evolutionary processes. For example, directly investigating models of adaptive evolution in each newly sequenced population or species often neglects the fact that a thorough characterization of ubiquitous non-adaptive processes is a prerequisite for accurate inference. We describe the perils of these tendencies and demonstrate how multiple incorrect models can often be fit equally well to population-genomic data. We also demonstrate how confounders such as background selection effects, unmodeled population history, mutation and recombination rate heterogeneity, SNP ascertainment bias, and progeny skew can, when not accounted for, bias the inference of both selection and demography. We therefore argue for the importance of defining a biologically relevant baseline model tuned to the details of each new analysis, of skepticism and scrutiny in interpreting model-fitting results, and of carefully defining addressable hypotheses and underlying uncertainties. Finally, we address current best practices in population-genomic data analysis and highlight areas of statistical inference and theory that need further attention.