279T Poster - Population Genetics
Thursday June 09, 9:15 PM - 10:00 PM

An EM algorithm for detecting general diploid selection in time series allele frequency data


Authors:
Adam Fine 1; Matthias Steinruecken 2,3

Affiliations:
1) Graduate Program in Biophysical Sciences, University of Chicago, Chicago, IL; 2) Department of Ecology and Evolution, University of Chicago, Chicago, IL; 3) Department of Human Genetics, University of Chicago, Chicago, IL

Keywords:
Theory & Method Development

Detecting selection and quantifying its strength is a fundamental problem in evolutionary biology, with applications ranging from finding mutations critical to early hominid evolution to identifying genes responsible for rapid adaptation in experimental evolution. Since selection is a multi-generational process, many approaches for quantifying selection use genetic data sampled at two or more time points. A common class of models for analyzing such time series data is the hidden Markov model (HMM), in which the underlying population allele frequency trajectory is modelled as a Wright-Fisher or Moran process and the sampled allele frequencies are the observed variables. Mathieson and McVean (2013) introduced an expectation-maximization (EM) algorithm, built on the maximum likelihood estimator for the haploid selection coefficient derived by Watterson (1982), to efficiently estimate selection coefficients under genic selection. Here, we introduce DIESELFUEL (DIploid Estimates of SELection Forces Using Expectation-maximization and Lagrange multipliers), a novel method for estimating selection coefficients under general diploid selection scenarios. To this end, we derive a diploid version of Watterson's maximum likelihood estimators and extend the EM algorithm of Mathieson and McVean to infer general diploid selection. Moreover, we introduce an approach based on Lagrange multipliers to efficiently explore subspaces of the full parameter space. We use this approach to build a new framework for choosing the best model across both one-parameter (genic, dominant, recessive, over-/underdominant) and full diploid selection scenarios. We also explore a simplified version of Mathieson and McVean's spatially structured population model in which all migration rates are identical.
On simulated data, we find that DIESELFUEL accurately and efficiently estimates selection coefficients and identifies the correct mode of selection across a range of parameter values and scenarios. Lastly, to showcase the potential of our method, we apply DIESELFUEL to the human lactase and MHC loci and characterize selection at these loci more generally than was previously possible.
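To make the HMM framework described above concrete, here is a minimal sketch of the likelihood such methods work with: a discrete Wright-Fisher chain on allele counts with diploid viability selection, observed through binomial samples at several time points. Several choices here are assumptions, not details from the abstract: genotype fitnesses are parameterized as aa: 1, Aa: 1 + s1, AA: 1 + s2; the population size is constant with no mutation or migration; and the initial allele count is given a uniform prior. The function names are hypothetical. This is a plain forward-algorithm computation, not the EM algorithm, the Watterson-based estimators, or the Lagrange-multiplier approach of DIESELFUEL.

```python
import math
import numpy as np


def selected_freq(x, s1, s2):
    """Frequency of allele A after one round of diploid viability selection.

    Assumed genotype fitnesses: aa -> 1, Aa -> 1 + s1, AA -> 1 + s2.
    """
    w_bar = x**2 * (1 + s2) + 2 * x * (1 - x) * (1 + s1) + (1 - x) ** 2
    return (x**2 * (1 + s2) + x * (1 - x) * (1 + s1)) / w_bar


def binom_pmf(n, p):
    """Row i holds P(K = k), k = 0..n, for K ~ Binomial(n, p[i])."""
    k = np.arange(n + 1)
    coeff = np.array([math.comb(n, j) for j in k], dtype=float)
    p = np.asarray(p, dtype=float)[:, None]
    return coeff * p**k * (1 - p) ** (n - k)


def wf_transition(two_n, s1, s2):
    """Wright-Fisher transition matrix on allele counts 0..2N:
    deterministic selection followed by binomial resampling (drift)."""
    x = np.arange(two_n + 1) / two_n
    return binom_pmf(two_n, selected_freq(x, s1, s2))


def log_likelihood(samples, two_n, s1, s2):
    """Scaled forward algorithm for the HMM.

    samples: list of (generation, n_sampled, n_derived), sorted by generation.
    Hidden state: population allele count; emissions: binomial samples.
    The uniform initial distribution is a placeholder prior.
    """
    T = wf_transition(two_n, s1, s2)
    x = np.arange(two_n + 1) / two_n
    alpha = np.full(two_n + 1, 1.0 / (two_n + 1))
    log_lik = 0.0
    current_gen = samples[0][0]
    for gen, n, c in samples:
        for _ in range(gen - current_gen):
            alpha = alpha @ T  # one generation of selection + drift
        current_gen = gen
        alpha = alpha * binom_pmf(n, x)[:, c]  # P(c derived of n | state)
        norm = alpha.sum()
        log_lik += math.log(norm)
        alpha /= norm  # rescale to avoid numerical underflow
    return log_lik
```

For example, with toy data in which the sampled frequency rises from 0.2 to 0.8 over 40 generations, `log_likelihood(data, 100, 0.05, 0.1)` should exceed `log_likelihood(data, 100, -0.05, -0.1)`. Maximizing this function over (s1, s2) by brute-force grid search gives a simple, if slow, estimator; EM-based approaches such as the one described in the abstract aim to do this more efficiently. Under the fitness parameterization assumed here, one-parameter modes correspond to constraints on (s1, s2), for instance s1 = s2/2 (genic/additive), s1 = s2 (dominant), and s1 = 0 (recessive); other conventions exist.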