53 Oral - Platform Session #6 Theory and Methods
Friday June 10, 9:00 AM - 9:15 AM

Improving the estimation of DFE using paired allele frequency and allele age information


Authors:
Vivaswat Shastry; Jeremy Berg

Affiliation: University of Chicago, Chicago, IL

Keywords:
Natural selection

Estimating the fitness effects of de novo mutations is an important problem in population and evolutionary genetics. Accurate estimation of the distribution of fitness effects (DFE) is crucial for understanding a broad variety of processes, from selection shaping genetic diversity in natural populations to the evolution of complex traits in humans. Broadly, the DFE can be inferred in two ways: experimentally, through mutagenesis or mutation accumulation, and computationally, by analyzing existing variation in natural populations. We focus on the latter approach, in which the site frequency spectrum (SFS) at potentially functional sites is used to infer the DFE of deleterious and nearly neutral mutations. However, the allele frequency at an individual site carries relatively little information about its selection coefficient, which limits the utility of the SFS as a source of information about the DFE. Here, we propose incorporating more information about the trajectories of deleterious mutations into the estimation by jointly modeling their frequency and age. We compute the likelihood of a given selection coefficient from paired allele frequency and age data by evolving the SFS forward in time, starting from a de novo mutant, across a range of parameter values. This iterative algorithm is based on a close approximation to the Wright-Fisher model with additive selection previously proposed by Jouganous et al. (2017) and, as a result, can incorporate the effects of complex demographic histories. Using simulated data, we show that, for the same number of sites, using allele age jointly with allele frequency yields more accurate estimates of the selection coefficient than using the SFS alone. With the development of methods that reconstruct genome-wide genealogies, allele ages can now be estimated fairly quickly and accurately from large empirical data sets.
Finally, we hope to use this lightweight and flexible approach to improve estimation of the DFE in specific functional contexts of the human genome (especially regions with very few variants), providing a finer view of the evolution of complex traits.
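The idea of scoring a selection coefficient by evolving the frequency distribution of a single new mutation forward in time can be illustrated with a small sketch. It does not reproduce the Jouganous et al. (2017) approximation; as a swapped-in stand-in, it propagates the exact discrete Wright-Fisher transition matrix for a small, constant-size population of 2N gene copies, assuming genic (additive) selection with post-selection frequency p(1+s)/(1+sp). For each site it computes the probability of the observed allele count given the allele's age, conditioned on the allele still segregating, and sums log-likelihoods over sites assumed to be independent. The function names are hypothetical, observed counts are treated as population counts rather than sample counts, and demography is ignored.

```python
import math


def wf_transition_matrix(two_n, s):
    """Wright-Fisher transition matrix over allele counts 0..two_n.

    Assumes genic (additive) selection: the allele frequency after
    selection is p * (1 + s) / (1 + s * p), with s < 0 deleterious
    and s > -1.
    """
    matrix = []
    for i in range(two_n + 1):
        p = i / two_n
        p_sel = p * (1 + s) / (1 + s * p)
        # Binomial resampling of two_n gene copies in the next generation
        row = [math.comb(two_n, j) * p_sel**j * (1 - p_sel) ** (two_n - j)
               for j in range(two_n + 1)]
        matrix.append(row)
    return matrix


def frequency_distribution(two_n, s, age):
    """Distribution of the allele count `age` generations after one new mutation."""
    matrix = wf_transition_matrix(two_n, s)
    dist = [0.0] * (two_n + 1)
    dist[1] = 1.0  # a de novo mutant starts as a single copy
    for _ in range(age):
        # One generation forward: new_dist[j] = sum_i dist[i] * P(i -> j)
        dist = [sum(dist[i] * matrix[i][j] for i in range(two_n + 1))
                for j in range(two_n + 1)]
    return dist


def log_likelihood(s, observations, two_n):
    """Sum over independent sites of log P(count | age, still segregating, s)."""
    total = 0.0
    for count, age in observations:
        if not 0 < count < two_n:
            raise ValueError("observed count must be segregating")
        dist = frequency_distribution(two_n, s, age)
        segregating = sum(dist[1:two_n])  # exclude loss (0) and fixation (two_n)
        total += math.log(dist[count] / segregating)
    return total


def estimate_s(observations, two_n, grid):
    """Grid-search maximum-likelihood estimate of s from (count, age) pairs."""
    return max(grid, key=lambda s: log_likelihood(s, observations, two_n))
```

For example, `estimate_s([(10, 1)], 20, [-0.2, 0.0])` selects 0.0, because reaching 10 of 20 copies within one generation of arising is far less probable under strong purifying selection; the allele's age is what makes such an observation informative. The exact matrix costs O((2N)^2) operations per generation, which is why diffusion- or moment-based approximations are used for realistic population sizes and changing demography.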