299T Poster - Population Genetics
Thursday June 09, 9:15 PM - 10:00 PM

Inferring demographic history from allele frequency spectra with multi-layer perceptron regressors


Authors:
Linh Tran; Connie Sun; Mathews Sajan; Ryan Gutenkunst

Affiliation: University of Arizona, Tucson, AZ

Keywords:
Theory & Method Development

Our group previously developed dadi, a software package for inferring demographic history using the diffusion approximation and composite likelihood. Inferring demography with dadi requires considerable understanding of the software and can be computationally expensive. In this work, we aim to improve ease of use and lower the computational burden for dadi users with supervised machine learning. For each dadi-supported demographic model, we use dadi to simulate the expected allele frequency spectrum (AFS) under different demographic parameter values and train the scikit-learn Multi-layer Perceptron Regressor (MLPR) to infer these parameters from an input AFS. We demonstrate that the trained MLPRs infer population-size-change parameters very well (ρ≈0.98) and other parameters, such as migration rate and the time of a demographic event, fairly well (ρ≈0.6–0.7). The trained MLPRs also make good predictions when tested on AFS generated by the msprime simulator, which includes linkage in its simulations. Importantly, our trained MLPRs provide parameter predictions instantaneously from an input AFS, with accuracy comparable to that of parameters inferred by dadi’s likelihood optimization, while bypassing its long and computationally intensive evaluation process. We also implement an accompanying method for quantifying the uncertainty of the point estimates output by the trained regressors, using a scikit-learn-compatible package, MAPIE (Model Agnostic Prediction Interval Estimator). We show that this method provides much better coverage for all demographic parameters tested than traditional bootstrapping.
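
The simulate-then-train workflow described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline. The function `toy_expected_afs` is a hypothetical stand-in for a dadi simulation and is not a population-genetic model. The parameter ranges, network size, sample size of 20 chromosomes, and the use of Spearman's rank correlation as the accuracy score are all assumptions made for illustration.

```python
import numpy as np
from scipy.stats import spearmanr
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
N_BINS = 19  # assumption: 19 frequency classes, as for a sample of 20 chromosomes


def toy_expected_afs(nu, T):
    """Hypothetical stand-in for a dadi simulation (NOT a real demographic model).

    Starts from the neutral 1/i shape and reweights it by a size-change
    parameter nu and a time parameter T so the spectrum depends on both.
    """
    i = np.arange(1, N_BINS + 1)
    afs = (1.0 / i) * (1.0 + (nu - 1.0) * np.exp(-i * T))
    return afs / afs.sum()  # normalize so the entries sum to 1


def simulate(n):
    """Draw n parameter sets and return (spectra, labels)."""
    log_nu = rng.uniform(-2.0, 2.0, size=n)  # assumed range for log10(nu)
    T = rng.uniform(0.1, 2.0, size=n)        # assumed range for T
    X = np.array([toy_expected_afs(10.0 ** a, t) for a, t in zip(log_nu, T)])
    y = np.column_stack([log_nu, T])
    return X, y


X_train, y_train = simulate(2000)
X_test, y_test = simulate(200)

# One multi-output regressor maps a normalized AFS to (log10 nu, T).
mlpr = MLPRegressor(hidden_layer_sizes=(64,), max_iter=500, random_state=0)
mlpr.fit(X_train, y_train)
y_pred = mlpr.predict(X_test)

# Score each parameter separately with a rank correlation on held-out spectra.
rho = [spearmanr(y_test[:, j], y_pred[:, j])[0] for j in range(y_test.shape[1])]
print("rank correlation for log10(nu) and T:", rho)
```

Once training is done, prediction is a single forward pass through the network, which is why inference from a new AFS is fast compared with repeated likelihood evaluations. In a real application the toy function would be replaced by spectra simulated with dadi for the chosen demographic model.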