272W Poster - Population Genetics
Wednesday June 08, 8:30 PM - 9:15 PM

Expanding the Use of Generative Adversarial Networks in Population Genetics to Create Artificial Sequence Alignments


Authors:
William Booker; Dan Schrider

Affiliation: University of North Carolina at Chapel Hill, Chapel Hill, NC

Keywords:
Theory & Method Development

Research over the last decade has demonstrated that machine learning methods have significant utility in population genetics. Among the many machine learning tools available, generative networks such as Generative Adversarial Networks (GANs) have recently been used in population genetics to generate artificial human genomes and to estimate the demographic parameters of populations. The GANs used in these contexts are largely modifications of the original GAN architecture, which was initially developed to generate fake images such as human faces, but extensions of these networks have proven powerful for a variety of tasks, many of which can be applied to current problems in population genetics and evolution. Here, we further explore the utility of GANs in population genetic research. Broadly, we make the first attempt to use a GAN in which a neural network generates population genomic sequence alignments (that is, a set of genomic sequences sampled from the same population rather than a single sequence) that resemble those obtained from sequencing data or simulations. Using modifications of the original GAN architecture, including Deep Convolutional and Wasserstein GANs, we demonstrate consistent training success across runs without the need for extensive hyperparameter tuning. We then demonstrate that this network can generate alignments under several demographic scenarios that retain properties of the input alignments relevant to population genetics, such as the site frequency spectrum and the decay of linkage disequilibrium with physical distance. Overall, this work expands the applicability of GANs to population genetics and provides a framework for significant future expansion.
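
As an illustration of one of the summary statistics mentioned above, the following minimal sketch computes an unfolded site frequency spectrum from an alignment. It is not the authors' code: the function name `site_frequency_spectrum`, the 0/1 encoding (0 = ancestral, 1 = derived), and the rows-as-haplotypes layout are all assumptions made for this example. In practice, one might compute this statistic for both input and generated alignments and compare the results.

```python
from collections import Counter


def site_frequency_spectrum(alignment):
    """Return the unfolded SFS of a binary alignment.

    Assumed layout (hypothetical, for illustration): `alignment` is a list of
    haplotypes, each a list of 0/1 alleles of equal length.
    Entry k-1 of the result is the number of sites at which exactly k of the
    n haplotypes carry the derived allele, for k = 1 .. n-1.
    """
    n = len(alignment)
    if n == 0:
        return []
    num_sites = len(alignment[0])
    if any(len(row) != num_sites for row in alignment):
        raise ValueError("all haplotypes must have the same number of sites")
    # zip(*alignment) iterates over columns, i.e. one site at a time.
    derived_counts = Counter(sum(site) for site in zip(*alignment))
    # Sites with 0 or n derived alleles are not segregating and are excluded.
    return [derived_counts.get(k, 0) for k in range(1, n)]


# Toy alignment: 4 haplotypes (rows) by 4 sites (columns).
toy_alignment = [
    [0, 1, 0, 1],
    [0, 1, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
]
toy_sfs = site_frequency_spectrum(toy_alignment)  # [3, 0, 1]
```

Here three sites are singletons and one site has the derived allele in three of the four haplotypes, so the spectrum is `[3, 0, 1]`.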