276W Poster - Population Genetics
Wednesday June 08, 8:30 PM - 9:15 PM

Efficient analysis of allele frequency variation from whole-genome pool-sequencing data


Authors:
Lucas Czech 1; Yunru Peng 1; Jeffrey Spence 2; Patricia Lang 3; Tatiana Bellagio 1,3; Julia Hildebrandt 4; Katrin Fritschi 4; Rebecca Schwab 4; Beth Rowan 4; Detlef Weigel 4; J.F. Scheepens 5; François Vasseur 3,6; Moises Exposito-Alonso 1,3,4,7; GrENE-net consortium

Affiliations:
1) Department of Plant Biology, Carnegie Institution for Science, Stanford, USA; 2) Department of Genetics, Stanford University, Stanford, USA; 3) Department of Biology, Stanford University, Stanford, USA; 4) Department of Molecular Biology, Max Planck Institute for Biology Tübingen, Tübingen, Germany; 5) Faculty of Biological Sciences, Goethe University, Frankfurt, Germany; 6) Centre d'Écologie Fonctionnelle et Évolutive (CEFE), University of Montpellier, Montpellier, France; 7) Department of Global Ecology, Carnegie Institution for Science, Stanford, USA

Keywords:
Theory & Method Development

In recent decades, so-called Evolve-and-Resequence (E&R) experiments have become a popular approach to survey rapid evolution of populations over multiple generations. These experiments allow us to measure shifts in the allele frequencies of a population in response to new or shifting environmental conditions, such as a changing climate.

Pool-sequencing of several individuals at once is a cost-effective and efficient way to obtain reliable allele frequencies from a population of thousands to hundreds of thousands of individuals, and is often used in E&R experiments. However, specialized tools that analyze these data efficiently while accounting for the sampling biases introduced by the pool-sequencing approach have been lacking. We developed two software tools to overcome the statistical and bioinformatic challenges arising in this context.
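To illustrate where the sampling bias in pool-sequencing comes from, the following minimal sketch treats a pooled sample as two nested sampling steps: chromosomes are drawn from the population into the pool, and reads are then drawn from the pool. It assumes that every individual contributes equally to the pool, that reads are sampled independently, and that there is no sequencing error. The function names are hypothetical and are not taken from grenepipe or grenedalf.

```python
def allele_frequency(alt_reads: int, total_reads: int) -> float:
    """Naive estimate: fraction of reads at a site that carry the allele."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= alt_reads <= total_reads:
        raise ValueError("alt_reads must lie between 0 and total_reads")
    return alt_reads / total_reads


def effective_sample_size(pool_size: int, coverage: int) -> float:
    """Effective number of independently sampled chromosomes.

    pool_size: number of haploid chromosomes in the pool
               (twice the number of individuals for a diploid species).
    coverage:  number of reads covering the site.
    Assumes binomial sampling at both stages (an idealization).
    """
    if pool_size < 1 or coverage < 1:
        raise ValueError("pool_size and coverage must be at least 1")
    return pool_size * coverage / (pool_size + coverage - 1)


def frequency_variance(freq: float, pool_size: int, coverage: int) -> float:
    """Plug-in sampling variance of the read-based frequency estimate."""
    return freq * (1.0 - freq) / effective_sample_size(pool_size, coverage)


if __name__ == "__main__":
    f = allele_frequency(30, 120)
    print(f, effective_sample_size(200, 120), frequency_variance(f, 200, 120))
```

Under these assumptions, the effective sample size is always smaller than both the pool size and the read depth, which is why statistics computed directly from read frequencies need a correction.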

First, we present grenepipe, a workflow that takes raw sequencing data of individuals or pooled populations and produces genotypes (variant calling) and population allele frequencies. The pipeline automates trimming, mapping, variant calling, and quality control, offering a choice of popular software tools for each of these steps, and produces variant calls and frequency tables. While generally applicable to individual sample data, it offers specialized steps for pool-sequencing. With a single command-line call, our software downloads all dependencies and runs all steps automatically, parallelizes processing for computer cluster environments, and recovers from any failing steps.

Second, to enable inference of evolutionary signatures from frequency data, we created grenedalf, a C++ command-line tool to compute population genetic statistics. It computes unbiased estimates of Fst, Pi, and Tajima’s D from pool-sequencing data, far outperforming alternative tools. Furthermore, it offers novel data exploration tools, such as windowed allele frequency spectrum visualizations and PCA and MDS on the allele frequencies, as well as built-in data filters and manipulations.
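As an illustration of what a bias correction for pooled data can look like, the sketch below computes a per-site heterozygosity from read counts and rescales it for the two finite samples involved (the pool and the reads). This is a simple textbook-style correction that assumes independently sampled reads, equal contribution of all individuals, and no sequencing error; it is not necessarily the estimator implemented in grenedalf, and the function names are hypothetical.

```python
from typing import Iterable, Sequence


def corrected_heterozygosity(counts: Sequence[int], pool_size: int) -> float:
    """Per-site heterozygosity from read counts, corrected for finite sampling.

    counts:    read counts per allele at one site, e.g. (A, C, G, T).
    pool_size: number of haploid chromosomes in the pool.
    The raw value 1 - sum(f_i^2) is biased downward by the factors
    (n - 1) / n for the pool and (c - 1) / c for the reads, so we divide
    by both (assuming binomial sampling at each stage).
    """
    coverage = sum(counts)
    if coverage < 2 or pool_size < 2:
        raise ValueError("need at least 2 reads and a pool size of at least 2")
    raw = 1.0 - sum((c / coverage) ** 2 for c in counts)
    pool_factor = pool_size / (pool_size - 1)
    read_factor = coverage / (coverage - 1)
    return raw * pool_factor * read_factor


def window_diversity(sites: Iterable[Sequence[int]], pool_size: int) -> float:
    """Mean corrected heterozygosity over all sites passed in for one window."""
    values = [corrected_heterozygosity(site, pool_size) for site in sites]
    if not values:
        raise ValueError("window contains no sites")
    return sum(values) / len(values)


if __name__ == "__main__":
    window = [(5, 5, 0, 0), (10, 0, 0, 0), (0, 8, 2, 0)]
    print(window_diversity(window, pool_size=10))
```

The window average here includes invariant sites that are passed in, so the result depends on which sites the caller considers callable.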

These tools are designed for scalability and ease of use with contemporary file formats, which we showcase using the GrENE-net.org project, a large-scale Evolve-and-Resequence experiment with Arabidopsis thaliana from across the world.