294T Poster - Population Genetics
Thursday June 09, 8:30 PM - 9:15 PM

Location, location, location: Dissecting errors in machine learning prediction of geography


Authors:
Clara Rehmann; CJ Battey; Peter Ralph; Andrew Kern

Affiliation: Institute of Ecology and Evolution, University of Oregon, Eugene, OR

Keywords:
Theory & Method Development

The geographic history of a population is encoded within its genetic variation. Recently, we introduced Locator, a deep learning-based method for individual-level prediction of geographic location from genotypic variation (Battey et al. 2020), which we demonstrated to be efficient and accurate on both simulated and empirical datasets. Further, when applied to empirical datasets, Locator’s residuals appear to reflect known patterns of geographic ancestry: in human populations, prediction errors correspond to known instances of migration, and in Anopheles and Plasmodium populations, predictions are potentially biased towards corridors of gene flow.
To assess how ancestral migration patterns are reflected in prediction errors, we use Locator to predict the locations of individuals in simulated populations undergoing increasing degrees of anisotropic dispersal. Our results confirm that residuals from Locator predictions align along the axes of biased dispersal, and we demonstrate a relationship between error magnitude, dispersal distance, and degree of dispersal bias. Additionally, we investigate the potential for spatial imbalance in the training set to bias predictions towards densely sampled areas of the landscape and offer solutions to reduce this overfitting in empirical applications. Finally, we show that the magnitude and direction of errors in geographic prediction are strongly correlated between Anopheles and Plasmodium datasets, suggesting that we are capturing a coupled migration pattern of host and parasite.
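
One way to quantify whether residuals align along an axis is sketched below. This is an illustration only, not the analysis code behind this poster: the function name `residual_summary`, the synthetic coordinates, and the choice of an eigen-decomposition of the residual covariance are all assumptions made for the example. It takes predicted and true x/y coordinates and returns the error magnitudes, the orientation of the major axis of the residual cloud, and the ratio of spread along the major and minor axes.

```python
import numpy as np

def residual_summary(predicted, true):
    """Summarize prediction errors for (n, 2) arrays of x/y coordinates.

    Hypothetical helper for illustration; not part of Locator.
    """
    residuals = np.asarray(predicted, float) - np.asarray(true, float)
    magnitudes = np.linalg.norm(residuals, axis=1)  # per-individual error distance

    # Principal axes of the residual cloud from its 2x2 covariance matrix.
    cov = np.cov(residuals, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    major = eigvecs[:, -1]                  # direction of greatest spread

    # Orientation of the major axis in degrees, folded into [0, 180)
    # because an axis has no sign.
    angle = np.degrees(np.arctan2(major[1], major[0])) % 180.0

    # Ratio of standard deviations along the major and minor axes;
    # values near 1 indicate isotropic errors. Assumes the minor-axis
    # variance is nonzero.
    anisotropy = np.sqrt(eigvals[-1] / eigvals[0])
    return magnitudes, angle, anisotropy

# Synthetic example: errors three times larger along x than along y.
rng = np.random.default_rng(0)
true = rng.uniform(0, 50, size=(2000, 2))
predicted = true + rng.normal(0, [3.0, 1.0], size=(2000, 2))
magnitudes, angle, anisotropy = residual_summary(predicted, true)
```

In this synthetic case the recovered major axis should lie close to the x axis, with an anisotropy ratio near the simulated value of 3. For simulated populations, the same summary could be compared against the known axis and strength of dispersal bias.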