296W Poster - Population Genetics
Wednesday June 08, 8:30 PM - 9:15 PM

Fast Multinomial Clustering of multiallelic genotypes to infer genetic population structure


Author:
Arun Sethuraman

Affiliation: San Diego State University

Keywords:
Theory & Method Development

Identifying population structure from multilocus genotype data is key to downstream population genetic analyses in a variety of fields, including conservation, evolutionary genetics, genome-wide association studies (GWAS), and pedigree reconstruction for quantitative genetics. Several methods have been proposed to estimate population structure, but issues with speed of computation, reproducibility, and accuracy of estimation remain, particularly with Bayesian MCMC-based methods that perform inference under the 'admixture' model. Here I develop a likelihood-based approach that infers genetic admixture proportions and ancestral allele frequencies under the admixture model while handling polyploid, multi-allelic (e.g. SNP, STR, allozyme) loci. I present three separate algorithms to perform inference, all implemented in the MULTICLUST framework: (1) Expectation Maximization, (2) Block Relaxation, and (3) Quasi-Newton and SQUAREM acceleration. Comparative analyses of both simulated and empirical data with MULTICLUST and STRUCTURE indicate that MULTICLUST considerably reduces computation time for comparable inference, while producing reproducible and accurately estimated parameters.