9 Oral - Platform Session 1 Complex Traits
Wednesday June 08, 10:00 AM - 10:15 AM

Improving Phenotype Prediction by Learning Patterns of Sharing across Multiple Phenotypes


Authors:
Fabio Morgante 1,2; Gao Wang 3,4; Yuxin Zou 5; Abhishek Sarkar 4; Peter Carbonetto 4; Yang Li 2,4; Matthew Stephens 4,5

Affiliations:
1) Center for Human Genetics, Department of Genetics and Biochemistry, Clemson University, Greenwood, SC; 2) Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, IL; 3) Department of Neurology, Columbia University, New York City, NY; 4) Department of Human Genetics, University of Chicago, Chicago, IL; 5) Department of Statistics, University of Chicago, Chicago, IL

Keywords:
Theory & Method Development

Predicting phenotypes from genotypes is a fundamental task in quantitative genetics. While phenotype prediction was pioneered in agricultural breeding for selection purposes, it has recently become important in human genetics as well. In fact, predicting disease risk and other medically relevant phenotypes via Polygenic Risk Scores (PRS) is one of the main goals of precision medicine.

With technological advances, it is now possible to measure multiple phenotypes in large samples. Phenotypes can share part of their genetic basis; therefore, modeling them jointly may improve prediction accuracy by leveraging the effects they share. However, effects can be shared across phenotypes in a variety of ways, so computationally efficient statistical methods are needed that can accurately and flexibly capture patterns of effect sharing.

Here, we describe new Bayesian multivariate regression methods that, by using flexible priors, are able to model many different patterns of effect sharing and specificity across phenotypes. We evaluated the ability of our methods to predict gene expression in multiple tissues using simulations with different patterns of effect sharing across tissues and different genomic heritabilities. The results suggest that these new methods can predict better than existing univariate and multivariate methods, while also being computationally efficient. We then sought to replicate those results on real data by analyzing the Genotype-Tissue Expression (GTEx) project data. We showed that our methods improve prediction performance on average across all tissues, especially for those groups of tissues where shared effects have been previously described.
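
As an illustration of how a flexible prior can encode different sharing patterns, the sketch below computes the posterior for the effects of a single variant on two phenotypes. It is not the method described in this abstract, whose prior is not specified here. It assumes, purely for illustration, that the prior is a mixture of zero-mean multivariate normals, with each component covariance matrix standing for one sharing pattern (no effect, fully shared, specific to one phenotype, independent). The function names, covariance matrices, weights, and input values are all hypothetical.

```python
import numpy as np


def mvn_logpdf(x, cov):
    """Log density of a zero-mean multivariate normal N(0, cov) at x."""
    r = x.shape[0]
    _, logdet = np.linalg.slogdet(cov)
    quad = x @ np.linalg.solve(cov, x)
    return -0.5 * (r * np.log(2.0 * np.pi) + logdet + quad)


def posterior_under_mixture(bhat, S, prior_covs, prior_weights):
    """Posterior component weights and posterior mean effect.

    Assumed model (illustrative only):
        bhat | b ~ N(b, S)              observed effect estimates
        b ~ sum_k w_k N(0, U_k)         mixture prior over sharing patterns
    """
    n_comp = len(prior_covs)
    log_w = np.empty(n_comp)
    comp_means = np.empty((n_comp, bhat.shape[0]))
    for k, (U, w) in enumerate(zip(prior_covs, prior_weights)):
        marginal_cov = U + S  # covariance of bhat under component k
        log_w[k] = np.log(w) + mvn_logpdf(bhat, marginal_cov)
        # Posterior mean of b under component k: U (U + S)^{-1} bhat
        comp_means[k] = U @ np.linalg.solve(marginal_cov, bhat)
    log_w -= log_w.max()  # stabilize before exponentiating
    post_w = np.exp(log_w)
    post_w /= post_w.sum()
    return post_w, post_w @ comp_means


# Hypothetical inputs: one variant, two phenotypes with similar estimates.
bhat = np.array([1.0, 0.9])
S = 0.25 * np.eye(2)
prior_covs = [
    np.zeros((2, 2)),                        # no effect
    np.ones((2, 2)),                         # effect fully shared
    np.array([[1.0, 0.0], [0.0, 0.0]]),      # specific to phenotype 1
    np.eye(2),                               # independent effects
]
prior_weights = np.full(4, 0.25)

post_w, post_mean = posterior_under_mixture(bhat, S, prior_covs, prior_weights)
print("posterior component weights:", np.round(post_w, 3))
print("posterior mean effects:", np.round(post_mean, 3))
```

With these inputs the two estimates are similar, so the fully shared component receives the largest posterior weight and both effects are shrunk toward a common value. In practice the component weights would be learned from the data rather than fixed, which is what allows a model of this kind to adapt to the sharing patterns actually present.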

While gene expression prediction was used as an application, our methods are general multivariate regression methods that can be used for any multi-phenotype application, including PRS computation and breeding value prediction. Thus, our methods have the potential to provide improvements across fields and organisms.