338T Poster - Quantitative Genetics
Thursday June 09, 8:30 PM - 9:15 PM

Genetic dissection of the pluripotent proteome through multi-omics data integration


Authors:
Selcan Aydin 1; Tian Zhang 2; Duy Pham 1; Daniel A. Skelly 1; Matthew Pankratz 3; Devin K. Porter 3; Greg Keele 1; Ted Choi 3; Steven Gygi 2; Laura G. Reinholdt 1; Christopher L. Baker 1; Gary A. Churchill 1; Steven C. Munger 1

Affiliations:
1) The Jackson Laboratory; 2) Harvard Medical School; 3) Predictive Biology

Keywords:
Complex traits

The phenotypic variability observed across pluripotent stem cell (PSC) lines currently limits their use in personalized medicine. Genetic background is a major driver of this variability, and studies addressing it have relied on transcript abundance as the primary measure of gene expression. However, little is known about how proteins, the functional units of the cell, vary across PSC lines and how this variation relates to variation in other molecular measures. Here we present the first comprehensive genetic study characterizing the pluripotent proteome using 190 unique mouse embryonic stem cell lines derived from highly heterogeneous Diversity Outbred mice. Genome-wide comparisons of protein abundance with chromatin accessibility and transcript abundance showed high levels of co-variation. This co-variation was evident in the quantitative trait loci (QTL) results, as 39% of the 1,676 significant protein abundance QTL (pQTL) co-mapped with chromatin accessibility QTL and expression QTL (eQTL). Most of these shared QTL mapped proximal to the gene itself and likely reflect cis-regulatory polymorphisms. In contrast, 34% of pQTL were unique to protein abundance, and most of these loci were distal. To distinguish shared and unique drivers of variability across molecular layers, we integrated the genomic data sets using multi-omics factor analysis. Integration resulted in 22 latent factors that cumulatively explain 28%, 39%, and 35% of the variation in chromatin accessibility, transcript abundance, and protein abundance, respectively. Functional characterization showed that these factors capture variation relevant to pluripotency maintenance. For example, Factor 3 was highly correlated with the genotype of cell lines at the Lifr locus on Chr 15, which we previously identified as a major eQTL hotspot influencing pluripotency maintenance. As expected, Factor 3 mapped to the same locus with a significant QTL.
Remarkably, we found many proteins and transcripts that strongly contribute to Factor 3 and exhibit similar allele-level effects on abundance at the Lifr locus, yet individually lack a significant distal QTL on Chr 15. These findings highlight the power of multi-omics data integration to reveal the distal impacts of genetic variation. Whereas QTL mapping of individual traits may be limited by noise from measurement error, data integration can consolidate genetic signals shared across molecular traits and thereby increase detection power.