392W Poster - Quantitative Genetics
Wednesday June 08, 8:30 PM - 9:15 PM

A kinship-based approach to learn maximally heritable traits from high-dimensional quantitative assays


Authors:
Callan O’Connor 1,2; Seamus Mawe 1; Greg Keele 1; Daniel Gatti 1; Ron Korstanje 1; Gary Churchill 1; Laura Reinholdt 1; J. Matthew Mahoney 1,3

Affiliations:
1) The Jackson Laboratory, Bar Harbor, ME; 2) Graduate School of Biomedical Sciences, Tufts University, Boston, MA; 3) Department of Neurological Sciences, University of Vermont Larner College of Medicine, Burlington, VT

Keywords:
Theory & Method Development

Identifying heritable traits for analysis is one of the essential tasks of quantitative genetics, and it is becoming more challenging as biological assays now routinely produce hundreds or thousands of quantitative variables. For example, computer vision systems quantify thousands of morphological features from images. Often, cleaner and more biologically relevant variables can be constructed as composite traits from the measured variables. Recently, Mitteroecker et al. (Genetics, 2016) showed that finding the maximally heritable composite trait, defined as a linear combination of measured variables, is equivalent to canonical correlation analysis (CCA) between the genotype and phenotype data matrices. This theoretical result provides a systematic approach to synthesizing traits that are maximally aligned with genetic variation, but it has several practical limitations. Genetic variants and measured variables typically far outnumber individuals (“large-p, small-n” data), so CCA requires costly computations on extremely large matrices. Furthermore, CCA is unstable in high-dimensional settings, where the sample covariance matrices are singular. To overcome these limitations, we applied two machine learning strategies to CCA to enable robust maximum heritability analysis in high-dimensional data. First, we used kernel CCA (kCCA) instead of classical CCA; kCCA operates on n × n similarity matrices among individuals, which reduces the dimensionality of all data matrices. Second, we employed bootstrap aggregation (bagging) to reduce the variance of trait estimates. The inputs to kCCA are a genetic similarity matrix (i.e., a kinship matrix), which can include variant, pedigree, and non-additive effects, and a trait similarity matrix, which can be a covariance matrix or any non-linear positive definite kernel defined on the traits. To test our approach, we applied bagging kCCA to high-content screening imaging data of fibroblasts cultured from 200 Diversity Outbred (DO) mice.
The raw image features included morphological quantifications of individual cells (e.g., nuclear and cell roundness) and summary measures of whole culture wells (e.g., cell densities), quantified using the Harmony 4.9 software suite for the Operetta high-content screening system. The maximally heritable trait corresponded to distinct differences in cell morphology (h² = 0.67), suggesting that genetic variation among DO fibroblasts strongly influences how the cells organize in culture. We identified a significant quantitative trait locus (QTL; LOD = 8.7) for the composite trait on chromosome 12. The QTL region includes the gene Pxdn, which is highly expressed in fibroblasts, is involved in extracellular matrix organization, and has a known expression QTL in DO mice with genetic effects consistent with those of the trait QTL. These results support bagging kCCA as a robust strategy for identifying maximally heritable traits from high-dimensional quantitative assays, which can then be used for downstream genetic analyses.
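The bagging kCCA procedure described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes one common regularized kCCA formulation (a ridge term `kappa` added to each centered, trace-normalized kernel), a linear trait kernel so that every bootstrap fit yields trait weights that can be averaged across resamples, and sign alignment of each bootstrap solution to the full-data solution. The helper names (`center_kernel`, `shrink`, `kcca_trait_weights`, `bagged_kcca`), the value of `kappa`, and the simulated genotypes and traits are all hypothetical.

```python
import numpy as np

# Minimal sketch of bagged, regularized kernel CCA (kCCA).
# Helper names, kappa, and the toy data are hypothetical, not the authors' code.


def center_kernel(K):
    """Double-center an n x n kernel: H K H with H = I - 11^T / n."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H


def shrink(K, kappa):
    """Trace-normalize K (mean eigenvalue 1) and return (R, K_scaled),
    where R = (K_scaled + kappa I)^{-1} K_scaled, symmetrized."""
    n = K.shape[0]
    K = K / (np.trace(K) / n)
    R = np.linalg.solve(K + kappa * np.eye(n), K)
    return (R + R.T) / 2, K


def kcca_trait_weights(K, Y, kappa=0.1):
    """Leading regularized kCCA component between a kinship kernel K (n x n)
    and the linear trait kernel Yc Yc^T. Returns (rho, unit-norm trait weights)."""
    n = K.shape[0]
    Yc = Y - Y.mean(axis=0)             # column-centered traits
    Rx, _ = shrink(center_kernel(K), kappa)
    Ry, Ky = shrink(Yc @ Yc.T, kappa)   # linear kernel of centered traits is already centered
    # With u = (Kx + kappa I) alpha and v = (Ky + kappa I) beta, the regularized
    # CCA objective becomes u^T Rx Ry v with ||u|| = ||v|| = 1, so the leading
    # singular pair of Rx @ Ry gives the top canonical correlation.
    _, s, Vt = np.linalg.svd(Rx @ Ry)
    beta = np.linalg.solve(Ky + kappa * np.eye(n), Vt[0])  # dual trait coefficients
    w = Yc.T @ beta                     # primal weights, valid for a linear kernel only
    return s[0], w / np.linalg.norm(w)


def bagged_kcca(K, Y, n_boot=50, kappa=0.1, seed=0):
    """Average unit-norm trait weights over bootstrap resamples of individuals."""
    rng = np.random.default_rng(seed)
    n = K.shape[0]
    _, w_ref = kcca_trait_weights(K, Y, kappa)  # full-data fit fixes the sign
    W = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)        # resample individuals with replacement
        _, w = kcca_trait_weights(K[np.ix_(idx, idx)], Y[idx], kappa)
        W.append(w if w @ w_ref >= 0 else -w)   # canonical vectors have arbitrary sign
    w_bag = np.mean(W, axis=0)
    return w_bag / np.linalg.norm(w_bag)


# Toy data: 200 individuals, 20 markers (so kinship is low rank), 20 traits,
# of which the first 5 share an additive genetic signal.
rng = np.random.default_rng(1)
n, m, p = 200, 20, 20
G = rng.binomial(2, 0.5, size=(n, m)).astype(float)  # genotypes coded 0/1/2
Z = (G - G.mean(axis=0)) / G.std(axis=0)
K = Z @ Z.T / m                                      # genomic relationship (kinship) matrix
g = Z @ rng.normal(size=m) / np.sqrt(m)              # additive genetic values
Y = rng.normal(size=(n, p))
Y[:, :5] += 2.0 * g[:, None]
Y = (Y - Y.mean(axis=0)) / Y.std(axis=0)

w = bagged_kcca(K, Y, n_boot=20)
print(np.round(w[:5], 2))  # weights on the genetically driven traits
```

With a linear trait kernel, each individual's composite trait score is `Y @ w`, which could then be carried into downstream heritability estimation or QTL mapping. With a non-linear trait kernel, the trait-side solution exists only as per-individual dual coefficients, so bagging would have to aggregate scores rather than trait weights.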