330V Poster Online - Virtual Posters
Tuesday June 07, 11:00 AM - 3:00 PM

Deriving Biological Insight from Genome Scans: a Tissue Enrichment Method for Noisy Gene Lists


Authors:
Lauren Sugden 1; Arthur Sugden 1,2

Affiliations:
1) Duquesne University; 2) Behaivior

Keywords:
Theory & Method Development

Many genomic studies, such as selection scans and genome-wide association (GWA) studies, produce lists of genes (hereafter “genes of interest”) that carry a statistical signature suggesting that they are under selection or are important for understanding a particular disease or phenotype. Often, some (but not all) of these genes are involved in a common process or pathway that could provide biological insight, but making such inferences robustly remains a significant challenge.

A common approach to this problem compares the genes of interest to various curated gene sets, looking for an overall signature of “enrichment” in which genes of interest are overrepresented in a set relative to what one would expect by chance. One consequence of this approach is that a true signal involving a subset of the genes of interest can be swamped by genes of interest that are unrelated to the signal. In addition, testing against many gene sets quickly leads to multiple testing problems, potentially resulting in a high false discovery rate.
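
As an illustration of the conventional approach (not of the method introduced here), a minimal sketch of an over-representation test is given below. It assumes a one-sided hypergeometric test, which is one common choice for this kind of comparison; the function name, the parameter names, and all of the counts are hypothetical and chosen only to show how unrelated genes of interest can dilute a signal.

```python
from math import comb


def hypergeom_enrichment_p(n_background, n_in_set, n_interest, n_overlap):
    """One-sided hypergeometric p-value: P(overlap >= n_overlap).

    n_background: number of genes in the background (hypothetical universe)
    n_in_set:     number of background genes in the curated gene set
    n_interest:   number of genes of interest
    n_overlap:    number of genes of interest that fall in the curated set
    """
    total = comb(n_background, n_interest)
    upper = min(n_in_set, n_interest)
    tail = sum(
        comb(n_in_set, k) * comb(n_background - n_in_set, n_interest - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts: the same 5 overlapping genes, first in a focused
# list of 10 genes of interest, then in a list of 100 where the other
# 95 genes are unrelated to the curated set.
p_focused = hypergeom_enrichment_p(20000, 200, 10, 5)
p_diluted = hypergeom_enrichment_p(20000, 200, 100, 5)

# Testing many curated sets requires a correction; Bonferroni is the
# simplest (and most conservative) option.
n_sets_tested = 1000  # hypothetical number of curated gene sets
p_diluted_bonferroni = min(1.0, p_diluted * n_sets_tested)

print(p_focused, p_diluted, p_diluted_bonferroni)
```

With these made-up counts, the p-value for the diluted list is larger than for the focused list even though the same five genes overlap the curated set, and correcting for the number of sets tested weakens it further.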

Here, we introduce a method that detects enrichment of tissue-specific genes using gene expression data from the Genotype-Tissue Expression (GTEx) project. We first build a bipartite graph of genes and tissues with weighted edges carefully calibrated to account for the vast range of gene specialization versus generality across the genome, as well as the divergent gene-expression profiles across tissues in the database. We then generate a “tissue score profile” for genes of interest as genes are dropped one by one from the list, allowing us to be sensitive to situations in which only a subset of the genes of interest is driving a signal. This approach avoids multiple testing problems while still allowing us to observe multiple subsets of the data as we remove genes that introduce noise. Our method allows for genes of interest to have associated weights or probabilities, as might be generated from a genome-wide scan, and generates permutation-based empirical p-values for enrichment scores.
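
The general shape of such a procedure can be sketched in a few lines. The toy example below is not the calibrated method described above. It assumes, purely for illustration, that an edge weight is the fraction of a gene's total expression found in a tissue, that genes are dropped in order of increasing weight, that the profile is summarized by its mean, and that the null distribution comes from random gene lists of the same size. All function names and the expression values are hypothetical, and per-gene weights or probabilities are not modeled.

```python
import random


def edge_weights(expression):
    """Assumed weighting: fraction of each gene's total expression per tissue."""
    weights = {}
    for gene, by_tissue in expression.items():
        total = sum(by_tissue.values())
        weights[gene] = {
            tissue: (value / total if total > 0 else 0.0)
            for tissue, value in by_tissue.items()
        }
    return weights


def score_profile(weights, genes, tissue):
    """Mean edge weight to `tissue` as the lowest-weight genes are dropped one by one."""
    remaining = sorted(weights[g][tissue] for g in genes)
    profile = []
    while remaining:
        profile.append(sum(remaining) / len(remaining))
        remaining.pop(0)  # drop the gene least specific to this tissue
    return profile


def profile_statistic(weights, genes, tissue):
    """Assumed summary of a profile: its mean value."""
    profile = score_profile(weights, genes, tissue)
    return sum(profile) / len(profile)


def empirical_p(weights, genes, tissue, n_perm=1000, seed=0):
    """Permutation-based empirical p-value against random same-size gene lists."""
    rng = random.Random(seed)
    universe = sorted(weights)
    observed = profile_statistic(weights, genes, tissue)
    hits = sum(
        profile_statistic(weights, rng.sample(universe, len(genes)), tissue) >= observed
        for _ in range(n_perm)
    )
    return (hits + 1) / (n_perm + 1)  # add-one correction keeps p above zero


# Made-up expression values: 5 liver-specific genes among 50 genes in total.
expression = {
    f"g{i}": (
        {"liver": 90.0, "brain": 5.0, "muscle": 5.0}
        if i < 5
        else {"liver": 10.0, "brain": 10.0, "muscle": 10.0}
    )
    for i in range(50)
}
weights = edge_weights(expression)

# Three liver-specific genes plus two unrelated "noise" genes.
genes_of_interest = ["g0", "g1", "g2", "g10", "g11"]
profile = score_profile(weights, genes_of_interest, "liver")
p_value = empirical_p(weights, genes_of_interest, "liver")
print(profile, p_value)
```

In this toy setting the profile rises as the two unrelated genes are dropped, which is the kind of behavior that lets a signal driven by a subset of the list become visible; a single summary statistic per tissue is then compared with its permutation null.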