59 Oral - Platform Session #6: Theory and Methods
Friday June 10, 10:50 AM - 11:05 AM

Interpretable machine learning improves performance in association, discovery, and prediction


Authors:
Mariano Alvarez 1; Emily Abernathy 1; Cynthia Rudin 2

Affiliations:
1) Avalo; 2) Duke University, Durham, NC

Keywords:
Theory & Method Development

The incorporation of machine learning (ML) techniques in genomics has enhanced a variety of routine tasks, from predicting gene function to genomic selection. However, tasks other than prediction have received little attention, largely because of the black-box nature of ML algorithms. The emerging literature on interpretable ML offers useful innovations that may allow ML techniques to be used for discovery and deeper understanding. We use the model-agnostic concept of conditional model reliance and show how simple, interpretable summary statistics can be generated from any black-box prediction algorithm. We then test the performance of these statistics across several common tasks in quantitative genetics, including association mapping, genomic prediction, and tests for natural selection, and find that estimates of conditional model reliance substantially outperform existing methods. Specifically, using simulations and real data in Arabidopsis thaliana and Oryza sativa, we show that interpretable ML techniques can more accurately identify causal loci in association mapping, as well as loci under selection when the selective environment is known. We also show that modeling these loci alone can meet or exceed the prediction accuracy of models using whole-genome data. We suggest that interpretable ML techniques offer new opportunities to model and derive insight into difficult problems in evolutionary biology.
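As a rough illustration of the model-reliance idea the abstract builds on, the sketch below computes a simple (unconditional) permutation-based reliance score for one feature of an arbitrary black-box predictor: the ratio of the model's loss after the feature is permuted to its original loss. This is a minimal sketch under stated assumptions, not the authors' implementation; the function names are hypothetical, and the conditional variant used in the work would permute a locus only within strata of correlated loci rather than globally.

```python
# Minimal sketch of permutation-based model reliance (in the spirit of
# conditional model reliance): how much does a fitted black-box model's
# loss degrade when one feature's association with the outcome is broken?
# All names here are illustrative, not the authors' code. The conditional
# variant would permute column j within groups of similar values of the
# remaining columns instead of permuting it globally.
import numpy as np

def model_reliance(predict, X, y, j, n_perm=20, seed=0):
    """Ratio of mean squared loss with column j permuted to original loss."""
    rng = np.random.default_rng(seed)
    base_loss = np.mean((predict(X) - y) ** 2)
    perm_losses = []
    for _ in range(n_perm):
        Xp = X.copy()
        Xp[:, j] = rng.permutation(Xp[:, j])  # break j's link to y
        perm_losses.append(np.mean((predict(Xp) - y) ** 2))
    return np.mean(perm_losses) / base_loss  # >> 1 means the model relies on j

# Toy "black box": the outcome depends on column 0 only.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3))
y = 2.0 * X[:, 0] + 0.1 * rng.normal(size=500)
predict = lambda X: 2.0 * X[:, 0]  # stand-in for any fitted model

print(model_reliance(predict, X, y, 0))  # large: model relies on feature 0
print(model_reliance(predict, X, y, 1))  # exactly 1: no reliance
```

Scores well above 1 flag features the model genuinely relies on, which is how such statistics can serve association mapping: candidate causal loci are those with high reliance.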