Aditi Chandrashekar
Under the mentorship of Professor Su-In Lee and Dr. Nicasia Beebe-Wang
This work was completed as the William H. and Helen Lang SURF Fellow at the Paul G. Allen School of Computer Science and Engineering, University of Washington.
A critical focus of Alzheimer’s Disease (AD) research is the development of a generalized understanding of the genetic and molecular mechanisms involved in AD. The Multi-task Deep learning for Alzheimer’s Disease neuropathology (MD-AD) model, proposed by Beebe-Wang et al. (2021), learns interrelationships between brain gene expression data and neuropathological phenotypes related to AD in a multi-cohort setting [1]. Leveraging Explainable AI (XAI) gives insight into how the model makes these associations. We hypothesized that reducing the number of genes input to the model as features could result in clearer explanations of the model. Here we compare several methods of reducing the number of input genes and discuss their tradeoffs. We show that using post hoc XAI-based gene selection methods to reduce the size of the feature space results in performance comparable to the original, more comprehensive MD-AD model, suggesting that the genes most strongly associated with the presentation of neuropathological phenotypes are fairly representative of the full set of features in predicting AD neuropathology.
Alzheimer’s Disease is a neurodegenerative disease with no known treatment to prevent, delay, or end its progression. A significant obstacle to the development of treatment is the limited knowledge of the molecular mechanisms behind AD. Several studies have explored correlations between molecular data and neuropathological phenotypes of AD. The Accelerating Medicines Partnership Alzheimer’s Disease (AMP-AD) consortium has assembled several postmortem brain RNA-sequencing datasets which span several neuropathological phenotypes.
There are several challenges to fitting a traditional deep learning model to this problem, including a lack of samples and limited knowledge of correlations between the corresponding phenotype labels. Traditional models are capable only of learning gene-phenotype interactions independently of one another, failing to capture nonlinear relationships in the data. The Multi-task Deep learning for Alzheimer’s Disease neuropathology (MD-AD) model learns deep interrelationships between gene expression data and neuropathological measures of AD (Beebe-Wang et al., 2021). MD-AD performs better than baseline models (such as Linear and Multi-Layer Perceptron (MLP) models) at the task of phenotype prediction because of its ability to learn attributes shared between the phenotypes as well as attributes independent of each phenotype.
Explainable AI (XAI) allows for better interpretation of deep learning models. Recent research in XAI has focused on several feature attribution methods to assess the relative importance of features as they relate to a model’s prediction. The method of Integrated Gradients (IG) in particular assesses the relative importance of each gene by summing the weighted gradients along the straight line path from a baseline to a sample.
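The IG computation described above can be sketched in a few lines. The following is a minimal, self-contained illustration using a toy differentiable function in place of a trained network (the actual MD-AD attributions are computed on the trained model; the function, baseline, and step count here are assumptions for illustration):

```python
import numpy as np

def model(x):
    # Toy scalar "prediction": a fixed nonlinear function standing in
    # for a trained network's output (illustrative only).
    return 2.0 * x[0] + x[1] ** 2 + x[0] * x[1]

def grad(x, eps=1e-5):
    # Central-difference gradient of the toy model.
    g = np.zeros_like(x)
    for i in range(len(x)):
        xp, xm = x.copy(), x.copy()
        xp[i] += eps
        xm[i] -= eps
        g[i] = (model(xp) - model(xm)) / (2 * eps)
    return g

def integrated_gradients(x, baseline, steps=200):
    # Average the gradient along the straight-line path from the
    # baseline to the sample, then scale by (x - baseline).
    alphas = (np.arange(steps) + 0.5) / steps  # midpoint rule
    avg_grad = np.mean(
        [grad(baseline + a * (x - baseline)) for a in alphas], axis=0)
    return (x - baseline) * avg_grad

x = np.array([1.0, 2.0])
baseline = np.zeros(2)
attr = integrated_gradients(x, baseline)
# Completeness property: attributions sum to f(x) - f(baseline).
print(attr, attr.sum(), model(x) - model(baseline))
```

A useful sanity check is the completeness property printed at the end: the per-feature attributions sum to the difference between the model's output at the sample and at the baseline.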
The use of feature attribution methods is very relevant to the study of the molecular drivers of AD because such methods can be used to identify high-importance features, or genes, from the original dataset. By selecting these high-importance features, we might simplify and potentially clarify the problem by reducing dimensionality. Here we explore several different post hoc methods of feature selection and examine resulting model performance on the MD-AD model as well as Linear and MLP baselines. We compare the method of Integrated Gradients (IG) with the methods of ranking by pairwise linear correlations and random selection at the task of feature selection.
Feature selection was conducted with three different methods of selecting genes, five feature set sizes, and three different model architectures. Model performance was evaluated across five feature set sizes consisting of 100, 500, 1000, 5000, and 14591 genes. Performance was compared over varying model type and feature attribution method.
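The three selection methods reduce to ranking genes by some score and keeping the top k columns of the expression matrix. A minimal sketch with synthetic data (the matrix shapes, noise scale, and function names here are assumptions, not the actual AMP-AD pipeline):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for an expression matrix (samples x genes) and one
# phenotype label; genes 0 and 1 carry the signal by construction.
n_samples, n_genes = 200, 1000
X = rng.normal(size=(n_samples, n_genes))
y = 1.5 * X[:, 0] - X[:, 1] + rng.normal(scale=0.1, size=n_samples)

def select_random(X, k, rng):
    # Random baseline: k genes chosen uniformly without replacement.
    return rng.choice(X.shape[1], size=k, replace=False)

def select_by_correlation(X, y, k):
    # Rank genes by absolute Pearson correlation with the phenotype.
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    r = (Xc * yc[:, None]).sum(axis=0) / (
        np.sqrt((Xc ** 2).sum(axis=0)) * np.sqrt((yc ** 2).sum()))
    return np.argsort(-np.abs(r))[:k]

def select_by_importance(scores, k):
    # Rank genes by a precomputed attribution score (e.g., IG).
    return np.argsort(-np.abs(scores))[:k]

idx = select_by_correlation(X, y, k=100)
X_reduced = X[:, idx]  # reduced feature matrix passed to each model
```

Each model architecture is then retrained on `X_reduced` for each feature set size, so the comparison isolates the effect of the selection method.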
The feature selection method of IG performs better than Linear Correlations, which in turn outperforms the random baseline across all feature set sizes. There is a clear increase in error below 500 features, suggesting that approximately 500 features are sufficient to achieve prediction performance similar to the full feature set.
Figure 1. Test set 1-R2CV for each feature selection method examined over a varying feature set size. 1-R2CV is a prediction error metric calculated as mean squared error divided by label variance for phenotype predictions across five test splits. Error bars represent the range of prediction error values from each test split.
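The 1-R2CV metric from the caption is straightforward to compute per test split; a minimal sketch (function name assumed for illustration):

```python
import numpy as np

def one_minus_r2(y_true, y_pred):
    # Prediction error as defined in Figure 1: mean squared error
    # divided by label variance. Equals 1 - R^2, so 0 is a perfect
    # prediction and 1 matches always predicting the label mean.
    mse = np.mean((y_true - y_pred) ** 2)
    return mse / np.var(y_true)

y_true = np.array([1.0, 2.0, 3.0, 4.0])
perfect = one_minus_r2(y_true, y_true)                      # 0.0
mean_pred = one_minus_r2(y_true, np.full(4, y_true.mean())) # 1.0
```

Values below 1 therefore indicate the model predicts the phenotype better than the label mean; the figure reports this quantity averaged over five test splits, with error bars giving the range across splits.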
The method of IG is able to rank features based on their relative importance to the prediction outcome. Importance scores range from −1 to 1, corresponding to strong negative importance and strong positive importance respectively. Models trained with features selected based on absolute IG importance performed significantly better than models trained on genes selected by positive importance, negative importance, or a random basis. This is likely because negatively important genes affect the prediction outcome just as much as positively important genes, so selecting based on absolute importance encapsulates more useful information.
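The distinction between signed and absolute ranking can be made concrete with a small example (the score values below are hypothetical, chosen only to show the effect):

```python
import numpy as np

# Hypothetical IG importance scores for five genes.
scores = np.array([0.9, -0.8, 0.1, -0.05, 0.3])

top2_positive = np.argsort(-scores)[:2]          # positive importance only
top2_absolute = np.argsort(-np.abs(scores))[:2]  # absolute importance

# Ranking by positive importance keeps genes 0 and 4; ranking by
# absolute importance instead keeps genes 0 and 1, retaining the
# strongly negative gene that positive ranking would discard.
```

The strongly negative gene (index 1) is more informative for prediction than the weakly positive one (index 4), which is exactly the information absolute ranking preserves and signed ranking throws away.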