Analyzing fMRI data with deep learning models

Mar 15, 2020, 3-5 min read


Thomas, A. W., Heekeren, H. R., Müller, K. R., & Samek, W. (2019). Analyzing neuroimaging data through recurrent deep learning models. Frontiers in Neuroscience, 13, 1321. doi.org/10.3389/fnins.2019.01321


Deep learning (DL) models have been shown to outperform conventional machine learning techniques in a variety of research fields and prediction tasks. Yet, their application to the decoding of functional Magnetic Resonance Imaging (fMRI) data has so far been limited, mainly because DL models are often viewed as "black boxes": their highly non-linear nature disguises the relationship between input data and prediction. It is, however, a central goal of cognitive neuroscience to identify (and learn about) the association between brain activity and the underlying cognitive state of an individual (e.g., while viewing images of different categories or performing a specific cognitive task).

The difficulty of understanding and interpreting the decision processes of deep learning models is a clear drawback and has recently attracted attention in the field of machine learning (see, for example, this overview). In this context, the so-called Layer-wise Relevance Propagation (LRP) technique has been proposed as a general approach for explaining the predictions of complex models. Specifically, the LRP technique decomposes a model's prediction into the contributions (or relevances) of the individual features of the input data to that prediction.
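To make this concrete, here is a minimal NumPy sketch of LRP for a single linear layer, using the epsilon-stabilized z-rule. This is an illustrative toy, not the implementation used in the paper: the dimensions, weights, and starting relevance are all made up, and a real network would apply this rule layer by layer from the output back to the input voxels.

```python
import numpy as np

def lrp_linear(x, w, b, relevance_out, eps=1e-6):
    """Redistribute output relevance to the inputs of a linear layer
    via the epsilon-stabilized LRP rule:
    R_i = sum_j (x_i * w_ij / (z_j + eps * sign(z_j))) * R_j."""
    z = x @ w + b                               # pre-activations, shape (n_out,)
    s = relevance_out / (z + eps * np.sign(z))  # stabilized relevance ratios
    return x * (w @ s)                          # relevance per input feature

# Toy example: 4 input "voxels", 2 output classes (all values invented).
rng = np.random.default_rng(0)
x = rng.normal(size=4)
w = rng.normal(size=(4, 2))
b = np.zeros(2)

logits = x @ w + b
relevance_out = np.zeros(2)
relevance_out[np.argmax(logits)] = logits.max()  # start from the winning class

relevance_in = lrp_linear(x, w, b, relevance_out)
print(relevance_in)
```

A useful sanity check is conservation: with zero biases, the relevance assigned to the inputs sums (up to the epsilon stabilizer) to the relevance that entered the layer, so no "explanation mass" is created or destroyed.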

In the DeepLight framework, we utilize the LRP technique to interpret the decoding decisions of a deep learning model that is trained to identify (i.e., decode) a set of cognitive states from whole-brain fMRI data, thereby relating the brain activity to the decoded cognitive state:


The DL model that we used in this study consists of three distinct computational modules: a convolutional feature extractor, an LSTM, and an output unit. First, the model separates each fMRI volume into a sequence of axial brain slices. These slices are then processed by a two-dimensional convolutional feature extractor, resulting in a sequence of higher-level, lower-dimensional slice representations. These representations are then fed to an LSTM, which integrates the spatial dependencies of the observed brain activity within and across axial brain slices. Lastly, the output unit makes a decoding decision by projecting the output of the LSTM into a lower-dimensional space spanning the cognitive states in the data, where a probability for each cognitive state is estimated, indicating whether the input fMRI volume belongs to that state. Subsequently, DeepLight relates the brain activity to the cognitive state by applying the LRP technique to its decoding decision, thereby decomposing the decision into the contributions of the individual input voxels. Importantly, the DeepLight approach does not depend on any specific architecture of the DL model; the architecture described here is exemplary and derived from previous work.
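The shape flow through the three modules can be sketched as follows. This is a deliberately simplified stand-in, not the paper's model: the dimensions are invented, a fixed random projection replaces the trained convolutional extractor, and a plain tanh RNN replaces the LSTM (same slice-by-slice integration idea, fewer gates).

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Toy dimensions (illustrative only; not the paper's actual sizes).
n_slices, H, W = 8, 16, 16          # axial slices per volume, slice size
feat_dim, hidden_dim, n_states = 32, 24, 4

rng = np.random.default_rng(0)
volume = rng.normal(size=(n_slices, H, W))   # one (synthetic) fMRI volume

# 1) Feature extractor: each 2D slice -> a higher-level, lower-dim vector.
#    (A fixed random projection stands in for the trained conv layers.)
W_feat = rng.normal(size=(H * W, feat_dim)) / np.sqrt(H * W)
features = np.tanh(volume.reshape(n_slices, -1) @ W_feat)  # (n_slices, feat_dim)

# 2) Recurrent integration over the slice sequence, slice by slice.
W_in = rng.normal(size=(feat_dim, hidden_dim)) / np.sqrt(feat_dim)
W_rec = rng.normal(size=(hidden_dim, hidden_dim)) / np.sqrt(hidden_dim)
h = np.zeros(hidden_dim)
for f in features:
    h = np.tanh(f @ W_in + h @ W_rec)

# 3) Output unit: project the final hidden state onto the cognitive states.
W_out = rng.normal(size=(hidden_dim, n_states)) / np.sqrt(hidden_dim)
p = softmax(h @ W_out)               # one probability per cognitive state
print(p.round(3))
```

The point of the sketch is the data flow: volume → slice sequence → per-slice features → recurrently integrated state → per-state probabilities, which is the path that LRP then traverses in reverse to assign relevance to voxels.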

Importantly, the LRP decomposition is performed at the level of single fMRI volumes (and decoding decisions), which allows for analyses at several levels of data granularity, from the group level down to the level of single subjects, trials, and time points. DeepLight can thereby study the temporal and spatial distribution of brain activity across sequences of single fMRI volumes.

The following two videos exemplify this feature by showing the distribution of relevance for each fMRI sample of a trial (25 s) in which participants see either the image of a face (video 1) or the image of a place (video 2) (for more details on this analysis, see Figure 6 of our paper):

In addition to the relevance distribution on the right, the videos show DeepLight's softmax prediction for each of the four target classes of the underlying experiment (top left panel; in the experiment, participants saw images of body parts, faces, places, and tools), DeepLight's average decoding accuracy (middle left panel), as well as a similarity measure (the F1-score; lower left panel), quantifying the similarity between the shown distribution of relevance and the distribution of brain activity that one would expect, given a meta-analysis with NeuroSynth for the terms "face" and "place" (higher F1-scores generally indicate greater similarity).

Video 1 (face):

Video 2 (place):
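The F1-score shown in the videos can be understood as an overlap measure between two binary maps. The sketch below is a generic illustration of this idea, assuming the relevance map is binarized by a threshold and compared against a binary reference mask (e.g., from a NeuroSynth meta-analysis); the arrays, threshold, and function name are invented for the example and the exact thresholding in the paper may differ.

```python
import numpy as np

def f1_overlap(relevance, reference, threshold=0.0):
    """F1-score between a thresholded relevance map and a binary
    reference mask: the harmonic mean of precision and recall."""
    pred = relevance > threshold
    ref = reference.astype(bool)
    tp = np.sum(pred & ref)    # relevant voxels inside the mask
    fp = np.sum(pred & ~ref)   # relevant voxels outside the mask
    fn = np.sum(~pred & ref)   # mask voxels the relevance map missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Toy 1D "brain": the reference mask marks voxels 2-5 as, say, face-selective.
reference = np.array([0, 0, 1, 1, 1, 1, 0, 0])
relevance = np.array([0.0, 0.1, 0.9, 0.8, 0.7, 0.0, 0.2, 0.0])

print(f1_overlap(relevance, reference, threshold=0.5))  # 3 hits, 1 miss -> 6/7
```

Here precision is 1.0 (no relevance lands outside the mask) and recall is 0.75 (one masked voxel is missed), giving an F1-score of about 0.86; a perfect match between the two maps yields 1.0.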

Overall, we find that DeepLight accurately identifies the two target classes from the fMRI data, as indicated by an increasing decoding accuracy over the course of the trials. In addition, DeepLight utilizes the brain activity of a set of biologically plausible brain regions to identify each of the two target stimuli (as shown by generally increasing F1-scores, which describe the overlap between the relevance distributions shown on the right and the results of our meta-analysis), thus indicating that DeepLight's brain maps capture a biologically plausible association between brain activity and cognitive states (here, the viewing of face and place stimuli).