On potential limitations of differential expression analysis with non-linear machine learning models

Authors

  • Gianmarco Sabbatini aizoOn Technology Consulting, Strada del Lionetto, 6, 10146, Torino (TO)
  • Lorenzo Manganaro aizoOn Technology Consulting, Strada del Lionetto, 6, 10146, Torino (TO)

DOI:

https://doi.org/10.14806/ej.28.0.1035

Keywords:

differential expression, explainable artificial intelligence, machine learning, gene expression

Abstract

Recently, there has been growing interest in bioinformatics in the adoption of increasingly complex machine learning models for the analysis of next-generation sequencing data, with the goal of disease subtyping (i.e., patient stratification based on molecular features) or risk-based classification for specific endpoints, such as survival. With gene-expression data, a common approach consists of characterising the emerging groups through a differential expression analysis, which selects relevant gene sets, coupled with pathway enrichment analysis, providing insight into the underlying biological processes. However, when non-linear machine learning models are involved, differential expression analysis can be limiting: the patient groupings identified by the model may be based on a set of genes that are invisible to differential expression because of its linear nature, affecting subsequent biological characterisation and validation. The aim of this study is to provide a proof-of-concept example demonstrating such a limitation. Moreover, we suggest that this issue can be overcome by adopting the paradigm of eXplainable Artificial Intelligence, which consists of building an additional explainer to obtain a trustworthy interpretation of the model outputs and to derive a reliable set of genes characterising each group, one that also preserves non-linear relations, to be used for downstream analysis and validation.
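The limitation described above can be illustrated with a minimal, hypothetical sketch (not taken from the paper): two simulated genes define patient groups through an XOR-style interaction, so neither gene differs between groups marginally (what a linear differential expression test measures), while their non-linear combination separates the groups sharply. All variable names and the toy data-generating process are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Toy expression values for two genes across n patients.
g1 = rng.normal(0.0, 1.0, n)
g2 = rng.normal(0.0, 1.0, n)

# Hypothetical grouping driven by a non-linear (XOR-like) relation:
# a patient is in group A when the two genes have the same sign.
label = (g1 > 0) ^ (g2 > 0)

def t_stat(x, y):
    """Absolute Welch t-statistic: a simple stand-in for a linear
    per-gene differential expression test."""
    return abs(x.mean() - y.mean()) / np.sqrt(
        x.var(ddof=1) / len(x) + y.var(ddof=1) / len(y)
    )

# Marginal (linear) tests: each gene alone looks non-differential,
# because its distribution is identical in both groups.
t1 = t_stat(g1[label], g1[~label])
t2 = t_stat(g2[label], g2[~label])

# A feature encoding the interaction captures the signal the model
# actually uses: the product's sign tracks the group label.
interaction = g1 * g2
t12 = t_stat(interaction[label], interaction[~label])

print(f"gene1 alone: t = {t1:.2f}")        # small: hidden to linear DE
print(f"gene2 alone: t = {t2:.2f}")        # small: hidden to linear DE
print(f"interaction: t = {t12:.2f}")       # large: visible to the model
```

A per-gene linear test ranks both genes as uninteresting, yet a non-linear classifier trained on them would separate the groups almost perfectly; an explainer applied to that classifier (e.g., feature-attribution methods) would recover both genes as important, which is the kind of gene set the abstract proposes using for downstream analysis.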

Published

2023-03-08

Issue

Section

Research Papers