On potential limitations of differential expression analysis with non-linear machine learning models

Gianmarco Sabbatini, Lorenzo Manganaro


Recently, there has been growing interest in bioinformatics in the adoption of increasingly complex machine learning models for the analysis of next-generation sequencing data, with the goal of disease subtyping (i.e., patient stratification based on molecular features) or risk-based classification for specific endpoints, such as survival. With gene-expression data, a common approach consists of characterising the emerging groups through differential expression analysis, which selects relevant gene sets, coupled with pathway enrichment analysis to provide insight into the underlying biological processes. However, when non-linear machine learning models are involved, differential expression analysis can be limiting: the patient groupings identified by the model may be based on a set of genes that remain invisible to differential expression because of its linear nature, affecting subsequent biological characterisation and validation. The aim of this study is to provide a proof-of-concept example demonstrating such a limitation. Moreover, we suggest that this issue could be overcome by adopting the paradigm of eXplainable Artificial Intelligence, which consists of building an additional explainer that provides a trustworthy interpretation of the model's outputs and yields a reliable set of genes characterising each group, preserving non-linear relations, to be used for downstream analysis and validation.
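The limitation described above can be illustrated with a toy sketch (not taken from the paper; all data and variable names here are illustrative). Two synthetic "genes" define patient groups through an XOR-style interaction, so neither gene differs in mean between groups and a per-gene two-sample t-test, the linear criterion underlying standard differential expression, finds nothing, while a non-linear model still recovers the grouping and ranks both genes as most important:

```python
import numpy as np
from scipy.stats import ttest_ind
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 400

# Toy expression matrix: gene_a and gene_b individually have identical
# distributions in both groups, but their sign combination (XOR) defines
# the grouping; the remaining columns are uninformative noise genes.
gene_a = rng.normal(0, 1, n)
gene_b = rng.normal(0, 1, n)
noise = rng.normal(0, 1, (n, 8))
groups = ((gene_a > 0) ^ (gene_b > 0)).astype(int)
X = np.column_stack([gene_a, gene_b, noise])

# Per-gene two-sample t-test (a mean-difference, i.e. linear, criterion):
# by construction each informative gene is independent of the grouping,
# so no significant shift is expected.
p_a = ttest_ind(gene_a[groups == 0], gene_a[groups == 1]).pvalue
p_b = ttest_ind(gene_b[groups == 0], gene_b[groups == 1]).pvalue

# A non-linear model captures the interaction: the two informative genes
# dominate the impurity-based feature importances.
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, groups)
top2 = set(np.argsort(clf.feature_importances_)[-2:])

print(f"t-test p-values: gene_a={p_a:.2f}, gene_b={p_b:.2f}")
print("top-2 genes by model importance:", top2)
```

In this sketch a model-agnostic explainer (e.g., SHAP values) could be used in place of the impurity-based importances to obtain per-patient attributions, which is closer to the XAI approach the abstract advocates.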


Keywords: differential expression; explainable artificial intelligence; machine learning; gene expression


DOI: https://doi.org/10.14806/ej.28.0.1035