Authors:
Adara S. R. Nogueira 1; Artur J. Ferreira 2,1 and Mário A. T. Figueiredo 2,3
Affiliations:
1 ISEL, Instituto Superior de Engenharia de Lisboa, Instituto Politécnico de Lisboa, Portugal
2 Instituto de Telecomunicações, Lisboa, Portugal
3 IST, Instituto Superior Técnico, Universidade de Lisboa, Portugal
Keyword(s):
Machine Learning, Feature Selection, Feature Discretization, Microarray Data, Cancer Explainability.
Abstract:
Detecting diseases, such as cancer, from gene expression data has assumed great importance and is a very active area of research. Today, many gene expression datasets are publicly available; they consist of microarray data with information on the activation (or not) of thousands of genes in sets of patients who have (or do not have) a certain disease. These datasets consist of high-dimensional feature vectors (very large numbers of genes), which makes it difficult for humans to analyze and interpret them with the goal of identifying the genes most relevant for detecting a particular disease. In this paper, we take a step towards the explainability of these disease detection methods by applying feature discretization and feature selection techniques. We accurately classify microarray data while substantially reducing the number of genes and identifying subsets of relevant ones. These small subsets of genes are easier for human experts to interpret, potentially providing valuable information about which genes are involved in a given disease.