Authors:
Nicoletta Del Buono (1); Flavia Esposito (1, 2); Laura Selicato (1) and Maria Carmela Vegliante (2)
Affiliations:
(1) Members of INDAM-GNCS Research Group, Department of Mathematics, University of Bari Aldo Moro, via E. Orabona 4, I-70125 Bari, Italy
(2) Hematology and Cell Therapy Unit, IRCCS - Istituto Tumori Giovanni Paolo II, Bari, Italy
Keyword(s):
Outlier Detection, Gene Expression Profiling, Clustering, Robust PCA.
Abstract:
One of the main problems in analyzing real data is the presence of anomalies. Anomalous cases may spoil the resulting analysis, yet they may also contain valuable information. In both cases, the ability to detect these occurrences is very important. In the biomedical field in particular, proper identification of outliers makes it possible to develop novel biological hypotheses that are not taken into consideration when experimental biological data are analyzed. In this paper, we address the problem of detecting outlier samples in gene expression data. We propose an ensemble approach to anomaly detection in gene expression matrices, based on hierarchical clustering and Robust Principal Component Analysis, which allows us to derive a novel pseudo-mathematical classification of anomalies.