Authors:
Blaise Hanczar
1
and
Avner Bar-Hen
2
Affiliations:
1
IBGBI University of Evry, France
;
2
University Paris Descartes, France
Keyword(s):
Supervised Learning, Reject Option, Cascade Classifier, Genomics Data, Personalized Medecine.
Related
Ontology
Subjects/Areas/Topics:
Bioinformatics
;
Biomedical Engineering
;
Data Mining and Machine Learning
;
Pattern Recognition, Clustering and Classification
Abstract:
The supervised learning in bioinformatics is a major tool to diagnose a disease, to identify the best therapeutic strategy or to establish a prognostic. The main objective in classifier construction is to maximize the accuracy in order to obtain a reliable prediction system. However, a second objective is to minimize the cost of the use of the classifier on new patients. Despite the control of the classification cost is high important in the medical domain, it has been very little studied. We point out that some patients are easy to predict, only a small subset of medical variables are needed to obtain a reliable prediction. The prediction of these patients can be cheaper than the others patient. Based on this idea, we propose a cascade approach that decreases the classification cost of the basic classifiers without dropping their accuracy. Our cascade system is a sequence of classifiers with rejects option of increasing cost. At each stage, a classifier receives all patients rejecte
d by the last classifier, makes a prediction of the patient and rejects to the next classifier the patients with low confidence prediction. The performances of our methods are evaluated on four real medical problems.
(More)