Authors: Christina Brester (1); Jussi Kauhanen (2); Tomi-Pekka Tuomainen (2); Eugene Semenkin (3) and Mikko Kolehmainen (2)
Affiliations: (1) University of Eastern Finland and Siberian State Aerospace University, Finland; (2) University of Eastern Finland, Finland; (3) Siberian State Aerospace University, Russian Federation
Keyword(s):
Feature Selection, Two-Criterion Filtering, Cooperative Multi-Objective Genetic Algorithm, Cardiovascular Modelling.
Related Ontology Subjects/Areas/Topics: Artificial Intelligence; Computational Intelligence; Evolutionary Computation and Control; Evolutionary Computing; Genetic Algorithms; Informatics in Control, Automation and Robotics; Intelligent Control Systems and Optimization; Soft Computing
Abstract:
In this paper we compare a number of two-criterion filtering techniques for feature selection in cardiovascular predictive modelling. We design two-objective schemes based on different combinations of four criteria that describe the quality of reduced feature sets. To find feature subsets that best satisfy these criteria, we propose a cooperative multi-objective genetic algorithm. It runs several search strategies in parallel, which removes the need for additional experiments to choose the most effective heuristic for the problem at hand. The performance of the filtering techniques was investigated in combination with an SVM model on a population-based epidemiological database called KIHD (Kuopio Ischemic Heart Disease Risk Factor Study). The dataset contains a large number of variables describing various characteristics of the study participants. These baseline measures were collected at the beginning of the study. In addition, all major cardiovascular events that occurred among the participants over an average of 27 years of follow-up were collected from the national health registries. We found that the filtering technique based on intra- and inter-class distances reduced the feature set substantially (roughly 11-fold, from 433 to 38 features) without detriment to the predictive ability of the SVM model. This suggests that the number of clinical tests needed to collect data for cardiovascular disease prediction could be reduced.
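The abstract does not give formulas for the filtering criteria, so the following is only a minimal sketch of how a two-criterion filter based on intra- and inter-class distances might score a candidate feature subset. It assumes a common centroid-based formulation (minimise the mean distance of samples to their own class centroid, maximise the weighted distance of class centroids to the global centroid), which may differ from the authors' definitions. The function names, the boolean-mask encoding of a subset, and the toy data are all hypothetical.

```python
import numpy as np

def intra_class_distance(X, y, mask):
    """Mean Euclidean distance from each sample to its own class centroid
    (assumed criterion, to be minimised)."""
    Xs = X[:, mask]
    total = 0.0
    for c in np.unique(y):
        Xc = Xs[y == c]
        total += np.linalg.norm(Xc - Xc.mean(axis=0), axis=1).sum()
    return total / len(y)

def inter_class_distance(X, y, mask):
    """Class-size-weighted mean distance from class centroids to the
    global centroid (assumed criterion, to be maximised)."""
    Xs = X[:, mask]
    centre = Xs.mean(axis=0)
    total = 0.0
    for c in np.unique(y):
        Xc = Xs[y == c]
        total += len(Xc) * np.linalg.norm(Xc.mean(axis=0) - centre)
    return total / len(y)

def dominates(a, b):
    """Pareto dominance for (intra, inter) pairs:
    a is no worse on both criteria and strictly better on at least one."""
    return a[0] <= b[0] and a[1] >= b[1] and (a[0] < b[0] or a[1] > b[1])

# Toy data: feature 0 separates the two classes, feature 1 does not.
X = np.array([[0.0, 5.0], [0.2, -5.0], [10.0, 5.0], [10.2, -5.0]])
y = np.array([0, 0, 1, 1])

subset_a = np.array([True, False])   # keep only feature 0
subset_b = np.array([False, True])   # keep only feature 1

score_a = (intra_class_distance(X, y, subset_a),
           inter_class_distance(X, y, subset_a))
score_b = (intra_class_distance(X, y, subset_b),
           inter_class_distance(X, y, subset_b))
```

On this toy data the subset containing the separating feature dominates the other one, which is the kind of comparison a multi-objective genetic algorithm would apply across a population of candidate masks. Because the criteria are computed from the data alone, no classifier has to be trained during the search; the SVM is only needed to evaluate the selected subsets afterwards.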