Authors:
Houda Benhar (1); Ali Idri (2, 1) and Mohamed Hosni (1)
Affiliations:
(1) Software Project Management Research Team, ENSIAS, Mohammed V University, Rabat, Morocco
(2) Complex Systems Engineering and Human Systems, Mohammed VI Polytechnic University, Ben Guerir, Morocco
Keyword(s):
Data Preprocessing, Feature Selection, Univariate, Filter Technique, Heart Disease, Classification.
Abstract:
In the last decade, feature selection (FS) has been one of the most investigated preprocessing tasks for heart disease prediction. Determining the optimal features, i.e. those that contribute most to the diagnosis of heart disease, can reduce the number of clinical tests a patient needs to undergo, decrease the model cost, reduce storage requirements and improve the comprehensibility of the induced model. In this study, three filter feature ranking methods were compared. Feature ranking methods require a threshold (i.e. the percentage of ranked features to be retained) in order to select the final subset of features. Thus, the aim of this study is to investigate whether there is a threshold value that is an optimal choice for three different feature ranking methods and four classifiers used for heart disease classification over four heart disease datasets. The feature ranking methods and selection thresholds used resulted in optimal classification performance for one or more classifiers over small and large heart disease datasets. The size of the dataset plays an important role in the choice of the selection threshold.
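To illustrate how a univariate filter ranking is turned into a feature subset by a percentage threshold, here is a minimal sketch. It assumes absolute Pearson correlation with the class label as the ranking criterion. This is a stand-in only, since the abstract does not name the three filter methods compared. The function names `pearson_abs`, `rank_features` and `select_by_threshold` are hypothetical, and the subsequent classifier evaluation step is not shown.

```python
import math


def pearson_abs(xs, ys):
    """Absolute Pearson correlation between one feature column and the labels.

    Stand-in univariate filter score (assumption, not the paper's method).
    Constant columns get a score of 0.0.
    """
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return abs(cov) / math.sqrt(vx * vy)


def rank_features(X, y, score=pearson_abs):
    """Score each feature independently and return indices, best first."""
    n_features = len(X[0])
    scored = []
    for j in range(n_features):
        column = [row[j] for row in X]
        scored.append((score(column, y), j))
    # Descending score; ties broken by the lower feature index.
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [j for _, j in scored]


def select_by_threshold(ranking, threshold_pct):
    """Keep the top threshold_pct percent of ranked features (at least one)."""
    k = max(1, math.ceil(len(ranking) * threshold_pct / 100))
    return ranking[:k]


if __name__ == "__main__":
    # Toy data (hypothetical): 4 patients, 4 features, binary diagnosis label.
    X = [
        [1, 5, 0, 3],
        [2, 3, 1, 3],
        [3, 4, 0, 3],
        [4, 1, 1, 3],
    ]
    y = [0, 0, 1, 1]
    ranking = rank_features(X, y)
    for pct in (25, 50, 75):
        print(pct, select_by_threshold(ranking, pct))
```

On the toy data, the ranking is `[0, 1, 2, 3]`, so a 50% threshold keeps features 0 and 1. Sweeping `threshold_pct` and evaluating each classifier on every resulting subset is the kind of comparison the abstract describes.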