Authors:
Yousra Cherif 1; Ali Idri 1,2 and Omar El Alaoui 1
Affiliations:
1 Software Project Management Research Team, ENSIAS, Mohammed V University in Rabat, Morocco; 2 Mohammed VI Polytechnic University, Benguerir, Morocco
Keyword(s):
Species Distribution Models, Redstart Bird, Feature Selection, Univariate Filters, Environmental Data, Machine Learning, Classification.
Abstract:
Researchers rely on species distribution models (SDMs) to relate species occurrence records to environmental data. These models offer insights into the ecological and evolutionary aspects of the species studied. Feature selection (FS) aims to retain useful interlinked features and remove unnecessary or redundant ones, which reduces model cost and storage needs and makes the induced model easier to understand. To predict the distribution of three bird species, this study compares five filter-based univariate feature selection methods for selecting relevant features in classification tasks, using five thresholds and four classifiers: Support Vector Machine (SVM), Light Gradient-Boosting Machine (LGBM), Decision Tree (DT), and Random Forest (RF). The empirical evaluation uses 5-fold cross-validation, the Scott-Knott (SK) test, and Borda count, with three performance criteria (accuracy, kappa, and F1-score). Experiments showed that the 40% and 50% thresholds were the best choices for the classifiers, with RF outperforming LGBM, DT, and SVM. Finally, the best combination for each classifier is as follows: RF and LGBM using mutual information with a 40% threshold, DT using ReliefF with a 50% threshold, and SVM using the ANOVA F-value with a 40% threshold.
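To illustrate what a filter-based univariate method with a percentage threshold does, the following is a minimal sketch in plain Python, not the authors' implementation. It scores each feature independently with a one-way ANOVA F-value and keeps the top 40% of features. The feature names, the toy presence/absence data, the helper names `anova_f` and `select_top_percent`, and the rounding rule (ceiling, at least one feature) are all assumptions made for illustration.

```python
import math
from statistics import mean


def anova_f(values, labels):
    """One-way ANOVA F-value of a single feature against the class labels."""
    groups = {}
    for v, label in zip(values, labels):
        groups.setdefault(label, []).append(v)
    n, k = len(values), len(groups)
    grand = mean(values)
    # Variation of the class means around the overall mean.
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    # Variation of the samples around their own class mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    if ss_within == 0:
        return math.inf if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def select_top_percent(X, y, names, percent):
    """Rank features by F-value and keep the top `percent`% (assumed: ceiling, min 1)."""
    scores = {
        name: anova_f([row[j] for row in X], y) for j, name in enumerate(names)
    }
    n_keep = max(1, math.ceil(len(names) * percent / 100))
    ranked = sorted(names, key=lambda name: scores[name], reverse=True)
    return ranked[:n_keep], scores


# Hypothetical environmental features and toy data (1 = presence, 0 = absence).
names = ["temperature", "precipitation", "elevation", "ndvi", "noise"]
X = [
    [20, 5, 100, 0.5, 1],
    [21, 9, 110, 0.6, 2],
    [22, 7, 120, 0.7, 3],
    [10, 6, 300, 0.4, 1],
    [11, 8, 310, 0.5, 2],
    [12, 10, 320, 0.6, 3],
]
y = [1, 1, 1, 0, 0, 0]

selected, scores = select_top_percent(X, y, names, percent=40)
print(selected)
```

Because each feature is scored on its own, a filter of this kind is cheap to compute and independent of the classifier; the retained subset would then be passed to a classifier such as RF or SVM and evaluated with cross-validation.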