Authors:
Mlungisi Duma (1); Bhekisipho Twala (1); Tshilidzi Marwala (1) and Fulufhelo V. Nelwamondo (2)
Affiliations:
(1) University of Johannesburg APK, South Africa; (2) Council for Scientific and Industrial Research (CSIR), South Africa
Keyword(s):
Ripper, Principal component analysis, Automatic relevance determination, Artificial neural network, Missing data.
Related Ontology Subjects/Areas/Topics:
Hybrid Learning Systems; Informatics in Control, Automation and Robotics; Intelligent Control Systems and Optimization; Machine Learning in Control Applications; Neural Networks Based Control Systems
Abstract:
The Ripper algorithm is designed to generate rule sets for large datasets with many features. However, it has been shown that its classification performance suffers in the presence of missing data: the algorithm struggles to classify instances as data quality deteriorates with increasing amounts of missing data. In this paper, feature selection techniques are used to improve the classification performance of the Ripper algorithm. Principal component analysis and evidence automatic relevance determination are the techniques chosen, and they are compared to see which improves the algorithm the most. Training datasets with completely observable data were used to train the algorithm, and testing datasets with missing values were used to measure accuracy. The results show that principal component analysis is the better feature selection technique for the Ripper: with principal component analysis, classification performance improves significantly, as does resilience to escalating missing data.
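The evaluation protocol described in the abstract (train on complete data, test on data with growing amounts of missing values) can be illustrated with a minimal Python sketch. It is not the authors' code. The names `inject_missing`, `accuracy` and `sweep` are hypothetical. Values are removed completely at random and the rates shown are arbitrary, both of which are assumptions, since the abstract does not state the missingness mechanism or the rates used. `predict` stands in for any trained classifier, such as a Ripper rule set.

```python
import random


def inject_missing(rows, rate, seed=0):
    """Return a copy of `rows` with a fraction `rate` of cells set to None.

    Assumption: cells are removed completely at random; the abstract
    does not say which missingness mechanism the paper uses.
    """
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be in [0, 1]")
    rng = random.Random(seed)
    cells = [(i, j) for i, row in enumerate(rows) for j in range(len(row))]
    n_missing = round(rate * len(cells))
    masked = [list(row) for row in rows]  # copy, so the input is untouched
    for i, j in rng.sample(cells, n_missing):
        masked[i][j] = None
    return masked


def accuracy(predict, rows, labels):
    """Fraction of rows for which `predict` returns the true label."""
    correct = sum(predict(row) == label for row, label in zip(rows, labels))
    return correct / len(labels)


def sweep(predict, test_rows, test_labels, rates=(0.0, 0.1, 0.2, 0.3)):
    """Map each missing-data rate to the accuracy measured at that rate."""
    return {
        rate: accuracy(predict, inject_missing(test_rows, rate), test_labels)
        for rate in rates
    }
```

Running `sweep` once per feature selection technique, with the same test set and rates, would give the kind of side-by-side comparison the abstract describes: a flatter accuracy curve indicates more resilience to missing data.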