Authors:
Judith Neugebauer; Oliver Kramer; Michael Sonnenschein
Affiliation:
Carl von Ossietzky University Oldenburg, Germany
Keyword(s):
Time Series Classification, High-dimensional Classification, Imbalanced Learning, Data Preprocessing.
Related Ontology Subjects/Areas/Topics:
Artificial Intelligence; Biomedical Engineering; Biomedical Signal Processing; Computational Intelligence; Data Manipulation; Data Mining; Databases and Information Systems Integration; Enterprise Information Systems; Evolutionary Computing; Health Engineering and Technology Applications; Human-Computer Interaction; Industrial Applications of AI; Knowledge Discovery and Information Retrieval; Knowledge-Based Systems; Machine Learning; Methodologies and Methods; Neurocomputing; Neurotechnology, Electronics and Informatics; Pattern Recognition; Physiological Computing Systems; Sensor Networks; Signal Processing; Soft Computing; Symbolic Systems
Abstract:
Besides the curse of dimensionality and imbalanced classes, unfavorable data distributions can hamper classification
accuracy. This becomes particularly problematic as the dimensionality of the classification task increases.
A classifier that can handle high-dimensional and imbalanced data sets is the cascade classification method
for time series. The cascade classifier can compensate for unfavorable data distributions by projecting the high-dimensional
data set onto low-dimensional subsets. A classifier is trained for each of the low-dimensional
data subsets, and their predictions are aggregated into an overall result. Because the errors of
each classifier accumulate in the overall result, even small improvements in each low-dimensional classifier can
improve the overall classification accuracy. We therefore propose two data preprocessing methods to improve the
cascade classifier. The first method is instance selection, a technique to select representative examples for the
classification task. Furthermore, artificial infeasible examples can improve classification performance. Even if
high-dimensional infeasible examples are available, their projection to low-dimensional space is not possible
due to projection errors. We propose a second data preprocessing method for generating artificial infeasible
examples in low-dimensional space. Using power production time series of a micro combined heat and power plant
and a complex artificial data set, we show that the proposed data preprocessing methods increase the
performance of the cascade classifier by increasing the selectivity of the learned decision boundaries.
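
The project, train and aggregate structure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. Several details are assumptions made only for illustration: the low-dimensional subsets are taken to be two-dimensional windows of neighboring time steps, each stage is a toy radius-based classifier trained on feasible examples only, and a time series is accepted only if every stage accepts it. The names `RadiusClassifier` and `CascadeClassifier` are hypothetical.

```python
import math


class RadiusClassifier:
    """Toy low-dimensional classifier (assumption, not from the paper):
    a point counts as feasible if it lies within `radius` of a training point."""

    def __init__(self, radius):
        self.radius = radius
        self.points = []

    def fit(self, points):
        # Store the feasible low-dimensional training points as tuples.
        self.points = [tuple(p) for p in points]
        return self

    def predict_one(self, query):
        return any(math.dist(query, p) <= self.radius for p in self.points)


class CascadeClassifier:
    """Sketch of a cascade: one small classifier per 2-D projection
    (time steps i and i+1), aggregated with a logical AND."""

    def __init__(self, radius=0.05):
        self.radius = radius
        self.stages = []

    def fit(self, feasible_series):
        n_steps = len(feasible_series[0])
        # Project every series onto each pair of neighboring time steps
        # and train one low-dimensional classifier per projection.
        self.stages = [
            RadiusClassifier(self.radius).fit(
                [series[i:i + 2] for series in feasible_series]
            )
            for i in range(n_steps - 1)
        ]
        return self

    def predict_one(self, series):
        # A single rejecting stage rejects the whole series, which is why
        # errors of the individual stages accumulate in the overall result.
        return all(
            stage.predict_one(tuple(series[i:i + 2]))
            for i, stage in enumerate(self.stages)
        )


# Hypothetical toy data: three feasible 4-step time series.
train = [
    [0.0, 0.1, 0.2, 0.3],
    [0.1, 0.2, 0.3, 0.4],
    [0.2, 0.3, 0.4, 0.5],
]
cascade = CascadeClassifier(radius=0.05).fit(train)
```

With the AND aggregation assumed here, tightening the decision boundary of any single stage directly changes the overall prediction, which is the motivation the abstract gives for preprocessing the low-dimensional training data.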