Author:
Julia Bondarenko
Affiliation:
Helmut Schmidt University Hamburg (University of the Federal Armed Forces Hamburg), Germany
Keyword(s):
Resampling, Classification algorithm C4.5, Uniform/(truncated) Normal distribution, Kurtosis, Chi-squared test, Kolmogorov-Smirnov test, Traffic injuries number.
Related Ontology Subjects/Areas/Topics:
Artificial Intelligence and Decision Support Systems; Enterprise Information Systems; Informatics in Control, Automation and Robotics; Intelligent Control Systems and Optimization; Knowledge-Based Systems Applications; Machine Learning in Control Applications
Abstract:
In imbalanced data sets, the classes, divided into majority (negative) and minority (positive) classes, are not approximately equally represented, which impedes accurate classification. Well-balanced data sets assume a uniform class distribution. The approach presented in this paper balances non-uniform data sets by directed oversampling of minority class objects with simultaneous undersampling of majority class objects, and relies on certain statistical criteria. The resampling procedure is carried out for daily traffic injury data sets. The results show improved identification of rare cases (positive class objects) according to several performance measures.
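The general idea of combining oversampling and undersampling can be illustrated with a minimal sketch. It is not the paper's directed procedure: the abstract does not specify the statistical criteria, so plain random resampling is swapped in here. The function names `rebalance` and `chi_squared_uniform`, the toy labels, and the target of equal class sizes are all assumptions made for illustration. The chi-squared statistic is used only as a simple check of how far the class counts are from uniform.

```python
import random
from collections import Counter


def rebalance(labels, target_per_class, seed=0):
    """Return indices giving each class exactly target_per_class objects.

    Assumption: plain random resampling, not the paper's directed method.
    Larger classes are undersampled without replacement; smaller classes
    keep all originals and are oversampled with replacement.
    """
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    selected = []
    for indices in by_class.values():
        if len(indices) >= target_per_class:
            # Undersample: draw a subset of the majority class.
            selected.extend(rng.sample(indices, target_per_class))
        else:
            # Oversample: keep every minority object and add random repeats.
            extra = rng.choices(indices, k=target_per_class - len(indices))
            selected.extend(indices + extra)
    return selected


def chi_squared_uniform(counts):
    """Chi-squared statistic of class counts against a uniform distribution."""
    expected = sum(counts.values()) / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts.values())


# Hypothetical toy data: 90 negative (majority) and 10 positive (minority).
labels = ["neg"] * 90 + ["pos"] * 10
target = len(labels) // 2  # keep the total size, split evenly

before = Counter(labels)
after = Counter(labels[i] for i in rebalance(labels, target))

print(before, chi_squared_uniform(before))
print(after, chi_squared_uniform(after))
```

In this toy case both classes end up with 50 objects, so the statistic drops to zero. A directed scheme would replace the random draws with a rule for choosing which objects to repeat or drop.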