Authors: Sašo Karakatič; Marjan Heričko and Vili Podgorelec

Affiliation: UM FERI, Slovenia

Keyword(s): Classification, Genetic Algorithm, Instance selection, Weighting, Bagging.

            
                Related
                    Ontology
                    Subjects/Areas/Topics:
                
                        Artificial Intelligence
                    ; 
                        Computational Intelligence
                    ; 
                        Evolutionary Computing
                    ; 
                        Genetic Algorithms
                    ; 
                        Informatics in Control, Automation and Robotics
                    ; 
                        Intelligent Control Systems and Optimization
                    ; 
                        Knowledge Discovery and Information Retrieval
                    ; 
                        Knowledge-Based Systems
                    ; 
                        Machine Learning
                    ; 
                        Soft Computing
                    ; 
                        Symbolic Systems
                    
            
        
        
            
Abstract: An imbalanced or otherwise inappropriate dataset can have a negative influence on classification model training. In this paper we present an evolutionary method that weights or samples the tuples from the training dataset to minimize the negative effects of inappropriate datasets. A genetic algorithm with a genotype of real numbers is used to evolve the weight or the number of occurrences of each learning tuple in the dataset. This technique is used with individual classifiers and in combination with the ensemble technique of bagging, where multiple classification models work together in the classification process. We present two variations: weighting the tuples and sampling the classification tuples. Both variations are experimentally tested in combination with individual classifiers (C4.5 and Naive Bayes methods) and in combination with a bagging ensemble. Results show that both variations are promising techniques, as they produced better classification models than methods without weighting or sampling, which is also supported by statistical analysis.
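
The weighting variation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the genetic operators, the fitness function, or the parameter values, so the truncation selection, uniform crossover, Gaussian mutation, validation-accuracy fitness, and all names below (weighted_centroid_fit, evolve_weights, pop_size, mutation_rate, and so on) are assumptions made for illustration. A weighted nearest-centroid learner stands in for C4.5 and Naive Bayes to keep the example dependency-free.

```python
import random


def weighted_centroid_fit(X, y, w):
    """Stand-in learner (assumption, not C4.5/Naive Bayes):
    per-class weighted mean of the feature vectors."""
    sums, totals = {}, {}
    for xi, yi, wi in zip(X, y, w):
        acc = sums.setdefault(yi, [0.0] * len(xi))
        for j, v in enumerate(xi):
            acc[j] += wi * v
        totals[yi] = totals.get(yi, 0.0) + wi
    # Classes whose total weight is zero are left out of the model.
    return {c: [s / totals[c] for s in sums[c]] for c in sums if totals[c] > 0}


def predict(model, x):
    """Return the class whose centroid is closest to x."""
    return min(model, key=lambda c: sum((a - b) ** 2 for a, b in zip(model[c], x)))


def fitness(w, X_tr, y_tr, X_val, y_val):
    """Assumed fitness: accuracy on a validation set of the model
    trained with instance weights w."""
    model = weighted_centroid_fit(X_tr, y_tr, w)
    if not model:
        return 0.0
    hits = sum(predict(model, x) == t for x, t in zip(X_val, y_val))
    return hits / len(y_val)


def evolve_weights(X_tr, y_tr, X_val, y_val, pop_size=20, generations=30,
                   mutation_rate=0.1, seed=0):
    """Evolve one real-valued gene (a weight in [0, 1]) per training tuple."""
    rng = random.Random(seed)
    n = len(X_tr)
    pop = [[rng.random() for _ in range(n)] for _ in range(pop_size)]

    def score(w):
        return fitness(w, X_tr, y_tr, X_val, y_val)

    for _ in range(generations):
        # Truncation selection: the better half survives unchanged.
        elite = sorted(pop, key=score, reverse=True)[: pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            # Uniform crossover followed by clamped Gaussian mutation.
            child = [ai if rng.random() < 0.5 else bi for ai, bi in zip(a, b)]
            child = [min(1.0, max(0.0, g + rng.gauss(0, 0.2)))
                     if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        pop = elite + children
    return max(pop, key=score)
```

Under the same assumptions, the sampling variation could reuse this genotype by mapping each gene to an integer number of occurrences of the corresponding tuple instead of passing it to the learner as a weight.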