Dimension Reduction with Coevolutionary Genetic Algorithm for Text Classification

Tatiana Gasanova, Roman Sergienko, Eugene Semenkin, Wolfgang Minker


Applying classification algorithms to large text corpora is time-consuming, so it is important to reduce the dimension of text classification problems. We propose a dimension reduction method based on hierarchical agglomerative clustering of terms, followed by optimization of the cluster weights with a cooperative coevolutionary genetic algorithm. The method was applied to 5 different corpora using several classification methods and different text preprocessing techniques. It significantly reduces the dimension of the text classification problem. After clustering with optimization of cluster weights, classification efficiency either increases or decreases only insignificantly.
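
As a rough illustration of the term-clustering idea, the following minimal Python sketch merges terms bottom-up until a target number of clusters remains and then represents a document with one feature per cluster. It is not the authors' implementation: the function names `cluster_terms` and `reduce_document`, the scalar per-term weights, and the centroid-distance merge rule are all assumptions made for the example. The cluster weights are passed in by hand here, whereas the proposed method would tune them with a cooperative coevolutionary genetic algorithm.

```python
from itertools import combinations


def cluster_terms(term_weights, n_clusters):
    """Bottom-up (agglomerative) clustering of terms by a scalar weight.

    term_weights: dict mapping term -> float (hypothetical per-term weight).
    Returns a sorted list of clusters, each a sorted list of terms.
    """
    # Start with one singleton cluster per term.
    clusters = [[t] for t in sorted(term_weights)]

    def centroid(cluster):
        # Mean weight of the terms in a cluster.
        return sum(term_weights[t] for t in cluster) / len(cluster)

    while len(clusters) > n_clusters:
        # Find the pair of clusters whose centroids are closest (assumed rule).
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: abs(centroid(clusters[p[0]]) - centroid(clusters[p[1]])),
        )
        merged = sorted(clusters[i] + clusters[j])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return sorted(clusters)


def reduce_document(tokens, clusters, cluster_weights):
    """Map a tokenized document to one weighted feature per term cluster."""
    term_to_cluster = {t: idx for idx, c in enumerate(clusters) for t in c}
    features = [0.0] * len(clusters)
    for tok in tokens:
        idx = term_to_cluster.get(tok)
        if idx is not None:  # tokens outside the vocabulary are ignored
            features[idx] += cluster_weights[idx]
    return features


# Toy vocabulary with made-up weights: 5 terms are reduced to 3 features.
weights = {"goal": 0.9, "match": 0.85, "vote": 0.1, "election": 0.15, "the": 0.5}
clusters = cluster_terms(weights, n_clusters=3)
doc = ["the", "goal", "match", "goal", "vote", "unknown"]
features = reduce_document(doc, clusters, cluster_weights=[1.0, 1.0, 1.0])
```

In this toy run the feature vector has one entry per cluster instead of one per term, so the dimension seen by the classifier is the number of clusters. The `cluster_weights` list is the quantity an optimizer would adjust.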


