Document Clustering using Multi-objective Genetic Algorithm with Different Feature Selection Methods

Jung Song Lee, Lim Cheon Choi, Soon Cheol Park

Abstract

This paper proposes a multi-objective genetic algorithm for document clustering. Document clustering with k-means and with genetic algorithms has been studied extensively. k-means is easy to implement, but its performance depends heavily on the initial centroid values. A genetic algorithm may improve clustering performance, but it is easily trapped in local minima. In our experiments, the multi-objective genetic algorithm performed stably and avoided this weakness of the single-objective genetic algorithm. Several feature selection methods are applied to these clustering algorithms and compared. Overall, the multi-objective genetic algorithm showed about 20% higher performance than the other algorithms.
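
As an illustration only: a common ingredient of multi-objective genetic algorithms is Pareto dominance, which compares candidate solutions on several objectives at once instead of collapsing them into one score. The abstract does not state which objectives or selection scheme the paper uses, so the sketch below assumes two hypothetical objectives that are both minimized, and the names `dominates`, `non_dominated`, and `candidates` are introduced here purely for illustration.

```python
from typing import List, Sequence, Tuple

# Assumption: each candidate clustering is scored on two objectives,
# and lower is better for both. The scores below are made up.
Objectives = Tuple[float, float]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if a is no worse than b on every objective
    and strictly better on at least one (minimization)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def non_dominated(points: List[Objectives]) -> List[Objectives]:
    """Keep only the points that no other point dominates (the Pareto front)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical objective scores for four candidate clusterings.
candidates = [(1.0, 5.0), (2.0, 2.0), (3.0, 3.0), (5.0, 1.0)]
front = non_dominated(candidates)
print(front)
```

Here the candidate scored (3.0, 3.0) is dropped because (2.0, 2.0) is better on both objectives, while the remaining three represent different trade-offs and none dominates another. Keeping such a set of trade-off solutions, rather than a single best one, is one plausible reason a multi-objective search is less prone to settling on one local minimum.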

Paper Citation


in Harvard Style

Lee J., Choi L. and Park S. (2011). Document Clustering using Multi-objective Genetic Algorithm with Different Feature Selection Methods. In Proceedings of the International Workshop on Semantic Interoperability - Volume 1: IWSI, (ICAART 2011) ISBN 978-989-8425-43-0, pages 101-110. DOI: 10.5220/0003351401010110


in Bibtex Style

@conference{iwsi11,
author={Jung Song Lee and Lim Cheon Choi and Soon Cheol Park},
title={Document Clustering using Multi-objective Genetic Algorithm with Different Feature Selection Methods},
booktitle={Proceedings of the International Workshop on Semantic Interoperability - Volume 1: IWSI, (ICAART 2011)},
year={2011},
pages={101-110},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003351401010110},
isbn={978-989-8425-43-0},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Workshop on Semantic Interoperability - Volume 1: IWSI, (ICAART 2011)
TI - Document Clustering using Multi-objective Genetic Algorithm with Different Feature Selection Methods
SN - 978-989-8425-43-0
AU - Lee J.
AU - Choi L.
AU - Park S.
PY - 2011
SP - 101
EP - 110
DO - 10.5220/0003351401010110