Document Clustering based on Genetic Algorithm using D-Individual

Lim Choen Choi, Soon Cheol Park

Abstract

Document clustering using a genetic algorithm shows good performance. However, the genetic algorithm suffers from performance degradation caused by premature convergence. In this paper, we propose document clustering based on a Genetic Algorithm using D-Individual (DIGA) to solve this problem. A genetic algorithm relies on two factors, the diversity of its population and its capability to converge, and its success depends on both. If these factors are used efficiently, a better solution can be obtained in reduced execution time. We apply DIGA to the Reuters-21578 text collection and demonstrate the effect of our clustering algorithm. The results show that DIGA performs better than traditional clustering algorithms (K-means, Group Average, and a conventional genetic algorithm) in various experiments.
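The abstract does not describe how D-Individuals are constructed, so the sketch below does not implement DIGA. It is a minimal, hypothetical version of the conventional genetic-algorithm baseline that DIGA is compared against, under these assumptions: documents are already term-weight vectors, each chromosome encodes k cluster centroids, fitness is the inverse of the summed cosine distance from each document to its nearest centroid, and selection is truncation with elitism. It also records the mean pairwise distance between chromosomes as a rough proxy for population diversity, the quantity whose collapse signals premature convergence. All function names and parameters are illustrative.

```python
import math
import random


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity; zero vectors are treated as maximally distant."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 1.0
    return 1.0 - dot / (na * nb)


def fitness(chrom, docs):
    """Higher is better: inverse of summed distance to each document's nearest centroid."""
    total = sum(min(cosine_distance(d, c) for c in chrom) for d in docs)
    return 1.0 / (1.0 + total)


def assign(docs, chrom):
    """Label each document with the index of its nearest centroid."""
    return [min(range(len(chrom)), key=lambda k: cosine_distance(d, chrom[k])) for d in docs]


def crossover(p1, p2, rng):
    """One-point crossover at a centroid boundary; returns fresh copies of the genes."""
    cut = rng.randrange(1, len(p1)) if len(p1) > 1 else 1
    return [c[:] for c in p1[:cut]] + [c[:] for c in p2[cut:]]


def mutate(chrom, rng, rate=0.1, scale=0.05):
    """Add Gaussian noise to each gene with probability `rate`."""
    return [[x + rng.gauss(0.0, scale) if rng.random() < rate else x for x in c] for c in chrom]


def diversity(population):
    """Mean pairwise Euclidean distance between flattened chromosomes (diversity proxy)."""
    flat = [[x for c in chrom for x in c] for chrom in population]
    pairs = [(i, j) for i in range(len(flat)) for j in range(i + 1, len(flat))]
    if not pairs:
        return 0.0
    return sum(math.dist(flat[i], flat[j]) for i, j in pairs) / len(pairs)


def ga_cluster(docs, k, pop_size=20, generations=50, elite=2, seed=0):
    """Hypothetical baseline GA clustering; returns (best centroids, labels, diversity history)."""
    rng = random.Random(seed)
    # Initialise each chromosome with k distinct documents as centroids.
    population = [[list(d) for d in rng.sample(docs, k)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        ranked = sorted(population, key=lambda ch: fitness(ch, docs), reverse=True)
        parents = ranked[: max(2, pop_size // 2)]  # truncation selection
        children = [[c[:] for c in ch] for ch in ranked[:elite]]  # elitism keeps the best
        while len(children) < pop_size:
            p1, p2 = rng.sample(parents, 2)
            children.append(mutate(crossover(p1, p2, rng), rng))
        population = children
        history.append(diversity(population))
    best = max(population, key=lambda ch: fitness(ch, docs))
    return best, assign(docs, best), history
```

For example, `ga_cluster([[1, 0, 0], [0.9, 0.1, 0], [0, 1, 0], [0.1, 0.9, 0]], k=2)` should place the first two vectors in one cluster and the last two in the other. Plotting the returned diversity history is one simple way to observe premature convergence: a curve that falls toward zero early means the population has become near-identical before the search space was explored.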



Paper Citation


in Harvard Style

Choi L. and Park S. (2011). Document Clustering based on Genetic Algorithm using D-Individual. In Proceedings of the International Workshop on Semantic Interoperability - Volume 1: IWSI, (ICAART 2011) ISBN 978-989-8425-43-0, pages 111-118. DOI: 10.5220/0003351601110118


in Bibtex Style

@conference{iwsi11,
author={Lim Choen Choi and Soon Cheol Park},
title={Document Clustering based on Genetic Algorithm using D-Individual},
booktitle={Proceedings of the International Workshop on Semantic Interoperability - Volume 1: IWSI, (ICAART 2011)},
year={2011},
pages={111-118},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003351601110118},
isbn={978-989-8425-43-0},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Workshop on Semantic Interoperability - Volume 1: IWSI, (ICAART 2011)
TI - Document Clustering based on Genetic Algorithm using D-Individual
SN - 978-989-8425-43-0
AU - Choi L.
AU - Park S.
PY - 2011
SP - 111
EP - 118
DO - 10.5220/0003351601110118