Document Clustering Using Multi-Objective Genetic Algorithms with Parallel Programming Based on CUDA

Jung Song Lee, Soon Cheol Park, Jong Joo Lee, Han Heeh Ham

Abstract

In this paper, we propose a method of enhancing Multi-Objective Genetic Algorithms (MOGAs) for document clustering with parallel programming. Document clustering using MOGAs performs better than other clustering algorithms. However, the overall computation time of the MOGAs grows considerably as the number of documents increases. To address this problem, we implement the MOGAs with General-Purpose computing on Graphics Processing Units (GPGPU) to compute the document similarities used for clustering. Furthermore, we introduce two thread architectures (Term-Threads and Document-Threads) in the CUDA (Compute Unified Device Architecture) programming model. The experimental results show that the parallel MOGAs with CUDA are substantially faster than the general (non-parallel) MOGAs.



Paper Citation


in Harvard Style

Lee J., Park S., Lee J. and Ham H. (2014). Document Clustering Using Multi-Objective Genetic Algorithms with Parallel Programming Based on CUDA. In Proceedings of the 11th International Conference on Informatics in Control, Automation and Robotics - Volume 1: ICINCO, ISBN 978-989-758-039-0, pages 280-287. DOI: 10.5220/0005057502800287


in Bibtex Style

@conference{icinco14,
author={Jung Song Lee and Soon Cheol Park and Jong Joo Lee and Han Heeh Ham},
title={Document Clustering Using Multi-Objective Genetic Algorithms with Parallel Programming Based on CUDA},
booktitle={Proceedings of the 11th International Conference on Informatics in Control, Automation and Robotics - Volume 1: ICINCO},
year={2014},
pages={280-287},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005057502800287},
isbn={978-989-758-039-0},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 11th International Conference on Informatics in Control, Automation and Robotics - Volume 1: ICINCO
TI - Document Clustering Using Multi-Objective Genetic Algorithms with Parallel Programming Based on CUDA
SN - 978-989-758-039-0
AU - Lee J.
AU - Park S.
AU - Lee J.
AU - Ham H.
PY - 2014
SP - 280
EP - 287
DO - 10.5220/0005057502800287