BLOG CLASSIFICATION USING K-MEANS

Ki Jun Lee, Myungjin Lee, Wooju Kim

2009

Abstract

With the recent exponential growth of blogs, a vast amount of important data now appears on them. However, the dynamic, autonomous, and personal nature of blogs makes blog pages quite different from general web pages in many respects. These differences cause many problems that general search engines cannot handle properly. The problem we focus on in this study is that blog pages are inherently poorly organized and heavily duplicated, so blog search engines can only return poorly organized and duplicated results. To solve this problem, we propose a blog classification method using K-means and present an approach for reorganizing blog search results based on this method. First, we review the current status and performance of blogs and blog search engines. Second, we adopt the K-means algorithm as a base algorithm and devise a blog title classification method to reorganize the blog titles returned by a search engine. Finally, we implement a prototype system of our algorithm, evaluate its effectiveness, and present conclusions and directions for future work. We expect this algorithm to improve the usability of current blog search engines.
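
To make the idea of grouping search-result titles with K-means concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not specify the feature representation, distance measure, or initialization, so whitespace tokenization, unit-length bag-of-words vectors, squared Euclidean distance, and hand-picked seed titles are all assumptions made for illustration; the function names are hypothetical.

```python
import math
from collections import Counter


def vectorize(titles):
    """Turn each title into a unit-length bag-of-words vector (assumed representation)."""
    vocab = sorted({w for t in titles for w in t.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    vectors = []
    for t in titles:
        vec = [0.0] * len(vocab)
        for word, count in Counter(t.lower().split()).items():
            vec[index[word]] = float(count)
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, seeds, max_iter=100):
    """Standard K-means iteration; `seeds` are indices of the initial centroids."""
    centroids = [list(vectors[i]) for i in seeds]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each vector goes to its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda c: sq_dist(v, centroids[c]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the clustering has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [v for v, l in zip(vectors, labels) if l == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def group_titles(titles, seeds):
    """Reorganize a flat list of titles into one list per cluster."""
    labels = kmeans(vectorize(titles), seeds)
    groups = [[] for _ in seeds]
    for title, label in zip(titles, labels):
        groups[label].append(title)
    return groups


# Made-up example titles, standing in for a search engine's result list.
titles = [
    "python kmeans tutorial",
    "kmeans clustering in python",
    "best pasta recipe",
    "easy pasta recipe for dinner",
]
groups = group_titles(titles, seeds=[0, 2])
for group in groups:
    print(group)
```

Fixed seeds keep the sketch deterministic; a practical system would more likely use random or k-means++ style initialization and a weighting scheme such as TF-IDF, neither of which is stated in the abstract.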



Paper Citation


in Harvard Style

Lee K. J., Lee M. and Kim W. (2009). BLOG CLASSIFICATION USING K-MEANS. In Proceedings of the 11th International Conference on Enterprise Information Systems - Volume 4: ICEIS, ISBN 978-989-8111-87-6, pages 61-67. DOI: 10.5220/0001949600610067


in Bibtex Style

@conference{iceis09,
author={Ki Jun Lee and Myungjin Lee and Wooju Kim},
title={BLOG CLASSIFICATION USING K-MEANS},
booktitle={Proceedings of the 11th International Conference on Enterprise Information Systems - Volume 4: ICEIS},
year={2009},
pages={61-67},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0001949600610067},
isbn={978-989-8111-87-6},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 11th International Conference on Enterprise Information Systems - Volume 4: ICEIS
TI - BLOG CLASSIFICATION USING K-MEANS
SN - 978-989-8111-87-6
AU - Lee K. J.
AU - Lee M.
AU - Kim W.
PY - 2009
SP - 61
EP - 67
DO - 10.5220/0001949600610067