LEARNING FROM ‘TAG CLOUDS’ - A Novel Approach to Build Datasets for Memory-based Reasoning Classification of Relevant Blog Articles

Ahmad Ammari, Valentina Zharkova

Abstract

The advent of the Social Web has created massive online media by turning former information consumers into information producers. The best example is the blogosphere. Blog websites are collections of articles written by millions of blog writers for millions of blog readers. Blogging has become a very popular means for Web 2.0 users to communicate, express themselves, share, collaborate, and debate through their blog posts. However, because of the very large number of blogs and the wide diversity of blog post topics available on the Web, most blog search engines face the serious challenge of finding the blog articles that are truly relevant to the particular topic a blog reader is looking for. To help address this problem, an intelligent approach to blog post search has been proposed that takes advantage of the concept of ‘tag clouds’ and leverages many open source libraries. A Memory-Based Reasoning model has been built using SAS Enterprise Miner to assess the effectiveness of the approach. Results are very encouraging: retrieval precision indicates a significant improvement in retrieving posts relevant to the user compared with traditional means of blog post retrieval.
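To make the pipeline in the abstract more concrete, here is a minimal Python sketch of the general idea. It is not the authors' implementation, which was built in SAS Enterprise Miner. The sketch assumes the following:

- A tag cloud is a mapping from tag to weight.
- The heaviest top_n tags become the feature vocabulary.
- Each post is represented by raw counts of those tags, treated as single-word tokens.
- Memory-based reasoning is approximated by a plain k-nearest-neighbour vote (k = 3) under Euclidean distance.
- Precision is measured as the share of posts classified "relevant" that really are relevant.

All function names, parameters and the toy posts are hypothetical and invented for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical sketch: names, parameters and data are illustrative assumptions,
# not the paper's SAS Enterprise Miner model.


def build_vocabulary(tag_cloud, top_n):
    """Keep the top_n heaviest tags of a tag cloud (tag -> weight) as features."""
    ranked = sorted(tag_cloud.items(), key=lambda kv: (-kv[1], kv[0]))
    return [tag for tag, _ in ranked[:top_n]]


def vectorize(text, vocabulary):
    """Represent a post by how often each vocabulary tag occurs as a single token."""
    tokens = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    return [tokens[tag] for tag in vocabulary]


class MemoryBasedReasoner:
    """k-nearest-neighbour classifier: memorise labelled cases, vote among the k closest."""

    def __init__(self, k=3):
        self.k = k
        self.memory = []

    def fit(self, vectors, labels):
        self.memory = list(zip(vectors, labels))
        return self

    def predict(self, vector):
        # Euclidean distance (assumed); sorted() is stable, so ties keep training order.
        nearest = sorted(self.memory, key=lambda case: math.dist(vector, case[0]))
        votes = Counter(label for _, label in nearest[: self.k])
        return votes.most_common(1)[0][0]


def precision(predicted, actual, positive="relevant"):
    """Share of posts classified as positive that are truly positive."""
    retrieved = [a for p, a in zip(predicted, actual) if p == positive]
    return sum(a == positive for a in retrieved) / len(retrieved) if retrieved else 0.0


# Toy data, invented for illustration only.
tag_cloud = {"python": 40, "pandas": 25, "numpy": 20, "recipe": 3, "travel": 2}
vocabulary = build_vocabulary(tag_cloud, top_n=3)  # ['python', 'pandas', 'numpy']

train = [
    ("Using pandas and numpy in python", "relevant"),
    ("Python tips: pandas groupby", "relevant"),
    ("numpy broadcasting explained in python", "relevant"),
    ("My trip to Rome and a pasta recipe", "irrelevant"),
    ("Travel diary: beaches and sunsets", "irrelevant"),
    ("Weekend hiking photos", "irrelevant"),
]
test = [
    ("pandas dataframes in python", "relevant"),
    ("Cheap flights and travel tips", "irrelevant"),
    ("python snake care for pet owners", "irrelevant"),
]

model = MemoryBasedReasoner(k=3).fit(
    [vectorize(text, vocabulary) for text, _ in train],
    [label for _, label in train],
)
predictions = [model.predict(vectorize(text, vocabulary)) for text, _ in test]
actual = [label for _, label in test]
print(predictions)  # ['relevant', 'irrelevant', 'relevant']
print(precision(predictions, actual))  # 0.5
```

The third test post shows a weakness of a pure bag-of-tags representation. The word "python" about the snake lands near the on-topic posts, which yields a false positive and halves the precision. The same effect is why the choice of tag-cloud features matters so much for retrieval precision.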



Paper Citation


in Harvard Style

Ammari A. and Zharkova V. (2010). LEARNING FROM ‘TAG CLOUDS’ - A Novel Approach to Build Datasets for Memory-based Reasoning Classification of Relevant Blog Articles. In Proceedings of the 6th International Conference on Web Information Systems and Technology - Volume 1: WEBIST, ISBN 978-989-674-025-2, pages 325-331. DOI: 10.5220/0002884003250331


in Bibtex Style

@conference{webist10,
author={Ahmad Ammari and Valentina Zharkova},
title={LEARNING FROM ‘TAG CLOUDS’ - A Novel Approach to Build Datasets for Memory-based Reasoning Classification of Relevant Blog Articles},
booktitle={Proceedings of the 6th International Conference on Web Information Systems and Technology - Volume 1: WEBIST},
year={2010},
pages={325-331},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002884003250331},
isbn={978-989-674-025-2},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 6th International Conference on Web Information Systems and Technology - Volume 1: WEBIST
TI - LEARNING FROM ‘TAG CLOUDS’ - A Novel Approach to Build Datasets for Memory-based Reasoning Classification of Relevant Blog Articles
SN - 978-989-674-025-2
AU - Ammari A.
AU - Zharkova V.
PY - 2010
SP - 325
EP - 331
DO - 10.5220/0002884003250331