tag cloud-based dataset is better able than the user
tagging-based searcher (b) to correctly classify posts
that are irrelevant (Target = 0) to the topic of interest.
Consequently, if the two searchers retrieve the same
number of posts, the collection retrieved by searcher
(a) is likely to contain more relevant posts than the
collection retrieved by searcher (b); that is, searcher
(a) has higher precision.
However, searcher (b) is better able than searcher (a)
to correctly classify posts that are relevant
(Target = 1) to the topic of interest. This means that
the user tagging-based searcher is likely to retrieve a
larger share of all the relevant posts available; that
is, it has higher recall.
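The precision/recall argument above can be checked with a small worked example. The following is a minimal Python sketch using hypothetical confusion-matrix counts (they are not results from this study). Rejecting more irrelevant posts (Target = 0) keeps false positives down, which tends to raise precision. Accepting more relevant posts (Target = 1) keeps false negatives down, which raises recall. The names Confusion, precision, recall and specificity are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # relevant posts (Target = 1) predicted relevant
    fp: int  # irrelevant posts (Target = 0) predicted relevant
    tn: int  # irrelevant posts predicted irrelevant
    fn: int  # relevant posts predicted irrelevant


def precision(c: Confusion) -> float:
    # Share of retrieved (predicted relevant) posts that are actually relevant.
    retrieved = c.tp + c.fp
    return c.tp / retrieved if retrieved else 0.0


def recall(c: Confusion) -> float:
    # Share of all relevant posts that the searcher retrieves.
    relevant = c.tp + c.fn
    return c.tp / relevant if relevant else 0.0


def specificity(c: Confusion) -> float:
    # Share of irrelevant posts (Target = 0) that are correctly rejected.
    irrelevant = c.tn + c.fp
    return c.tn / irrelevant if irrelevant else 0.0


if __name__ == "__main__":
    # Hypothetical counts for illustration only; not results from this study.
    searcher_a = Confusion(tp=40, fp=5, tn=45, fn=10)   # stricter, like searcher (a)
    searcher_b = Confusion(tp=48, fp=15, tn=35, fn=2)   # more permissive, like searcher (b)
    for name, c in (("a", searcher_a), ("b", searcher_b)):
        print(name, round(precision(c), 3), round(recall(c), 3), round(specificity(c), 3))
```

With these made-up counts, the stricter searcher rejects more irrelevant posts and scores higher on precision. The more permissive one misses fewer relevant posts and scores higher on recall.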
5 CONCLUSIONS
In spite of the many blog search engines available on
the web, little attention has been paid to how
relevant the blog posts retrieved by these search
engines actually are to what users are looking for.
There is a crucial need to add intelligence to blog
post search mechanisms in order to improve their
precision and recall. This work has proposed an
efficient framework, based on open-source APIs and on
tag cloud inspection, that retrieves, analyzes, and
classifies a collection of blog posts. This collection
is then used to train an intelligent predictive
searcher that filters the results of search engines,
thereby increasing the proportion of relevant posts
returned to the user.
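The framework summarised above can be sketched in a few lines of Python. This sketch makes several hypothetical assumptions. A post's "tag cloud" is taken to be its most frequent terms, and a post is labelled relevant when that cloud contains a topic term. Memory-based reasoning is implemented as k-nearest-neighbour voting over cosine similarity of term-frequency vectors. The names term_vector, label_from_tag_cloud, cosine and knn_predict are illustrative only. This is not the paper's implementation, which relies on open-source APIs not shown here.

```python
import math
from collections import Counter


def term_vector(text: str) -> Counter:
    # Bag-of-words term frequencies (a simple stand-in for real text analysis).
    return Counter(w for w in text.lower().split() if w.isalpha())


def label_from_tag_cloud(text: str, topic_terms: set, top_n: int = 5) -> int:
    # Hypothetical labelling rule: treat the post's top_n most frequent terms as
    # its tag cloud and mark the post relevant (1) if the cloud has a topic term.
    cloud = {w for w, _ in term_vector(text).most_common(top_n)}
    return int(bool(cloud & topic_terms))


def cosine(u: Counter, v: Counter) -> float:
    # Cosine similarity between two sparse term-frequency vectors.
    dot = sum(n * v[t] for t, n in u.items())
    nu = math.sqrt(sum(n * n for n in u.values()))
    nv = math.sqrt(sum(n * n for n in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def knn_predict(memory, text: str, k: int = 3) -> int:
    # Memory-based reasoning: keep every labelled case in memory and let the
    # k most similar cases vote on the label of the new post.
    q = term_vector(text)
    nearest = sorted(memory, key=lambda case: cosine(q, case[0]), reverse=True)[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


if __name__ == "__main__":
    topic = {"java", "jvm"}  # hypothetical topic of interest
    posts = [
        "java jvm java garbage collection tuning",
        "holiday photos beach sunset beach",
        "java streams and lambdas in java",
        "cooking pasta recipe with tomato pasta",
    ]
    # Build the training "memory" from tag cloud-based labels.
    memory = [(term_vector(p), label_from_tag_cloud(p, topic)) for p in posts]
    print(knn_predict(memory, "tuning the jvm garbage collector", k=1))
```

A memory-based classifier keeps the labelled posts themselves rather than a fitted model, so adding newly labelled posts only means appending to the memory list.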
The results of the comparison with user tagging,
another popular approach to improving blog search,
show that tag cloud inspection is a sound basis for
classifying a collection of blog posts used to train a
predictive blog searcher. Moreover, this work
recommends the tag cloud-based dataset learning
approach for building blog post classification models
in application domains where the precision of the
returned results matters more than their recall.
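One way to make the precision-versus-recall recommendation concrete is the F-beta score, a standard weighted harmonic mean of precision and recall. Values of beta below 1 weight precision more heavily, and values above 1 favour recall. This is an illustration, not necessarily the measure used in this work. The following Python sketch uses hypothetical precision/recall figures (not results from this study). The names f_beta and choose_searcher are illustrative.

```python
def f_beta(precision: float, recall: float, beta: float) -> float:
    # Weighted harmonic mean of precision and recall; beta < 1 favours precision,
    # beta > 1 favours recall, and beta = 1 gives the usual F1 score.
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def choose_searcher(scores: dict, beta: float) -> str:
    # scores maps a searcher name to its (precision, recall) pair;
    # return the name with the highest F-beta score.
    return max(scores, key=lambda name: f_beta(*scores[name], beta))


if __name__ == "__main__":
    # Hypothetical figures for illustration only; not results from this study.
    scores = {"tag_cloud": (0.9, 0.6), "user_tagging": (0.7, 0.9)}
    print(choose_searcher(scores, beta=0.5))  # precision-oriented domain
    print(choose_searcher(scores, beta=2.0))  # recall-oriented domain
```

With these made-up figures, a precision-oriented setting (beta = 0.5) selects the higher-precision searcher, and a recall-oriented setting (beta = 2) selects the higher-recall one.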
LEARNING FROM 'TAG CLOUDS' - A Novel Approach to Build Datasets for Memory-based Reasoning Classification of
Relevant Blog Articles
331