FINDING SUITABLE KEYWORDS FOR A WEB PAGE FROM CACHES BASED ON SIMILARITY AND FREQUENCY

Yasuhiro Tajima; Yoshiyuki Kotani

doi:10.5220/0001289204740477

2007

Abstract

Metadata are the most important entries in a web page for tasks such as summarization and indexing. Unfortunately, there are many kinds of metadata items but few guidelines for constructing the metadata for a web page. We propose a method that finds metadata for a web page by searching internet caches and selecting items suitable for the target page. Our method is based on a Bayesian method used in the area of text retrieval. We evaluate the method in an experiment that finds a set of suitable keywords for a source web page. Comparing the original meta-tagged keywords with the system output, we obtain 74% precision and 76% recall. We conclude that the method captures the tendency of the metadata annotated on pages similar to the target page.
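The evaluation in the abstract compares two keyword sets: the keywords originally given in a page's meta tags and the keywords output by the system. The following is a minimal sketch of how precision and recall can be computed for such a comparison. The function name `precision_recall`, the exact-match comparison after lowercasing and trimming, and the sample keyword lists are all assumptions made for illustration; the abstract does not say how matches were counted.

```python
def precision_recall(original_keywords, system_keywords):
    """Compare system output against the original meta-tagged keywords.

    Assumption: two keywords match only if they are identical after
    lowercasing and trimming whitespace.
    """
    original = {k.strip().lower() for k in original_keywords}
    predicted = {k.strip().lower() for k in system_keywords}
    matched = original & predicted

    # Precision: share of system keywords that appear in the original set.
    precision = len(matched) / len(predicted) if predicted else 0.0
    # Recall: share of original keywords that the system recovered.
    recall = len(matched) / len(original) if original else 0.0
    return precision, recall


# Hypothetical keyword sets, not taken from the paper.
original = ["metadata", "web page", "keywords", "cache"]
system = ["Metadata", "keywords", "cache", "search engine"]

p, r = precision_recall(original, system)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.75 recall=0.75
```

Averaging these two values over a set of test pages would give figures comparable in kind to the 74% and 76% reported, although the paper's exact averaging scheme is not stated in the abstract.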

Paper Citation

in Harvard Style

Tajima Y. and Kotani Y. (2007). FINDING SUITABLE KEYWORDS FOR A WEB PAGE FROM CACHES BASED ON SIMILARITY AND FREQUENCY. In Proceedings of the Third International Conference on Web Information Systems and Technologies - Volume 2: WEBIST, ISBN 978-972-8865-78-8, pages 474-477. DOI: 10.5220/0001289204740477

in Bibtex Style

@conference{webist07,
author={Yasuhiro Tajima and Yoshiyuki Kotani},
title={FINDING SUITABLE KEYWORDS FOR A WEB PAGE FROM CACHES BASED ON SIMILARITY AND FREQUENCY},
booktitle={Proceedings of the Third International Conference on Web Information Systems and Technologies - Volume 2: WEBIST},
year={2007},
pages={474-477},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0001289204740477},
isbn={978-972-8865-78-8},
}

in EndNote Style

TY - CONF
JO - Proceedings of the Third International Conference on Web Information Systems and Technologies - Volume 2: WEBIST
TI - FINDING SUITABLE KEYWORDS FOR A WEB PAGE FROM CACHES BASED ON SIMILARITY AND FREQUENCY
SN - 978-972-8865-78-8
AU - Tajima Y.
AU - Kotani Y.
PY - 2007
SP - 474
EP - 477
DO - 10.5220/0001289204740477