FINDING SUITABLE KEYWORDS FOR A WEB PAGE FROM CACHES BASED ON SIMILARITY AND FREQUENCY
Yasuhiro Tajima, Yoshiyuki Kotani
2007
Abstract
Metadata are the most important entries in a web page for summarization, indexing, and similar tasks. Unfortunately, there are many kinds of metadata items but few guidelines for constructing the metadata for a web page. We propose a metadata-finding method for a web page that searches internet caches and selects suitable items for the target page. Our method is based on a Bayesian method used in the area of text retrieval. We evaluate the method in an experiment that finds a set of suitable keywords for a source web page. Comparing the original meta-tagged keywords with the system output, we obtain 74% precision and 76% recall. We conclude that the method captures the tendency of the metadata annotated on pages similar to the target page.
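The abstract does not give the scoring formulas, so the following is only a minimal sketch of the general idea (similarity plus frequency), not the authors' Bayesian method. It assumes cached pages are available as (text, keywords) pairs, uses bag-of-words cosine similarity as a stand-in similarity measure, and lets each cached page vote for its own keywords with a weight equal to its similarity to the target page. The names `cosine_similarity`, `suggest_keywords`, and `precision_recall` are hypothetical helpers introduced for illustration. The last function shows how precision and recall against the original meta-tagged keywords can be computed.

```python
import math
from collections import Counter


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def suggest_keywords(target_text, cached_pages, top_k=5):
    """Rank candidate keywords by similarity-weighted frequency.

    cached_pages: iterable of (page_text, keywords) pairs (assumed input format).
    A keyword scores higher when it appears on many cached pages (frequency)
    and when those pages resemble the target page (similarity).
    """
    target = Counter(target_text.lower().split())
    scores = Counter()
    for page_text, keywords in cached_pages:
        sim = cosine_similarity(target, Counter(page_text.lower().split()))
        for kw in set(keywords):
            scores[kw] += sim
    # Sort by descending score, breaking ties alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [kw for kw, score in ranked[:top_k] if score > 0]


def precision_recall(predicted, reference):
    """Precision and recall of predicted keywords against reference keywords."""
    predicted, reference = set(predicted), set(reference)
    hits = len(predicted & reference)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(reference) if reference else 0.0
    return precision, recall
```

In this sketch, precision is the fraction of suggested keywords that also appear in the page's original meta tags, and recall is the fraction of the original keywords that the system recovers.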
Paper Citation
in Harvard Style
Tajima Y. and Kotani Y. (2007). FINDING SUITABLE KEYWORDS FOR A WEB PAGE FROM CACHES BASED ON SIMILARITY AND FREQUENCY. In Proceedings of the Third International Conference on Web Information Systems and Technologies - Volume 2: WEBIST, ISBN 978-972-8865-78-8, pages 474-477. DOI: 10.5220/0001289204740477
in Bibtex Style
@conference{webist07,
author={Yasuhiro Tajima and Yoshiyuki Kotani},
title={FINDING SUITABLE KEYWORDS FOR A WEB PAGE FROM CACHES BASED ON SIMILARITY AND FREQUENCY},
booktitle={Proceedings of the Third International Conference on Web Information Systems and Technologies - Volume 2: WEBIST},
year={2007},
pages={474-477},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0001289204740477},
isbn={978-972-8865-78-8},
}
in EndNote Style
TY - CONF
JO - Proceedings of the Third International Conference on Web Information Systems and Technologies - Volume 2: WEBIST
TI - FINDING SUITABLE KEYWORDS FOR A WEB PAGE FROM CACHES BASED ON SIMILARITY AND FREQUENCY
SN - 978-972-8865-78-8
AU - Tajima Y.
AU - Kotani Y.
PY - 2007
SP - 474
EP - 477
DO - 10.5220/0001289204740477