A NEW TECHNIQUE FOR IDENTIFICATION OF RELEVANT WEB PAGES IN INFORMATIONAL QUERIES RESULTS
Fabio Clarizia, Luca Greco, Paolo Napoletano
2010
Abstract
In this paper we present a new technique for retrieving relevant web pages from the results of informational queries. The proposed technique, based on a probabilistic model of language, is embedded in a traditional web search engine. The relevance of a web page was obtained from human judges who, referring to a continuous scale, assigned a degree of importance to each of the analyzed websites. To validate the proposed method, we compare it with a classic search engine, using Precision and Recall as well as a measure of distance from the significance scores assigned by the human judges.
Paper Citation
in Harvard Style
Clarizia F., Greco L. and Napoletano P. (2010). A NEW TECHNIQUE FOR IDENTIFICATION OF RELEVANT WEB PAGES IN INFORMATIONAL QUERIES RESULTS. In Proceedings of the 12th International Conference on Enterprise Information Systems - Volume 3: ICEIS, ISBN 978-989-8425-06-5, pages 70-79. DOI: 10.5220/0002903100700079
in Bibtex Style
@conference{iceis10,
author={Fabio Clarizia and Luca Greco and Paolo Napoletano},
title={A NEW TECHNIQUE FOR IDENTIFICATION OF RELEVANT WEB PAGES IN INFORMATIONAL QUERIES RESULTS},
booktitle={Proceedings of the 12th International Conference on Enterprise Information Systems - Volume 3: ICEIS},
year={2010},
pages={70-79},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002903100700079},
isbn={978-989-8425-06-5},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 12th International Conference on Enterprise Information Systems - Volume 3: ICEIS
TI - A NEW TECHNIQUE FOR IDENTIFICATION OF RELEVANT WEB PAGES IN INFORMATIONAL QUERIES RESULTS
SN - 978-989-8425-06-5
AU - Clarizia F.
AU - Greco L.
AU - Napoletano P.
PY - 2010
SP - 70
EP - 79
DO - 10.5220/0002903100700079