SCUBA DIVER: SUBSPACE CLUSTERING OF WEB SEARCH RESULTS

Fatih Gelgi, Srinivas Vadrevu, Hasan Davulcu

Abstract

Current search engines present their search results as a ranked list of Web pages. However, as the number of pages on the Web increases exponentially, so does the number of search results for any given query. We present a novel subspace clustering based algorithm that organizes keyword search results by simultaneously clustering them and identifying distinguishing terms for each cluster. Our system, named Scuba Diver, enables users to better interpret the coverage of millions of search results and to refine their search queries through a keyword-guided interface. We present experimental results illustrating the effectiveness of our algorithm by measuring the purity, entropy and F-measure of the generated clusters against categories from the Open Directory Project (ODP).
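Purity, entropy and F-measure all compare a clustering with reference class labels, here ODP categories. The following is a minimal sketch of the common textbook definitions, computed from a cluster-by-class contingency table. The function names are hypothetical. The exact variants the authors used are not stated in this abstract, so two details are assumptions: entropy uses log base 2 without normalization by the number of classes, and F-measure takes, for each class, the best-matching cluster, weighted by class size.

```python
import math
from collections import Counter, defaultdict


def contingency(clusters, labels):
    """Count how many documents of each class fall into each cluster."""
    table = defaultdict(Counter)
    for cluster, label in zip(clusters, labels):
        table[cluster][label] += 1
    return table


def purity(clusters, labels):
    """Fraction of documents that belong to the majority class of their cluster."""
    table = contingency(clusters, labels)
    return sum(max(counts.values()) for counts in table.values()) / len(labels)


def entropy(clusters, labels):
    """Size-weighted average of per-cluster class entropy (log base 2, unnormalized)."""
    table = contingency(clusters, labels)
    n = len(labels)
    total = 0.0
    for counts in table.values():
        size = sum(counts.values())
        h = -sum((m / size) * math.log2(m / size) for m in counts.values())
        total += (size / n) * h
    return total


def f_measure(clusters, labels):
    """Class-size-weighted F-measure, matching each class to its best cluster."""
    table = contingency(clusters, labels)
    n = len(labels)
    class_sizes = Counter(labels)
    cluster_sizes = {c: sum(counts.values()) for c, counts in table.items()}
    total = 0.0
    for cls, n_cls in class_sizes.items():
        best = 0.0
        for c, counts in table.items():
            overlap = counts[cls]  # Counter returns 0 for absent classes
            if overlap == 0:
                continue
            precision = overlap / cluster_sizes[c]
            recall = overlap / n_cls
            best = max(best, 2 * precision * recall / (precision + recall))
        total += (n_cls / n) * best
    return total


if __name__ == "__main__":
    # Toy data: two clusters, each holding one document of the "wrong" category.
    clusters = [0, 0, 0, 1, 1, 1]
    labels = ["arts", "arts", "sports", "sports", "sports", "arts"]
    print(purity(clusters, labels), entropy(clusters, labels), f_measure(clusters, labels))
```

Higher purity and F-measure and lower entropy indicate clusters that agree more closely with the reference categories. In the toy data, purity and F-measure are both 2/3 and entropy is about 0.918 bits. A perfect clustering yields 1, 0 and 1 respectively.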


Paper Citation


in Harvard Style

Gelgi F., Vadrevu S. and Davulcu H. (2007). SCUBA DIVER: SUBSPACE CLUSTERING OF WEB SEARCH RESULTS. In Proceedings of the Third International Conference on Web Information Systems and Technologies - Volume 2: WEBIST, ISBN 978-972-8865-78-8, pages 334-339. DOI: 10.5220/0001288503340339


in Bibtex Style

@conference{webist07,
author={Fatih Gelgi and Srinivas Vadrevu and Hasan Davulcu},
title={SCUBA DIVER: SUBSPACE CLUSTERING OF WEB SEARCH RESULTS},
booktitle={Proceedings of the Third International Conference on Web Information Systems and Technologies - Volume 2: WEBIST},
year={2007},
pages={334-339},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0001288503340339},
isbn={978-972-8865-78-8},
}


in EndNote Style

TY  - CONF
JO  - Proceedings of the Third International Conference on Web Information Systems and Technologies - Volume 2: WEBIST
TI  - SCUBA DIVER: SUBSPACE CLUSTERING OF WEB SEARCH RESULTS
SN  - 978-972-8865-78-8
AU  - Gelgi F.
AU  - Vadrevu S.
AU  - Davulcu H.
PY  - 2007
SP  - 334
EP  - 339
DO  - 10.5220/0001288503340339
ER  - 