MONTE CARLO PROJECTIVE CLUSTERING OF TEXTS

Vladimír Ljubopytnov, Jaroslav Pokorný

Abstract

In this paper we propose an improved version of DOC, a Monte Carlo projective clustering algorithm. DOC was designed for general vector data; we extend it to handle dimensions of varying significance and apply it to the clustering of web search snippets. We discuss the advantages and weaknesses of our approach with respect to known algorithms.



Paper Citation


in Harvard Style

Ljubopytnov V. and Pokorný J. (2009). MONTE CARLO PROJECTIVE CLUSTERING OF TEXTS. In Proceedings of the 4th International Conference on Software and Data Technologies - Volume 2: ICSOFT, ISBN 978-989-674-010-8, pages 237-242. DOI: 10.5220/0002247602370242


in Bibtex Style

@conference{icsoft09,
author={Vladimír Ljubopytnov and Jaroslav Pokorný},
title={MONTE CARLO PROJECTIVE CLUSTERING OF TEXTS},
booktitle={Proceedings of the 4th International Conference on Software and Data Technologies - Volume 2: ICSOFT},
year={2009},
pages={237-242},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002247602370242},
isbn={978-989-674-010-8},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 4th International Conference on Software and Data Technologies - Volume 2: ICSOFT
TI - MONTE CARLO PROJECTIVE CLUSTERING OF TEXTS
SN - 978-989-674-010-8
AU - Ljubopytnov V.
AU - Pokorný J.
PY - 2009
SP - 237
EP - 242
DO - 10.5220/0002247602370242