# Document Clustering Games

### Rocco Tripodi, Marcello Pelillo

#### Abstract

In this article we propose a new model for document clustering based on game-theoretic principles. Each document to be clustered is represented as a player, in the game-theoretic sense, and each cluster as a strategy that the players choose in order to maximize their payoff. The geometry of the data is modeled as a graph that encodes the pairwise similarity between documents, and the games are played among similar players. In each game the players update their strategies according to which strategy has been effective in previous games. The Dominant Set clustering algorithm is used to find the prototypical elements of each cluster. This information is used to divide the players into two disjoint sets: labeled players, which always play a definite strategy, and unlabeled players, which update their strategy at each iteration of the games. The system was evaluated on 13 document datasets, and the results show that the proposed method performs well compared to different document clustering algorithms.
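The following is a minimal sketch of the kind of game dynamics the abstract describes, not the authors' implementation. The abstract does not state the update rule, so the sketch assumes discrete-time replicator dynamics, cosine similarity as the graph weight, and an identity payoff matrix (two players are rewarded only when they pick the same cluster). The labeled players are supplied by hand here, whereas the paper obtains them with the Dominant Set algorithm. The names `cosine` and `cluster_games` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-negative feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def cluster_games(vectors, labeled, n_clusters, iterations=50):
    """Assign each document (player) to a cluster (strategy).

    vectors:  list of non-negative feature vectors, one per document
    labeled:  dict {document index: cluster index} for players whose
              strategy is fixed (assumed given; the paper derives them
              with Dominant Set clustering)
    """
    n = len(vectors)
    # Similarity graph: W[i][j] weights the game played between i and j.
    W = [[cosine(vectors[i], vectors[j]) if i != j else 0.0
          for j in range(n)] for i in range(n)]

    # Mixed strategies: one-hot for labeled players, uniform otherwise.
    X = []
    for i in range(n):
        if i in labeled:
            X.append([1.0 if k == labeled[i] else 0.0
                      for k in range(n_clusters)])
        else:
            X.append([1.0 / n_clusters] * n_clusters)

    for _ in range(iterations):
        updated = []
        for i in range(n):
            if i in labeled:
                updated.append(X[i])  # labeled players never change strategy
                continue
            # Payoff of cluster k for player i, assuming an identity
            # payoff matrix: similarity-weighted support from neighbours.
            payoff = [sum(W[i][j] * X[j][k] for j in range(n))
                      for k in range(n_clusters)]
            average = sum(x * p for x, p in zip(X[i], payoff))
            if average > 0:
                # Assumed rule: discrete-time replicator dynamics.
                updated.append([x * p / average
                                for x, p in zip(X[i], payoff)])
            else:
                updated.append(X[i])
        X = updated

    # Each player ends up in the cluster with the highest probability.
    return [max(range(n_clusters), key=lambda k: X[i][k]) for i in range(n)]


# Toy usage: documents 0 and 2 are labeled, 1 and 3 are unlabeled.
docs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
assignments = cluster_games(docs, labeled={0: 0, 2: 1}, n_clusters=2)
```

In this toy run each unlabeled document is pulled toward the cluster of the labeled document it most resembles, which mirrors the idea that strategies that paid off in earlier games gain weight in later ones.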

#### Paper Citation

#### in Harvard Style

Tripodi R. and Pelillo M. (2016). **Document Clustering Games** . In *Proceedings of the 5th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM,* ISBN 978-989-758-173-1, pages 109-118. DOI: 10.5220/0005798601090118

#### in Bibtex Style

@conference{icpram16,

author={Rocco Tripodi and Marcello Pelillo},

title={Document Clustering Games},

booktitle={Proceedings of the 5th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM},

year={2016},

pages={109-118},

publisher={SciTePress},

organization={INSTICC},

doi={10.5220/0005798601090118},

isbn={978-989-758-173-1},

}

#### in EndNote Style

TY - CONF

JO - Proceedings of the 5th International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM

TI - Document Clustering Games

SN - 978-989-758-173-1

AU - Tripodi R.

AU - Pelillo M.

PY - 2016

SP - 109

EP - 118

DO - 10.5220/0005798601090118