Non-negative Matrix Factorization for Binary Data

Jacob Søgaard Larsen, Line Katrine Harder Clemmensen

2015

Abstract

We propose the Logistic Non-negative Matrix Factorization for the decomposition of binary data. Binary data arise frequently in, for example, text analysis, sensory data, and market basket data. Non-negative Matrix Factorization is a common method for analysing non-negative data, but it is in theory not appropriate for binary data; we therefore propose a novel Non-negative Matrix Factorization based on the logistic link function. Furthermore, we generalize the method to handle missing data. The formulation of the method is compared to a previously proposed logistic matrix factorization without a non-negativity constraint on the features. We compare the performance of the Logistic Non-negative Matrix Factorization to Least Squares Non-negative Matrix Factorization and Kullback-Leibler (KL) Non-negative Matrix Factorization on three sets of binary data: a synthetic dataset, a set of student comments on their professors collected in a binary term-document matrix, and a sensory dataset. We find that choosing the number of components is an essential part of the modelling and interpretation, and that it is still unresolved.
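
To make the idea of a logistic-link factorization with missing data concrete, the following is a minimal sketch and not the authors' algorithm. It assumes a Bernoulli likelihood with probabilities sigmoid(W H + b), non-negative W and H enforced by projected gradient descent, and a boolean mask that excludes missing entries from the loss. The function name `logistic_nmf`, the scalar offset `b` (added here so that probabilities below 0.5 are reachable when W H is non-negative), the step size, and the toy data are all assumptions made for illustration.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def logistic_nmf(X, mask, k, steps=500, lr=0.5, seed=0):
    """Illustrative masked logistic factorization with W, H >= 0.

    X    : (n, m) array of 0/1 values; entries where mask is False are ignored.
    mask : (n, m) boolean array, True where X is observed.
    k    : number of components (a hypothetical choice, not from the paper).
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    Xf = np.where(mask, X, 0.0)          # neutral fill for missing entries
    n_obs = mask.sum()
    W = rng.uniform(0.0, 1.0, size=(n, k))
    H = rng.uniform(0.0, 1.0, size=(k, m))
    b = 0.0                              # assumed unconstrained scalar offset
    losses = []
    eps = 1e-12
    for _ in range(steps):
        P = sigmoid(W @ H + b)
        # Mean Bernoulli negative log-likelihood over observed entries only.
        ll = Xf * np.log(P + eps) + (1.0 - Xf) * np.log(1.0 - P + eps)
        losses.append(-(mask * ll).sum() / n_obs)
        # Gradient with respect to the logits; zero where data are missing.
        R = mask * (P - Xf) / n_obs
        grad_W, grad_H, grad_b = R @ H.T, W.T @ R, R.sum()
        # Gradient step followed by projection onto the non-negative orthant.
        W = np.maximum(W - lr * grad_W, 0.0)
        H = np.maximum(H - lr * grad_H, 0.0)
        b -= lr * grad_b
    return W, H, b, losses


# Toy usage on random binary data with roughly 10% of entries missing.
rng = np.random.default_rng(1)
X = (rng.uniform(size=(20, 15)) < 0.4).astype(float)
mask = rng.uniform(size=X.shape) > 0.1
X[~mask] = np.nan
W, H, b, losses = logistic_nmf(X, mask, k=3)
```

The number of components `k` is fixed by hand in this sketch; as the abstract notes, choosing it remains an open part of the modelling.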



Paper Citation


in Harvard Style

Larsen J. and Clemmensen L. (2015). Non-negative Matrix Factorization for Binary Data. In Proceedings of the 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: SSTM, (IC3K 2015) ISBN 978-989-758-158-8, pages 555-563. DOI: 10.5220/0005614805550563


in Bibtex Style

@conference{sstm15,
author={Jacob Søgaard Larsen and Line Katrine Harder Clemmensen},
title={Non-negative Matrix Factorization for Binary Data},
booktitle={Proceedings of the 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: SSTM, (IC3K 2015)},
year={2015},
pages={555-563},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005614805550563},
isbn={978-989-758-158-8},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: SSTM, (IC3K 2015)
TI - Non-negative Matrix Factorization for Binary Data
SN - 978-989-758-158-8
AU - Larsen J.
AU - Clemmensen L.
PY - 2015
SP - 555
EP - 563
DO - 10.5220/0005614805550563