Non-negative Matrix Factorization for Binary Data
Jacob Søgaard Larsen, Line Katrine Harder Clemmensen
2015
Abstract
We propose Logistic Non-negative Matrix Factorization for the decomposition of binary data. Binary data arise frequently in, for example, text analysis, sensory analysis and market basket analysis. Non-negative Matrix Factorization is a common method for analysing non-negative data, but in theory it is not appropriate for binary data. We therefore propose a novel Non-negative Matrix Factorization based on the logistic link function, and we generalize the method to handle missing data. The formulation is compared to a previously proposed logistic matrix factorization that places no non-negativity constraint on the features. We compare the performance of Logistic Non-negative Matrix Factorization to that of Least Squares Non-negative Matrix Factorization and Kullback-Leibler (KL) Non-negative Matrix Factorization on three binary datasets: a synthetic dataset, a set of student comments on their professors collected in a binary term-document matrix, and a sensory dataset. We find that choosing the number of components is an essential part of the modelling and interpretation, and that it remains an unresolved problem.
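The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not state the update rules, so this example assumes plain projected gradient descent on the Bernoulli negative log-likelihood, with non-negative factors W and H, an unconstrained scalar intercept b (added here as an assumption so that probabilities below 0.5 are reachable), and missing entries encoded as None and skipped in the loss. All function names are hypothetical.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(W, H, b):
    # P[i][j] = sigmoid(b + sum_c W[i][c] * H[c][j])
    k, m = len(H), len(H[0])
    return [[sigmoid(b + sum(row[c] * H[c][j] for c in range(k)))
             for j in range(m)] for row in W]


def masked_loss(X, P):
    # Bernoulli negative log-likelihood over observed entries only.
    eps = 1e-12
    total = 0.0
    for x_row, p_row in zip(X, P):
        for x, p in zip(x_row, p_row):
            if x is None:  # missing entry: contributes nothing
                continue
            total -= x * math.log(p + eps) + (1 - x) * math.log(1 - p + eps)
    return total


def fit(X, k, steps=500, lr=0.05, seed=0):
    # Projected gradient descent: take a gradient step, then clip W and H at 0.
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    W = [[rng.uniform(0, 0.5) for _ in range(k)] for _ in range(n)]
    H = [[rng.uniform(0, 0.5) for _ in range(m)] for _ in range(k)]
    b = 0.0
    history = []
    for _ in range(steps):
        P = predict(W, H, b)
        history.append(masked_loss(X, P))
        # Gradient of the loss w.r.t. the linear predictor; zero where missing.
        G = [[0.0 if X[i][j] is None else P[i][j] - X[i][j]
              for j in range(m)] for i in range(n)]
        gW = [[sum(G[i][j] * H[c][j] for j in range(m)) for c in range(k)]
              for i in range(n)]
        gH = [[sum(W[i][c] * G[i][j] for i in range(n)) for j in range(m)]
              for c in range(k)]
        gb = sum(sum(row) for row in G)
        W = [[max(0.0, W[i][c] - lr * gW[i][c]) for c in range(k)]
             for i in range(n)]
        H = [[max(0.0, H[c][j] - lr * gH[c][j]) for j in range(m)]
             for c in range(k)]
        b -= lr * gb  # the intercept is not constrained
    return W, H, b, history


# Toy binary matrix with two blocks and two missing entries (None).
X = [
    [1, 1, 0, 0],
    [1, 1, None, 0],
    [0, 0, 1, 1],
    [0, None, 1, 1],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
]
W, H, b, history = fit(X, k=2)
final_loss = masked_loss(X, predict(W, H, b))
print("initial loss:", round(history[0], 3), "final loss:", round(final_loss, 3))
```

The entries of predict(W, H, b) at the missing positions can be read as estimated probabilities for the unobserved values. The number of components k is fixed by hand here, which mirrors the open question raised in the abstract.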
Paper Citation
in Harvard Style
Larsen J. and Clemmensen L. (2015). Non-negative Matrix Factorization for Binary Data. In Proceedings of the 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: SSTM, (IC3K 2015) ISBN 978-989-758-158-8, pages 555-563. DOI: 10.5220/0005614805550563
in Bibtex Style
@conference{sstm15,
author={Jacob Søgaard Larsen and Line Katrine Harder Clemmensen},
title={Non-negative Matrix Factorization for Binary Data},
booktitle={Proceedings of the 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: SSTM, (IC3K 2015)},
year={2015},
pages={555-563},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005614805550563},
isbn={978-989-758-158-8},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: SSTM, (IC3K 2015)
TI - Non-negative Matrix Factorization for Binary Data
SN - 978-989-758-158-8
AU - Larsen J.
AU - Clemmensen L.
PY - 2015
SP - 555
EP - 563
DO - 10.5220/0005614805550563