# Non-negative Matrix Factorization for Binary Data

### Jacob Søgaard Larsen, Line Katrine Harder Clemmensen

#### Abstract

We propose Logistic Non-negative Matrix Factorization for the decomposition of binary data. Binary data arise frequently in, for example, text analysis, sensory data, and market basket data. Non-negative Matrix Factorization (NMF) is a common method for analysing non-negative data, but it is not, in theory, appropriate for binary data. We therefore propose a novel NMF based on the logistic link function, and we generalize the method to handle missing data. We compare its formulation with a previously proposed logistic matrix factorization that places no non-negativity constraint on the features. We then compare the performance of Logistic NMF with Least Squares NMF and Kullback-Leibler (KL) NMF on three binary datasets: a synthetic dataset, a binary term-document matrix of student comments on their professors, and a sensory dataset. We find that choosing the number of components is an essential part of the modelling and interpretation, and that it remains unresolved.
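To make the idea in the abstract concrete, the following is a minimal sketch of a logistic-link factorization with non-negative factors and a mask for missing entries. It is not the authors' algorithm: the projected gradient updates, the scalar intercept `b` (added here because a non-negative product `W @ H` alone would push every fitted probability to at least 0.5), the step size, and all function and parameter names are assumptions made for illustration.

```python
import numpy as np

def sigmoid(z):
    # Clip the logits so np.exp cannot overflow.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))

def neg_log_lik(X, W, H, b, mask):
    # Bernoulli negative log-likelihood summed over observed entries only.
    Z = W @ H + b
    X0 = np.where(mask > 0, X, 0.0)  # neutralise NaNs at missing positions
    return float(np.sum(mask * (np.logaddexp(0.0, Z) - X0 * Z)))

def logistic_nmf(X, k, mask=None, n_iter=500, lr=0.01, seed=0):
    """Hypothetical sketch: fit P(X_ij = 1) = sigmoid((W H)_ij + b), W, H >= 0."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    if mask is None:
        mask = np.ones((n, m))
    X0 = np.where(mask > 0, X, 0.0)
    W = rng.uniform(0.0, 1.0, size=(n, k))
    H = rng.uniform(0.0, 1.0, size=(k, m))
    b = 0.0  # unconstrained intercept (an assumption of this sketch)
    for _ in range(n_iter):
        P = sigmoid(W @ H + b)
        R = mask * (P - X0)  # gradient of the loss w.r.t. the logits
        grad_W = R @ H.T
        grad_H = W.T @ R
        grad_b = R.sum() / mask.sum()  # averaged so the step does not scale with matrix size
        # Gradient step followed by projection onto the non-negative orthant.
        W = np.maximum(W - lr * grad_W, 0.0)
        H = np.maximum(H - lr * grad_H, 0.0)
        b -= lr * grad_b
    return W, H, b
```

Entries with `mask == 0` contribute nothing to the loss or its gradient, which is one simple way to accommodate missing data. The number of components `k` has to be supplied by the user, which is the model-selection question the abstract identifies as unresolved.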

#### Paper Citation

#### in Harvard Style

Larsen J. and Clemmensen L. (2015). **Non-negative Matrix Factorization for Binary Data**. In *Proceedings of the 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: SSTM, (IC3K 2015)*, ISBN 978-989-758-158-8, pages 555-563. DOI: 10.5220/0005614805550563

#### in Bibtex Style

@conference{sstm15,

author={Jacob Søgaard Larsen and Line Katrine Harder Clemmensen},

title={Non-negative Matrix Factorization for Binary Data},

booktitle={Proceedings of the 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: SSTM, (IC3K 2015)},

year={2015},

pages={555-563},

publisher={SciTePress},

organization={INSTICC},

doi={10.5220/0005614805550563},

isbn={978-989-758-158-8},

}

#### in EndNote Style

TY - CONF

JO - Proceedings of the 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: SSTM, (IC3K 2015)

TI - Non-negative Matrix Factorization for Binary Data

SN - 978-989-758-158-8

AU - Larsen J.

AU - Clemmensen L.

PY - 2015

SP - 555

EP - 563

DO - 10.5220/0005614805550563