Authors:
Jacob Søgaard Larsen and Line Katrine Harder Clemmensen
Affiliation:
Technical University of Denmark, Denmark
Keyword(s):
Non-negative Matrix Factorization, Binary Data, Binary Matrix Factorization, Text Modelling.
Related Ontology Subjects/Areas/Topics:
Artificial Intelligence; Business Analytics; Computational Intelligence; Data Analytics; Data Engineering; Evolutionary Computing; Knowledge Discovery and Information Retrieval; Knowledge-Based Systems; Machine Learning; Mining Text and Semi-Structured Data; Soft Computing; Symbolic Systems
Abstract:
We propose the Logistic Non-negative Matrix Factorization for decomposition of binary data. Binary data
frequently arise in, e.g., text analysis, sensory data, and market basket data. A common method for
analysing non-negative data is the Non-negative Matrix Factorization. In theory, however, it is not appropriate
for binary data, so we propose a novel Non-negative Matrix Factorization based on the logistic link
function. Furthermore, we generalize the method to handle missing data. The formulation of the method
is compared to a previously proposed logistic matrix factorization without a non-negativity constraint on the
features. We compare the performance of the Logistic Non-negative Matrix Factorization to Least Squares
Non-negative Matrix Factorization and Kullback-Leibler (KL) Non-negative Matrix Factorization on three sets of
binary data: a synthetic dataset, a set of student comments on their professors collected in a binary term-document
matrix, and a sensory dataset. We find that choosing the number of components is an essential part
of the modelling and interpretation, and that it remains unresolved.
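
As a rough illustration of the idea in the abstract, the sketch below fits a binary matrix with a Bernoulli likelihood and a logistic link, keeps the feature matrix non-negative, and skips missing entries through a mask. It is not the authors' algorithm. The choice to constrain only `H`, the projected gradient update, the step size, and every name (`logistic_nmf_sketch`, `mask`, `lr`) are assumptions made for illustration.

```python
import numpy as np


def sigmoid(z):
    # Clip the logits so np.exp cannot overflow for extreme values.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))


def logistic_nmf_sketch(X, n_components, mask=None, n_iter=500, lr=0.01, seed=0):
    """Hypothetical sketch: X[i, j] ~ Bernoulli(sigmoid((W @ H)[i, j])).

    H (features) is kept non-negative by projection; W is left unconstrained.
    Entries where mask is 0/False are treated as missing and ignored.
    """
    rng = np.random.default_rng(seed)
    n_rows, n_cols = X.shape
    if mask is None:
        mask = np.ones((n_rows, n_cols))
    mask = np.asarray(mask, dtype=float)
    # Replace missing values (possibly NaN) by 0; the mask zeroes their gradient.
    X_filled = np.where(mask > 0, X, 0.0)

    W = 0.1 * rng.standard_normal((n_rows, n_components))
    H = 0.1 * rng.random((n_components, n_cols))

    for _ in range(n_iter):
        # Gradient of the negative log-likelihood with respect to the logits.
        residual = mask * (sigmoid(W @ H) - X_filled)
        grad_W = residual @ H.T
        grad_H = W.T @ residual
        W = W - lr * grad_W
        # Projection step: clip negative feature values to zero.
        H = np.maximum(H - lr * grad_H, 0.0)
    return W, H


if __name__ == "__main__":
    demo_rng = np.random.default_rng(42)
    X_demo = (demo_rng.random((20, 10)) < 0.3).astype(float)
    X_demo[0, 0] = np.nan  # one missing entry
    W_demo, H_demo = logistic_nmf_sketch(X_demo, n_components=3, mask=~np.isnan(X_demo))
    print(W_demo.shape, H_demo.shape)
```

The number of components is passed in by hand here, which mirrors the open question the abstract raises about how to choose it.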