Authors: Takeru Yokoi (1) and Hidekazu Yanagimoto (2)
Affiliations: (1) Tokyo Metropolitan College of Industrial Technology, Japan; (2) Osaka Prefecture University, Japan
Keyword(s): Topic extraction, Sparse non-negative matrix factorization, Clustering.
Related Ontology Subjects/Areas/Topics: Artificial Intelligence; Knowledge Discovery and Information Retrieval; Knowledge-Based Systems; Soft Computing; Symbolic Systems; Web Mining
Abstract:
We propose a method that extracts topics from a large document set by dividing the set, extracting the topics contained in each division, and combining them. Topics are extracted by applying Sparse Non-negative Matrix Factorization with a sparseness constraint imposed only on the basis matrix, which we call SNMF/L, to the document sets. Combining topics from several small document sets is useful because, when the number of documents is large, extracting topics directly from the whole corpus with SNMF/L takes a long time. In this paper, we shorten the processing time of topic extraction from a large document set by combining the topics extracted from each divided document set. We also evaluate the proposed method in two ways: by the correspondence between the combined topics and the topics extracted directly from the large document set with SNMF/L, and by the processing time of SNMF/L.
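
The divide, extract, and combine idea can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the function names `sparse_nmf` and `topics_from_divisions` are hypothetical, the sparse factorization is a simple multiplicative-update NMF with an L1 penalty on the basis matrix (a stand-in, not the authors' SNMF/L solver), and the combination step simply factorizes the stacked per-division topics again, since the abstract does not state how topics are combined.

```python
import numpy as np

def sparse_nmf(A, k, lam=0.1, eta=0.1, iters=200, seed=0):
    """Factorize A (terms x docs) as W @ H with an L1 penalty on the basis W.

    Simplified stand-in for SNMF/L (assumption): multiplicative updates for
    0.5*||A - WH||_F^2 + lam*sum(W) + 0.5*eta*||H||_F^2.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    W = rng.random((m, k))
    H = rng.random((k, n))
    eps = 1e-9  # guards against division by zero
    for _ in range(iters):
        # The lam term in the denominator shrinks W, encouraging sparse topics.
        W *= (A @ H.T) / (W @ (H @ H.T) + lam + eps)
        # The eta term keeps H from growing to compensate for a shrinking W.
        H *= (W.T @ A) / ((W.T @ W) @ H + eta * H + eps)
    return W, H

def topics_from_divisions(A, n_divisions, k, **kw):
    """Split the documents (columns) into divisions, extract k topics from
    each, then combine them by factorizing the stacked topics (assumption)."""
    blocks = np.array_split(A, n_divisions, axis=1)
    bases = [sparse_nmf(B, k, **kw)[0] for B in blocks]
    stacked = np.hstack(bases)  # terms x (n_divisions * k)
    combined, _ = sparse_nmf(stacked, k, **kw)
    return combined

# Toy term-document matrix: 30 terms, 40 documents (random, illustration only).
A = np.random.default_rng(1).random((30, 40))
topics = topics_from_divisions(A, n_divisions=4, k=3)
print(topics.shape)
```

Each division is factorized independently, so each run works on a much smaller matrix than the full corpus, which is where the reduction in processing time described above would come from.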