Authors: Takeru Yokoi (1) and Hidekazu Yanagimoto (2)
        
        
Affiliations: (1) Tokyo Metropolitan College of Industrial Technology, Japan; (2) Osaka Prefecture University, Japan
        
        
        
        
        
Keyword(s): Topic extraction, Sparse non-negative matrix factorization, Clustering.
        
        
            
Related Ontology Subjects/Areas/Topics: Artificial Intelligence; Knowledge Discovery and Information Retrieval; Knowledge-Based Systems; Soft Computing; Symbolic Systems; Web Mining
            
        
        
            
Abstract:
We propose a method for extracting topics from a large document set by dividing it into smaller subsets, extracting topics from each subset, and combining them. Topics are extracted with Sparse Non-negative Matrix Factorization that imposes a sparsity constraint only on the basis matrix, which we call SNMF/L. Because applying SNMF/L directly to a large corpus takes a long time when the number of documents is large, it is useful to combine topics extracted from smaller document sets instead. In this paper, we shorten the processing time for topic extraction from a large document set by combining the topics extracted from each divided document set. We evaluate the proposed method in two ways: by the correspondence between the combined topics and the topics extracted directly from the large document set with SNMF/L, and by the processing times of SNMF/L.
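
The divide-and-combine idea described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the function names (`sparse_nmf_l`, `combine_topics`, `extract_topics_divided`) are hypothetical, the factorization uses simple multiplicative updates with an L1 penalty on the basis matrix as a stand-in for SNMF/L, and the topics are combined by spherical k-means, since the abstract does not state the combination rule.

```python
import numpy as np


def sparse_nmf_l(A, k, beta=0.1, n_iter=200, seed=0):
    """Approximate A (terms x docs) by W @ H with an L1 penalty on W.

    Assumption: multiplicative updates are a stand-in solver here,
    not necessarily the one used by the authors.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    W = rng.random((m, k))
    H = rng.random((k, n))
    eps = 1e-9
    for _ in range(n_iter):
        # Standard multiplicative update for the coefficient matrix H.
        H *= (W.T @ A) / (W.T @ W @ H + eps)
        # The extra beta in the denominator comes from the L1 penalty on W.
        W *= (A @ H.T) / (W @ H @ H.T + beta + eps)
    return W, H


def combine_topics(bases, k, n_iter=50, seed=0):
    """Merge basis vectors from all divisions into k topics.

    Assumption: spherical k-means on unit-normalized basis vectors.
    """
    V = np.hstack(bases)  # terms x (total number of extracted topics)
    V = V / (np.linalg.norm(V, axis=0, keepdims=True) + 1e-12)
    rng = np.random.default_rng(seed)
    centers = V[:, rng.choice(V.shape[1], size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Assign each basis vector to the center with highest cosine similarity.
        labels = np.argmax(centers.T @ V, axis=0)
        for j in range(k):
            members = V[:, labels == j]
            if members.shape[1] > 0:
                c = members.sum(axis=1)
                centers[:, j] = c / (np.linalg.norm(c) + 1e-12)
    return centers


def extract_topics_divided(A, n_divisions, k):
    """Split documents (columns) into divisions, factor each, then combine."""
    parts = np.array_split(A, n_divisions, axis=1)
    bases = [sparse_nmf_l(part, k, seed=i)[0] for i, part in enumerate(parts)]
    return combine_topics(bases, k)
```

Each division is factored independently, so the per-division runs work on much smaller matrices than the full corpus and could also be run in parallel. How closely the combined topics match those from a direct factorization depends on the combination rule, which is exactly what the evaluation described in the abstract measures.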