Authors: Takeru Yokoi 1 and Hidekazu Yanagimoto 2

Affiliations: 1 Tokyo Metropolitan College of Industrial Technology, Japan ; 2 Osaka Prefecture University, Japan

Keyword(s): Topic extraction, Sparse non-negative matrix factorization, Clustering.

Related Ontology Subjects/Areas/Topics: Artificial Intelligence ; Knowledge Discovery and Information Retrieval ; Knowledge-Based Systems ; Soft Computing ; Symbolic Systems ; Web Mining

Abstract: We propose a method that obtains the topics of a large document set by extracting topics from its divisions and combining them. To extract topics, we apply Sparse Non-negative Matrix Factorization with a sparseness constraint imposed only on the basis matrix, which we call SNMF/L, to the document sets. Combining the topics from several small document sets is useful because topic extraction with SNMF/L takes a long time when the corpus contains a large number of documents. In this paper, we shorten the processing time of topic extraction from a large document set by combining the topics extracted from each divided document set. We evaluate the proposed method in two ways: by the correspondence between the combined topics and the topics extracted directly from the large document set with SNMF/L, and by the processing times of SNMF/L.
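
The divide-and-combine idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions: a simplified NMF with an L1 penalty on the basis matrix stands in for SNMF/L, topics from different divisions are combined by a greedy cosine-similarity grouping (the abstract does not say how topics are combined), and the names `sparse_nmf_l`, `combine_topics`, `topics_from_divisions`, `beta`, and `threshold` are hypothetical.

```python
import numpy as np


def sparse_nmf_l(A, k, beta=0.1, n_iter=200, seed=0):
    """Factor a non-negative term-by-document matrix A into W @ H.

    Assumption: a plain L1 penalty `beta` on the basis matrix W stands in
    for the sparseness constraint of SNMF/L; this is a simplified variant.
    """
    rng = np.random.default_rng(seed)
    n_terms, n_docs = A.shape
    W = rng.random((n_terms, k)) + 1e-3  # basis matrix: one column per topic
    H = rng.random((k, n_docs)) + 1e-3   # coefficients: one column per document
    eps = 1e-12
    for _ in range(n_iter):
        # Multiplicative updates keep both factors non-negative.
        H *= (W.T @ A) / (W.T @ W @ H + eps)
        W *= (A @ H.T) / (W @ H @ H.T + beta + eps)  # beta shrinks W entries
    return W, H


def combine_topics(bases, threshold=0.8):
    """Merge topic vectors from several divisions (hypothetical rule).

    A topic joins the first group whose centroid has cosine similarity
    of at least `threshold` with it; otherwise it starts a new group.
    """
    vectors = np.hstack(bases).T  # one row per topic vector
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    vectors = vectors / np.maximum(norms, 1e-12)
    groups = []
    for v in vectors:
        for g in groups:
            centroid = np.mean(g, axis=0)
            centroid /= max(np.linalg.norm(centroid), 1e-12)
            if v @ centroid >= threshold:
                g.append(v)
                break
        else:
            groups.append([v])
    combined = np.array([np.mean(g, axis=0) for g in groups])
    combined /= np.maximum(np.linalg.norm(combined, axis=1, keepdims=True), 1e-12)
    return combined.T  # terms x combined topics, unit-length columns


def topics_from_divisions(A, n_divisions, k, threshold=0.8, beta=0.1):
    """Split the documents (columns), factor each part, combine the topics."""
    parts = np.array_split(A, n_divisions, axis=1)
    bases = [sparse_nmf_l(p, k, beta=beta, seed=i)[0] for i, p in enumerate(parts)]
    return combine_topics(bases, threshold)


if __name__ == "__main__":
    toy = np.random.default_rng(42).random((20, 30))  # toy data, not a real corpus
    print(topics_from_divisions(toy, n_divisions=3, k=2).shape)
```

Each division is factored independently, so the cost per factorization depends on the size of one division rather than the whole corpus, which is the motivation the abstract gives for combining topics from smaller sets.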

CC BY-NC-ND 4.0

Paper citation in several formats:
Yokoi, T. and Yanagimoto, H. (2009). TOPIC EXTRACTION FROM DIVIDED DOCUMENT SETS. In Proceedings of the Fifth International Conference on Web Information Systems and Technologies - WEBIST; ISBN 978-989-8111-81-4; ISSN 2184-3252, SciTePress, pages 654-659. DOI: 10.5220/0001822106540659

@conference{webist09,
author={Takeru Yokoi and Hidekazu Yanagimoto},
title={TOPIC EXTRACTION FROM DIVIDED DOCUMENT SETS},
booktitle={Proceedings of the Fifth International Conference on Web Information Systems and Technologies - WEBIST},
year={2009},
pages={654-659},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0001822106540659},
isbn={978-989-8111-81-4},
issn={2184-3252},
}

TY - CONF

JO - Proceedings of the Fifth International Conference on Web Information Systems and Technologies - WEBIST
TI - TOPIC EXTRACTION FROM DIVIDED DOCUMENT SETS
SN - 978-989-8111-81-4
IS - 2184-3252
AU - Yokoi, T.
AU - Yanagimoto, H.
PY - 2009
SP - 654
EP - 659
DO - 10.5220/0001822106540659
PB - SciTePress