Authors:
Nuo Zhang
;
Daisuke Matsuzaki
;
Toshinori Watanabe
and
Hisashi Koga
Affiliation:
Graduate School of Information Systems, The University of Electro-Communications, Japan
Keyword(s):
Document analysis, PRDC, Topic extraction, Relation analysis, Clustering, Data compression.
Related
Ontology
Subjects/Areas/Topics:
Artificial Intelligence
;
Biomedical Engineering
;
Biomedical Signal Processing
;
Data Manipulation
;
Data Mining
;
Databases and Information Systems Integration
;
Enterprise Information Systems
;
Health Engineering and Technology Applications
;
Human-Computer Interaction
;
Knowledge Representation and Reasoning
;
Methodologies and Methods
;
Neurocomputing
;
Neurotechnology, Electronics and Informatics
;
Pattern Recognition
;
Physiological Computing Systems
;
Sensor Networks
;
Signal Processing
;
Soft Computing
;
Symbolic Systems
Abstract:
Nowadays, there are a great deal of e-documents can be easily accessed. It will be beneficial if a method can evaluate documents and abstract significant content. Similarity analysis and topic extraction are widely used as document relation analysis techniques. Most of the methods are based on dictionary-base morphological analysis. They cannot meet the requirement when the Internet grows fast and new terms appear but dictionary cannot be automatically updated fast enough. In this study, we propose a novel document relation analysis (topic extraction) method based on a compressibility vector. Our proposal does not require morphological analysis, and it can automatically evaluate input documents. We will examine the proposal with using model document and Reuters-21578 dataset, for relation analysis and topic extraction. The effectiveness of the proposed method will be shown in simulations.