Authors:
Costin Chiru, Traian Rebedea and Silvia Ciotec
Affiliation:
University Politehnica Bucharest, Romania
Keyword(s):
Latent Semantic Analysis - LSA, Latent Dirichlet Allocation - LDA, Lexical Chains, Semantic Relatedness.
Related Ontology Subjects/Areas/Topics:
Artificial Intelligence; Data Mining; Databases and Information Systems Integration; Enterprise Information Systems; Sensor Networks; Signal Processing; Soft Computing
Abstract:
This paper presents an analysis of three techniques used for similar, mainly semantic, tasks in
Natural Language Processing (NLP): Latent Semantic Analysis (LSA), Latent Dirichlet Allocation (LDA)
and lexical chains. The techniques were evaluated and compared on two corpora in order to highlight
their similarities and differences from a semantic analysis viewpoint. The first corpus consisted of
four Wikipedia articles on different topics, while the second consisted of 35 online chat conversations,
each with 4 to 12 participants debating four assigned topics (forums, chat, blogs and wikis). The
outcomes of the three methods were compared by computing quantitative factors such as correlations and
the degree of coverage of the resulting topics. Using corpora from different types of discourse and
task-independent quantitative factors allows us to show that, although LSA and LDA provide similar
results, the results of lexical chaining are not strongly correlated with those of either LSA or LDA.
Therefore, lexical chains might be used as a complement to LSA or LDA when performing semantic
analysis for various NLP applications.
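The abstract compares the methods through correlations but does not say exactly how the scores are paired. As a rough illustration only, and not the authors' procedure, one common way to correlate two methods is to score every pair of words with each method and then compute the Pearson correlation between the two score lists. The sketch assumes hypothetical toy data: invented word vectors standing in for an LSA word space and for LDA per-word topic distributions, cosine similarity for both, and lexical chains encoded as a 0/1 "same chain" indicator per word pair (Pearson's r against a 0/1 variable is the point-biserial correlation).

```python
import math
from itertools import combinations

# Hypothetical toy data: these vectors and chain ids are invented for
# illustration and do not come from the paper's corpora.
WORDS = ["forum", "chat", "blog", "wiki"]
LSA_VECTORS = {  # stand-in for word rows of a reduced LSA (SVD) space
    "forum": [0.9, 0.1], "chat": [0.8, 0.3],
    "blog": [0.2, 0.9], "wiki": [0.1, 0.8],
}
LDA_VECTORS = {  # stand-in for per-word topic distributions from LDA
    "forum": [0.7, 0.2, 0.1], "chat": [0.6, 0.3, 0.1],
    "blog": [0.1, 0.2, 0.7], "wiki": [0.2, 0.1, 0.7],
}
CHAIN_IDS = {"forum": 0, "chat": 0, "blog": 1, "wiki": 2}  # lexical chain per word


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length score lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant score list")
    return cov / (sd_x * sd_y)


def vector_pair_scores(vectors, words):
    """Cosine similarity for every unordered word pair, in a fixed order."""
    return [cosine(vectors[a], vectors[b]) for a, b in combinations(words, 2)]


def chain_pair_scores(chain_ids, words):
    """1.0 if both words of a pair fall in the same lexical chain, else 0.0."""
    return [1.0 if chain_ids[a] == chain_ids[b] else 0.0
            for a, b in combinations(words, 2)]


if __name__ == "__main__":
    lsa = vector_pair_scores(LSA_VECTORS, WORDS)
    lda = vector_pair_scores(LDA_VECTORS, WORDS)
    chains = chain_pair_scores(CHAIN_IDS, WORDS)
    print(f"LSA vs LDA:    r = {pearson(lsa, lda):.3f}")
    print(f"LSA vs chains: r = {pearson(lsa, chains):.3f}")
    print(f"LDA vs chains: r = {pearson(lda, chains):.3f}")
```

With real data, the word pairs would be drawn from the corpus vocabulary, and a rank correlation such as Spearman's might be preferable when the methods' score scales differ; both the pairing scheme and the choice of coefficient are assumptions of this sketch.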