Comparison between LSA-LDA-Lexical Chains
Costin Chiru, Traian Rebedea, Silvia Ciotec
2014
Abstract
This paper presents an analysis of three techniques used for similar, mostly semantics-related tasks in Natural Language Processing (NLP): Latent Semantic Analysis (LSA), Latent Dirichlet Allocation (LDA) and lexical chains. The techniques were evaluated and compared on two different corpora in order to highlight the similarities and differences between them from a semantic analysis viewpoint. The first corpus consisted of four Wikipedia articles on different topics, while the second consisted of 35 online chat conversations, each with 4 to 12 participants debating four imposed topics (forum, chat, blog and wikis). The outcomes of the three methods were compared using quantitative factors such as correlations and the degree of coverage of the resulting topics. Using corpora from different types of discourse, together with task-independent quantitative factors, allows us to show that although LSA and LDA provide similar results, the results of lexical chaining are not strongly correlated with those of either LSA or LDA. Lexical chains might therefore be used as a complement to LSA or LDA when performing semantic analysis for various NLP applications.
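The abstract names correlation and degree of coverage as the quantitative factors but does not define them here. As a rough illustration only, the sketch below assumes Pearson correlation over per-item scores and a simple set-overlap notion of coverage; the function names `pearson` and `coverage` and all sample values are hypothetical and are not taken from the paper.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def coverage(reference_terms, candidate_terms):
    """Fraction of reference_terms that also occur in candidate_terms.

    Assumption: coverage is plain set overlap; the paper may define it differently.
    """
    reference = set(reference_terms)
    if not reference:
        return 0.0
    return len(reference & set(candidate_terms)) / len(reference)


if __name__ == "__main__":
    # Made-up scores that three methods might assign to the same five items.
    lsa_scores = [0.9, 0.7, 0.4, 0.2, 0.1]
    lda_scores = [0.8, 0.6, 0.5, 0.3, 0.1]
    chain_scores = [0.2, 0.9, 0.1, 0.8, 0.3]
    print("LSA vs LDA:", pearson(lsa_scores, lda_scores))
    print("LSA vs chains:", pearson(lsa_scores, chain_scores))
    # Made-up topic terms from two methods.
    print("coverage:", coverage(["wiki", "blog", "chat", "forum"], ["chat", "forum", "user"]))
```

With real data, the score lists would come from each method's output on the same set of items, so that the positions in the lists line up.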
Paper Citation
in Harvard Style
Chiru C., Rebedea T. and Ciotec S. (2014). Comparison between LSA-LDA-Lexical Chains. In Proceedings of the 10th International Conference on Web Information Systems and Technologies - Volume 2: WEBIST, ISBN 978-989-758-024-6, pages 255-262. DOI: 10.5220/0004798102550262
in Bibtex Style
@conference{webist14,
author={Costin Chiru and Traian Rebedea and Silvia Ciotec},
title={Comparison between LSA-LDA-Lexical Chains},
booktitle={Proceedings of the 10th International Conference on Web Information Systems and Technologies - Volume 2: WEBIST},
year={2014},
pages={255-262},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004798102550262},
isbn={978-989-758-024-6},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 10th International Conference on Web Information Systems and Technologies - Volume 2: WEBIST
TI - Comparison between LSA-LDA-Lexical Chains
SN - 978-989-758-024-6
AU - Chiru C.
AU - Rebedea T.
AU - Ciotec S.
PY - 2014
SP - 255
EP - 262
DO - 10.5220/0004798102550262