A Methodology to Measure the Semantic Similarity between Words based on the Formal Concept Analysis
Yewon Jeong, Yiyeon Yoon, Dongkyu Jeon, Youngsang Cho, Wooju Kim
2014
Abstract
Web users often find it difficult to locate the information they want on the internet, even though a great deal of useful information is available, because finding it takes considerable time and effort. Query expansion has been considered as a way to address this problem. It is the process of reformulating a query to improve retrieval performance in information retrieval operations. Synonym identification is one of several query expansion techniques. This paper therefore proposes a method for measuring the semantic similarity between two words using keyword-based web documents. Formal concept analysis and our proposed expansion algorithm are used to estimate the similarity between two words. To evaluate the performance of our method, we conducted two experiments. The results show that the average similarity between synonym pairs is much higher than that between random pairs. Our method also performs remarkably well in comparison with another method. The proposed method therefore contributes to finding synonyms among a large number of candidate words.
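To make the idea of comparing words through formal concept analysis more concrete, the following is a minimal illustrative sketch and not the paper's method: the abstract does not specify the expansion algorithm or the similarity formula. It assumes a toy formal context in which each word (object) is described by the keywords (attributes) of documents it appears in, enumerates the formal concepts by brute force, and scores two words by the share of concepts that contain both of them. The data in `CONTEXT` and the function names are hypothetical.

```python
from itertools import combinations

# Hypothetical toy formal context: word -> keywords observed with it.
CONTEXT = {
    "car": {"engine", "road", "wheel"},
    "automobile": {"engine", "road", "wheel"},
    "bicycle": {"road", "wheel"},
    "banana": {"fruit"},
}


def common_attributes(objects, context):
    """Derivation operator A -> A': attributes shared by every object in A."""
    objects = list(objects)
    if not objects:
        # By convention, the empty object set shares all attributes.
        return set().union(*context.values())
    return set.intersection(*(context[o] for o in objects))


def common_objects(attributes, context):
    """Derivation operator B -> B': objects that have every attribute in B."""
    return {o for o, attrs in context.items() if attributes <= attrs}


def formal_concepts(context):
    """Enumerate all (extent, intent) pairs by closing every object subset.

    Exponential in the number of objects, so suitable for toy data only.
    """
    concepts = set()
    objects = sorted(context)
    for size in range(len(objects) + 1):
        for subset in combinations(objects, size):
            intent = frozenset(common_attributes(subset, context))
            extent = frozenset(common_objects(intent, context))
            concepts.add((extent, intent))
    return concepts


def concept_similarity(word1, word2, context):
    """Assumed measure: Jaccard overlap of the concepts containing each word."""
    concepts = formal_concepts(context)
    with_w1 = {c for c in concepts if word1 in c[0]}
    with_w2 = {c for c in concepts if word2 in c[0]}
    union = with_w1 | with_w2
    return len(with_w1 & with_w2) / len(union) if union else 0.0


if __name__ == "__main__":
    for pair in [("car", "automobile"), ("car", "bicycle"), ("car", "banana")]:
        print(pair, round(concept_similarity(*pair, CONTEXT), 3))
```

On this toy context the synonym-like pair (`car`, `automobile`) shares every concept and scores highest, while the unrelated pair (`car`, `banana`) shares only the top concept and scores lowest. That mirrors the qualitative result the abstract reports for synonym pairs versus random pairs, under the stated assumptions.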
Paper Citation
in Harvard Style
Jeong Y., Yoon Y., Jeon D., Cho Y. and Kim W. (2014). A Methodology to Measure the Semantic Similarity between Words based on the Formal Concept Analysis. In Proceedings of the 10th International Conference on Web Information Systems and Technologies - Volume 2: WEBIST, ISBN 978-989-758-024-6, pages 313-321. DOI: 10.5220/0004855603130321
in Bibtex Style
@conference{webist14,
author={Yewon Jeong and Yiyeon Yoon and Dongkyu Jeon and Youngsang Cho and Wooju Kim},
title={A Methodology to Measure the Semantic Similarity between Words based on the Formal Concept Analysis},
booktitle={Proceedings of the 10th International Conference on Web Information Systems and Technologies - Volume 2: WEBIST},
year={2014},
pages={313-321},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004855603130321},
isbn={978-989-758-024-6},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 10th International Conference on Web Information Systems and Technologies - Volume 2: WEBIST
TI - A Methodology to Measure the Semantic Similarity between Words based on the Formal Concept Analysis
SN - 978-989-758-024-6
AU - Jeong Y.
AU - Yoon Y.
AU - Jeon D.
AU - Cho Y.
AU - Kim W.
PY - 2014
SP - 313
EP - 321
DO - 10.5220/0004855603130321