author we compare their topics with the topics of the
author's previous works. Because Wu & Palmer similarity
measures the semantic closeness of two words in a
lexical taxonomy, it helps to filter the data by topics
related to the author. Note that this solution cannot
provide complete filtering, because an article may
genuinely be by the author even though its topic is
far from the author's main field of interest.
Analyzing the results of the algorithm, we
identified several weaknesses that need to be
addressed in the future.
When an author contributes to an article that is not
related to his or her field of interest, filtering out
falsely attributed works becomes problematic: the
algorithm we developed will classify such an article
as falsely attributed or as needing clarification,
even though the attribution is correct.
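The weakness described above can be illustrated with a minimal sketch of topic filtering based on Wu & Palmer similarity. The toy taxonomy, the function names, and the two thresholds are assumptions made for illustration only; they are not the taxonomy or the parameters used in the system described in this paper, which would more plausibly rely on a lexical database such as WordNet.

```python
# Toy is-a taxonomy (child -> parent). A hypothetical stand-in for a real
# lexical taxonomy; the topic names are illustrative only.
PARENT = {
    "science": "entity",
    "computer_science": "science",
    "databases": "computer_science",
    "machine_learning": "computer_science",
    "economics": "science",
    "tourism": "economics",
}


def path_to_root(node):
    """Return the nodes from `node` up to the root, inclusive."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def depth(node):
    """Depth counted in nodes, so the root has depth 1."""
    return len(path_to_root(node))


def wu_palmer(a, b):
    """Wu & Palmer similarity: 2 * depth(lcs) / (depth(a) + depth(b))."""
    ancestors_b = set(path_to_root(b))
    # Walking upward from `a`, the first node shared with `b`
    # is the least common subsumer (lcs).
    lcs = next(n for n in path_to_root(a) if n in ancestors_b)
    return 2 * depth(lcs) / (depth(a) + depth(b))


def classify(article_topic, author_topics, accept=0.7, reject=0.55):
    """Three-way decision; `accept` and `reject` are assumed thresholds."""
    best = max(wu_palmer(article_topic, t) for t in author_topics)
    if best >= accept:
        return "attributed"
    if best < reject:
        return "falsely attributed"
    return "needs clarification"


print(classify("machine_learning", ["databases"]))
print(classify("economics", ["databases"]))
print(classify("tourism", ["databases"]))
```

In this sketch an author whose previous works are all about databases and who genuinely co-authors a paper on tourism would still be rejected, because the decision depends only on topic distance. This is exactly the failure mode noted above.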
Analysis of the publication data of different authors
showed that the same author often has several
distinct articles with identical or almost identical
titles. It is therefore necessary to consider
alternative ways of comparing publication titles in
order to avoid mistakenly merging two different
publications. These weaknesses indicate directions
for further improvement of the system.
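One possible direction can be sketched as follows, assuming titles are compared with a normalized Levenshtein similarity and that two records are merged only when additional metadata also agree. The record fields (`title`, `year`, `venue`), the function names, and the threshold are hypothetical and are not taken from the system described in this paper.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def title_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] after case folding and trimming whitespace."""
    a, b = a.casefold().strip(), b.casefold().strip()
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest


def same_publication(p: dict, q: dict, threshold: float = 0.9) -> bool:
    """Hypothetical merge rule: near-identical titles AND equal year and venue."""
    return (title_similarity(p["title"], q["title"]) >= threshold
            and p["year"] == q["year"]
            and p["venue"] == q["venue"])


first = {"title": "Editorial", "year": 2021, "venue": "Journal A"}
second = {"title": "Editorial", "year": 2023, "venue": "Journal A"}
print(same_publication(first, second))  # identical titles, different years
```

Requiring agreement on fields other than the title keeps two distinct works with the same title apart, at the cost of missing duplicates whose metadata were recorded inconsistently.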
The research described in this paper is partially
funded by the EU NextGenerationEU through the
Recovery and Resilience Plan for Slovakia under
project No. 09I03-03-V01-00078.
Arasu, A., & Garcia-Molina, H. (2003). Extracting structured
data from web pages. Proceedings of the 2003 ACM
SIGMOD International Conference on Management of
Data, 337–348.
Bazaraa, M. S., Sherali, H. D., & Shetty, C. M. (2005).
Nonlinear Programming: Theory and Algorithms (pp.
1–853). John Wiley and Sons.
Cherednichenko, O., Ivashchenko, O., Lincényi, M., &
Kováč, M. (2023). Information technology for
intellectual analysis of item descriptions in e-
commerce. Entrepreneurship and Sustainability Issues,
11(1), 178–190.
Correia, A., Guimaraes, D., Paulino, D., Jameel, S.,
Schneider, D., Fonseca, B., & Paredes, H. (2021).
AuthCrowd: Author Name Disambiguation and Entity
Matching using Crowdsourcing. In Proceedings of the
2021 IEEE 24th International Conference on Computer
Supported Cooperative Work in Design, CSCWD 2021
(pp. 150–155). Institute of Electrical and Electronics
Engineers Inc.
Delgado López-Cózar, E., & Cabezas-Clavijo, Á. (2012).
Google Scholar Metrics: an unreliable tool for assessing
scientific journals. El Profesional de la Información,
21(4).
Elmagarmid, A. K., Ipeirotis, P. G., & Verykios, V. S. (2007).
Duplicate Record Detection: A Survey. IEEE
Transactions on Knowledge and Data Engineering,
19(1).
Labunska, S., Cibák, Ľ., Sidak, M., & Sobakar, M. (2023).
The role of internally generated goodwill in choosing
areas and objects of investment. Investment
Management and Financial Innovations, 20(2), 215–
Levenshtein, V. I. (1965). Binary codes capable of
correcting deletions, insertions, and reversals. Soviet
Physics Doklady, 10, 707–710.
Meng, L., Huang, R., & Gu, J. (2013). A Review of
Semantic Similarity Measures in WordNet.
International Journal of Hybrid Information
Technology, 6(1), 1–12.
Nurjahan, V. A., & Jancy, S. (2023). Dual Cloud
Bibliographic Network Model for Citation
Recommendation Systems. In 2023 Annual
International Conference on Emerging Research
Areas: International Conference on Intelligent Systems,
AICERA/ICIS 2023. Institute of Electrical and
Electronics Engineers Inc.
Orduna-Malea, E., Martín-Martín, A., & Delgado López-Cózar,
E. (2017). Google Scholar as a source for
scholarly evaluation: A bibliographic review of
database errors. Revista Española de Documentación
Científica, 40(4).
Pratama, M. A., & Mandala, R. (2022). Improving Query
Expansion Performances with Pseudo Relevance
Feedback and Wu-Palmer Similarity on Cross
Language Information Retrieval. In 2022 9th
International Conference on Advanced Informatics:
Concepts, Theory and Applications, ICAICTA 2022.
Institute of Electrical and Electronics Engineers Inc.
Ristad, E. S., & Yianilos, P. N. (1998). Learning String Edit
Distance. IEEE Transactions on Pattern Analysis and
Machine Intelligence, 20(5), 522–532.
Smith, T. F., & Waterman, M. S. (1981). Identification of
Common Molecular Subsequences. Journal of Molecular
Biology, 147, 195–197.
Stryzhak, O., Cibák, L., Sidak, M., & Yermachenko, V.
(2024). Socio-economic development of tourist
destinations: A cross-country analysis. Journal of
Eastern European and Central Asian Research
(JEECAR), 11(1), 79–96.