Gathering and Matching Data from the Web: The Bibliographic Data Collection Case Study

Olga Cherednichenko, Lubomir Nebesky, Marián Kováč

2024

Abstract

An analysis of existing approaches to consolidating data on research activities reveals two main issues. First, data collection needs to be automated, which includes comparing data from different sources. Second, obtaining bibliographic information from external services brings erroneous data with it. A tracking system for research activity collects and consolidates data from different web sources and stores them in order to provide relevant bibliographic information. We outline several key points to consider: different spellings of authors' names, data duplication, and the filtering out of erroneous data. The purpose of the study is to improve the accuracy of comparing bibliographic data from different indexing systems. We propose a framework for gathering and matching bibliographic data from the web. In experiments, the proposed algorithm reaches an F1 score of 0.88. A software prototype has been developed. Ways to improve the proposed algorithm have been identified, which opens up opportunities for further research.
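
The key points named in the abstract (name spelling variants, duplicate records, and the F1 metric) can be illustrated with a minimal Python sketch. This is not the authors' algorithm, which the abstract does not describe. The function names `normalize_name`, `same_record`, and `f1_score`, the record layout (a dict with `title` and `authors`), and the 0.9 title-similarity threshold are all assumptions made for illustration only.

```python
import unicodedata
from difflib import SequenceMatcher


def normalize_name(name: str) -> str:
    """Reduce an author name to 'surname initial', ASCII-folded and lower-cased.

    Hypothetical heuristic: handles 'Given Surname', 'Surname I.' and
    'Surname, Given'. Real bibliographic data needs more cases than this.
    """
    # Strip combining diacritics, e.g. 'Kováč' -> 'Kovac'.
    folded = unicodedata.normalize("NFKD", name)
    folded = "".join(ch for ch in folded if not unicodedata.combining(ch))
    folded = folded.replace(".", " ").lower()

    if "," in folded:
        surname, given = [part.strip() for part in folded.split(",", 1)]
    else:
        parts = folded.split()
        if not parts:
            return ""
        # Trailing single-letter tokens are treated as initials.
        initials = []
        while len(parts) > 1 and len(parts[-1]) == 1:
            initials.insert(0, parts.pop())
        if initials:
            surname, given = " ".join(parts), initials[0]
        else:
            surname, given = parts[-1], " ".join(parts[:-1])
    return f"{surname} {given[:1]}".strip()


def same_record(a: dict, b: dict, threshold: float = 0.9) -> bool:
    """Treat two records as duplicates if titles are similar and authors overlap.

    The threshold is an assumed value, not one taken from the paper.
    """
    title_sim = SequenceMatcher(None, a["title"].lower(), b["title"].lower()).ratio()
    authors_a = {normalize_name(n) for n in a["authors"]}
    authors_b = {normalize_name(n) for n in b["authors"]}
    return title_sim >= threshold and bool(authors_a & authors_b)


def f1_score(predicted: set, actual: set) -> float:
    """Harmonic mean of precision and recall over sets of matched pairs."""
    true_positives = len(predicted & actual)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(actual)
    return 2 * precision * recall / (precision + recall)


# Two spellings of the same author collapse to one key.
print(normalize_name("Kováč M."))      # kovac m
print(normalize_name("Marián Kováč"))  # kovac m
```

Folding names to a surname plus first initial trades precision for recall: it merges spelling variants from different indexing systems, but it can also merge distinct authors who share a surname and initial, which is one reason to require title similarity as well.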


Paper Citation


in Harvard Style

Cherednichenko O., Nebesky L. and Kováč M. (2024). Gathering and Matching Data from the Web: The Bibliographic Data Collection Case Study. In Proceedings of the 21st International Conference on Smart Business Technologies - Volume 1: ICSBT; ISBN 978-989-758-710-8, SciTePress, pages 139-146. DOI: 10.5220/0012863500003764


in Bibtex Style

@conference{icsbt24,
author={Olga Cherednichenko and Lubomir Nebesky and Marián Kováč},
title={Gathering and Matching Data from the Web: The Bibliographic Data Collection Case Study},
booktitle={Proceedings of the 21st International Conference on Smart Business Technologies - Volume 1: ICSBT},
year={2024},
pages={139-146},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0012863500003764},
isbn={978-989-758-710-8},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 21st International Conference on Smart Business Technologies - Volume 1: ICSBT
TI - Gathering and Matching Data from the Web: The Bibliographic Data Collection Case Study
SN - 978-989-758-710-8
AU - Cherednichenko O.
AU - Nebesky L.
AU - Kováč M.
PY - 2024
SP - 139
EP - 146
DO - 10.5220/0012863500003764
PB - SciTePress