Authors:
Virginia Niculescu; Horea Greblă; Adrian Sterca and Darius Bufnea
Affiliation:
Computer Science Department, “Babeş-Bolyai” University, 1. M. Kogălniceanu, Cluj-Napoca, Romania
Keyword(s):
Data Engineering, Retrieval Systems, Recommender Systems, Research Paper Databases, Academic Sources, Big-Data Processing, NLP, Apache Spark, Graph-Databases.
Abstract:
With the rapid growth of scientific research paper databases, the use of search and recommender systems in this area has increased, since they help researchers find relevant papers in enormous indexed datasets.
Depending on where a paper is published, stricter policies may require the author to supply the needed metadata, but for other venues these metadata are incomplete.
As a result, many current solutions for searching and recommending papers are biased toward a particular database.
This paper proposes a retrieval system that can overcome these problems by aggregating data from different databases in a dynamic and efficient way. Extracting data from different sources dynamically, rather than only statically from a single database, is important for ensuring a complete query, but at the same time it incurs complex operations that may affect the performance of the system. Performance can be maintained by using a carefully designed architecture that relies on tools that allow a high level of parallelization.
The main original feature of the system is its hybrid querying of static data (stored in databases) and dynamic data (obtained through web queries).