Blending Topic-based Embeddings and Cosine Similarity for Open Data Discovery
Maria Franciscatto, Marcos Fabro, Luis Erpen de Bona, Celio Trois, Hegler Tissot
2022
Abstract
Source discovery aims to facilitate the search for specific information whose access can be complex and dependent on several distributed data sources. These challenges are often observed in Open Data, where users experience a lack of support and have difficulty finding what they need. In this context, source discovery tasks could enable the retrieval of the data source most likely to contain the desired information, facilitating Open Data access and transparency. This work presents an approach that blends Latent Dirichlet Allocation (LDA), Word2Vec, and Cosine Similarity to discover the best open data source for a given user query, drawing on the combined semantic and syntactic capabilities of these methods. Our approach was evaluated on its ability to discover, among eight candidates, the right source for a set of queries. Three rounds of experiments were conducted, varying the number of data sources and test questions. In all rounds, our approach showed superior results compared with each baseline method applied separately, reaching a classification accuracy above 93%, even when all candidate sources had similar content.
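As a rough illustration of the source discovery task described in the abstract, the sketch below ranks candidate sources against a query using cosine similarity over plain term-frequency vectors. It covers only the cosine similarity component: the LDA and Word2Vec components, and the way the paper blends the three methods, are not specified in the abstract and are not reproduced here. The function names, the source descriptions, and the query are all hypothetical and were invented for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_sources(query, sources):
    """Return (source_name, score) pairs sorted from best to worst match."""
    query_vec = Counter(tokenize(query))
    scores = {
        name: cosine_similarity(query_vec, Counter(tokenize(description)))
        for name, description in sources.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical candidate sources, invented for illustration only.
SOURCES = {
    "health": "hospital beds vaccination clinics patients health spending",
    "education": "schools students teachers enrollment education budget",
    "transport": "bus routes traffic accidents roads transport fares",
}

ranking = rank_sources("how many students are enrolled in public schools", SOURCES)
best_source = ranking[0][0]
print(ranking)
```

Because this sketch matches only exact tokens, "enrolled" in the query does not match "enrollment" in the source description. That kind of purely syntactic gap is presumably what the semantic components (topic modelling and word embeddings) are meant to address in the blended approach.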
Paper Citation
in Harvard Style
Franciscatto M., Fabro M., Erpen de Bona L., Trois C. and Tissot H. (2022). Blending Topic-based Embeddings and Cosine Similarity for Open Data Discovery. In Proceedings of the 24th International Conference on Enterprise Information Systems - Volume 2: ICEIS, ISBN 978-989-758-569-2, pages 163-170. DOI: 10.5220/0011040700003179
in Bibtex Style
@conference{iceis22,
author={Maria Franciscatto and Marcos Fabro and Luis Erpen de Bona and Celio Trois and Hegler Tissot},
title={Blending Topic-based Embeddings and Cosine Similarity for Open Data Discovery},
booktitle={Proceedings of the 24th International Conference on Enterprise Information Systems - Volume 2: ICEIS},
year={2022},
pages={163-170},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0011040700003179},
isbn={978-989-758-569-2},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 24th International Conference on Enterprise Information Systems - Volume 2: ICEIS
TI - Blending Topic-based Embeddings and Cosine Similarity for Open Data Discovery
SN - 978-989-758-569-2
AU - Franciscatto M.
AU - Fabro M.
AU - Erpen de Bona L.
AU - Trois C.
AU - Tissot H.
PY - 2022
SP - 163
EP - 170
DO - 10.5220/0011040700003179