Blending Topic-based Embeddings and Cosine Similarity for Open Data Discovery

Maria Helena Franciscatto; Marcos Didonet Del Fabro; Luis Carlos Erpen de Bona; Celio Trois; Hegler Tissot

Topics: Big Data; Data Mining; Knowledge Management; Problem Solving

In Proceedings of the 24th International Conference on Enterprise Information Systems - Volume 1: ICEIS, 163-170, 2022

Authors: Maria Helena Franciscatto ¹; Marcos Didonet Del Fabro ¹; Luis Carlos Erpen de Bona ¹; Celio Trois ² and Hegler Tissot ³

Affiliations: ¹ Department of Informatics, Federal University of Paraná, Curitiba, Brazil; ² Technology Center, Federal University of Santa Maria, Santa Maria, Brazil; ³ Drexel University, Philadelphia, U.S.A.

Keyword(s): Source Discovery, Open Data, LDA, Word2Vec, Cosine Similarity, Machine Learning.

Abstract: Source discovery aims to facilitate the search for specific information, whose access can be complex and dependent on several distributed data sources. These challenges are often observed in Open Data, where users experience a lack of support and difficulty in finding what they need. In this context, Source Discovery tasks could enable the retrieval of the data source most likely to contain the desired information, facilitating Open Data access and transparency. This work presents an approach that blends Latent Dirichlet Allocation (LDA), Word2Vec, and Cosine Similarity to discover the best open data source for a given user query, drawing on the combined semantic and syntactic capabilities of these methods. Our approach was evaluated on its ability to discover, among eight candidates, the right source for a set of queries. Three rounds of experiments were conducted, varying the number of data sources and test questions. In all rounds, our approach outperformed each of the baseline methods used separately, reaching a classification accuracy above 93%, even when all candidate sources had similar content.
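The abstract does not spell out how the three methods are combined, so the following is only a minimal sketch of the general idea of ranking candidate sources for a query by a blended similarity score. Everything in it is an assumption made for illustration: the toy source descriptions, the whitespace tokenizer, the `blend` rule (a plain weighted average), and the hard-coded `semantic_scores`, which merely stand in for scores that a topic or embedding model such as LDA or Word2Vec might produce. It is not the authors' implementation.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


def term_counts(text):
    """Toy preprocessing: lowercase, split on whitespace, count terms."""
    return Counter(text.lower().split())


def blend(lexical, semantic, weight=0.5):
    """Hypothetical blending rule: weighted average of two per-source scores."""
    return {
        name: weight * lexical[name] + (1.0 - weight) * semantic[name]
        for name in lexical
    }


def best_source(scores):
    """Return the name of the source with the highest score."""
    return max(scores, key=scores.get)


# Illustrative candidate sources (made up for this sketch).
sources = {
    "health": "hospital beds vaccination clinics health spending",
    "education": "school enrollment teachers students education budget",
    "transport": "bus routes traffic accidents road maintenance",
}
query = "how many students are enrolled in each school"

# Lexical (syntactic) signal: cosine similarity over raw term counts.
query_vec = term_counts(query)
lexical_scores = {
    name: cosine_similarity(query_vec, term_counts(text))
    for name, text in sources.items()
}

# Placeholder semantic signal: invented numbers standing in for the output
# of a topic or embedding model; they are not results from the paper.
semantic_scores = {"health": 0.10, "education": 0.80, "transport": 0.05}

combined = blend(lexical_scores, semantic_scores, weight=0.5)
print(best_source(combined))
```

The `weight` parameter controls how much the lexical signal counts relative to the semantic one; in a real system it would have to be tuned on held-out queries, and the placeholder scores would be replaced by the output of trained models.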

CC BY-NC-ND 4.0

Paper citation in several formats:

Franciscatto, M. H., Fabro, M. D., Erpen de Bona, L. C., Trois, C. and Tissot, H. (2022). Blending Topic-based Embeddings and Cosine Similarity for Open Data Discovery. In Proceedings of the 24th International Conference on Enterprise Information Systems - Volume 1: ICEIS; ISBN 978-989-758-569-2; ISSN 2184-4992, SciTePress, pages 163-170. DOI: 10.5220/0011040700003179

@conference{iceis22,
author={Maria Helena Franciscatto and Marcos Didonet Del Fabro and Luis Carlos {Erpen de Bona} and Celio Trois and Hegler Tissot},
title={Blending Topic-based Embeddings and Cosine Similarity for Open Data Discovery},
booktitle={Proceedings of the 24th International Conference on Enterprise Information Systems - Volume 1: ICEIS},
year={2022},
pages={163-170},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0011040700003179},
isbn={978-989-758-569-2},
issn={2184-4992},
}

TY - CONF

JO - Proceedings of the 24th International Conference on Enterprise Information Systems - Volume 1: ICEIS
TI - Blending Topic-based Embeddings and Cosine Similarity for Open Data Discovery
SN - 978-989-758-569-2
IS - 2184-4992
AU - Franciscatto, M.
AU - Fabro, M.
AU - Erpen de Bona, L.
AU - Trois, C.
AU - Tissot, H.
PY - 2022
SP - 163
EP - 170
DO - 10.5220/0011040700003179
PB - SciTePress