Food Data Integration by using Heuristics based on Lexical and Semantic Similarities

Gorjan Popovski; Gordana Ispirova; Nina Hadzi-Kotarova; Eva Valenčič; Tome Eftimov; Barbara Koroušić Seljak

Topics: Data Mining and Data Analysis; Interoperability and Data Integration; Pattern Recognition and Machine Learning

In Proceedings of the 13th International Joint Conference on Biomedical Engineering Systems and Technologies - Volume 5: BIOSTEC, 208-216, 2020, Valletta, Malta

Authors: Gorjan Popovski ¹,²; Gordana Ispirova ¹,²; Nina Hadzi-Kotarova ³; Eva Valenčič ¹,²,⁴; Tome Eftimov ¹ and Barbara Koroušić Seljak ¹

Affiliations: ¹ Computer Systems Department, Jožef Stefan Institute, 1000 Ljubljana, Slovenia ; ² Jožef Stefan International Postgraduate School, 1000 Ljubljana, Slovenia ; ³ Faculty of Computer Science and Engineering, Ss. Cyril and Methodius University, 1000 Skopje, North Macedonia ; ⁴ School of Health Sciences, Faculty of Health and Medicine, Priority Research Centre in Physical Activity and Nutrition, The University of Newcastle, Callaghan, Australia

Keyword(s): Data Normalization, Food Data Integration, Lexical Similarity, Semantic Similarity, Word Embeddings.

Abstract: With the rapid growth of the food supply over the last decade, vast amounts of food-related data have been collected. To make these data interoperable and suitable for analyses that study the relations between food and environmental and health outcomes, data coming from various sources need to be normalized. Food data come from varying sources and in varying formats (food composition, food consumption, and recipe data), yet the most familiar type is food product data, which are often misinterpreted because of the marketing strategies of different producers and retailers. Several recent studies have addressed the problem of heterogeneous data by matching food products using the lexical similarity between their English names. In this study, we address the same problem for a non-English language that has received little attention in natural language processing research, namely Slovenian. To match food products, we use our previously developed heuristic based on lexical similarity and propose two new semantic similarity heuristics based on word embeddings. The proposed heuristics are evaluated on a dataset of 438 ground-truth pairs of food products, obtained by matching their EAN barcodes. Preliminary results show that the lexical similarity heuristic is more promising (75% accuracy), while the best semantic similarity model achieves an accuracy of 62%.
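
The matching task described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' method. Jaccard overlap of normalized name tokens stands in for their lexical similarity heuristic. Cosine similarity of averaged word vectors stands in for the embedding-based heuristics. Several elements are hypothetical and chosen only for illustration:

- the diacritic stripping;
- the helper names `normalize`, `lexical_similarity`, `semantic_similarity`, `best_match` and `accuracy`;
- the product names;
- the two-dimensional toy embeddings.

Accuracy is taken to be the share of ground-truth pairs (in the paper, pairs linked by EAN barcode) whose true counterpart is ranked first among the candidates.

```python
import math
import re
import unicodedata


def normalize(name):
    """Lowercase, strip diacritics (e.g. 'č' -> 'c') and split into tokens.

    Stripping diacritics is an assumption of this sketch; it lets spellings
    with and without Slovenian carons compare equal.
    """
    decomposed = unicodedata.normalize("NFKD", name.lower())
    plain = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return re.findall(r"[a-z0-9]+", plain)


def lexical_similarity(a, b):
    """Jaccard overlap of token sets (a stand-in, not the paper's lexical heuristic)."""
    ta, tb = set(normalize(a)), set(normalize(b))
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def mean_vector(tokens, embeddings, dim):
    """Average the vectors of tokens found in `embeddings`; unknown tokens are skipped."""
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        return [0.0] * dim
    return [sum(column) / len(known) for column in zip(*known)]


def cosine(u, v):
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def semantic_similarity(a, b, embeddings, dim):
    """Cosine similarity of averaged word vectors (one common embedding-based measure)."""
    return cosine(mean_vector(normalize(a), embeddings, dim),
                  mean_vector(normalize(b), embeddings, dim))


def best_match(query, candidates, score):
    """Return the candidate name that `score` rates most similar to `query`."""
    return max(candidates, key=lambda candidate: score(query, candidate))


def accuracy(pairs, candidates, score):
    """Share of ground-truth (source, target) pairs whose target is ranked first."""
    hits = sum(best_match(source, candidates, score) == target
               for source, target in pairs)
    return hits / len(pairs)


if __name__ == "__main__":
    # Hypothetical product names from two retailers, paired as if by EAN barcode.
    pairs = [("Mleko 3,5 % 1 l", "Alpsko mleko 3.5% 1L"),
             ("Čokolada mlečna 100 g", "Mlečna čokolada 100g")]
    candidates = [target for _, target in pairs]
    # Hypothetical 2-dimensional vectors; real use would load trained embeddings.
    toy_embeddings = {"mleko": [1.0, 0.0], "alpsko": [0.8, 0.2],
                      "cokolada": [0.0, 1.0], "mlecna": [0.3, 0.9]}
    print("lexical accuracy:", accuracy(pairs, candidates, lexical_similarity))
    print("semantic accuracy:", accuracy(
        pairs, candidates,
        lambda a, b: semantic_similarity(a, b, toy_embeddings, 2)))
```

The ranking and evaluation steps are shared by both heuristics. Only the `score` function changes, so the two kinds of similarity can be compared on the same set of ground-truth pairs.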

CC BY-NC-ND 4.0

Paper citation in several formats:

Popovski, G., Ispirova, G., Hadzi-Kotarova, N., Valenčič, E., Eftimov, T. and Seljak, B. K. (2020). Food Data Integration by using Heuristics based on Lexical and Semantic Similarities. In Proceedings of the 13th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2020) - HEALTHINF; ISBN 978-989-758-398-8; ISSN 2184-4305, SciTePress, pages 208-216. DOI: 10.5220/0008990602080216

@conference{healthinf20,
author={Gorjan Popovski and Gordana Ispirova and Nina Hadzi{-}Kotarova and Eva Valenčič and Tome Eftimov and Barbara Koroušić Seljak},
title={Food Data Integration by using Heuristics based on Lexical and Semantic Similarities},
booktitle={Proceedings of the 13th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2020) - HEALTHINF},
year={2020},
pages={208-216},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0008990602080216},
isbn={978-989-758-398-8},
issn={2184-4305},
}

TY  - CONF
JO  - Proceedings of the 13th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2020) - HEALTHINF
TI  - Food Data Integration by using Heuristics based on Lexical and Semantic Similarities
SN  - 978-989-758-398-8
SN  - 2184-4305
AU  - Popovski, G.
AU  - Ispirova, G.
AU  - Hadzi-Kotarova, N.
AU  - Valenčič, E.
AU  - Eftimov, T.
AU  - Seljak, B.
PY  - 2020
SP  - 208
EP  - 216
DO  - 10.5220/0008990602080216
PB  - SciTePress
ER  - 