Word and Image Embeddings in Pill Recognition

Richárd Rádli, Zsolt Vörösházi, László Czúni

2024

Abstract

Pill recognition is a key task in healthcare with a wide range of applications. In this study, we address the challenge of improving pill recognition accuracy within a metric learning framework. A multi-stream architecture for visual feature extraction and processing, with multi-head attention layers, estimates the similarity of pills. We introduce an enhancement to the triplet loss function that leverages word embeddings to inject textual pill similarity into the visual model. This refines the visual embedding on a finer scale than conventional triplet loss models, resulting in higher accuracy of the visual model. Experiments and evaluations are carried out on a new, freely available pill dataset.
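The abstract does not give the exact form of the modified loss, so the following is only a minimal sketch of one plausible reading: a triplet loss whose margin scales with the distance between the word embeddings of the anchor and negative pill classes. The function names, the `base_margin` parameter, and the margin rule are all assumptions made for illustration, not the authors' formulation.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def text_scaled_triplet_loss(anchor, positive, negative,
                             text_anchor, text_negative,
                             base_margin=0.5):
    """Triplet loss with a margin driven by textual dissimilarity.

    anchor, positive, negative: visual embeddings (lists of floats).
    text_anchor, text_negative: word embeddings of the pill descriptions
    for the anchor and negative classes.

    ASSUMPTION: the margin is base_margin times the textual distance,
    so pills that are described very differently are pushed further
    apart in the visual embedding space. This rule is hypothetical.
    """
    margin = base_margin * euclidean(text_anchor, text_negative)
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    # Standard hinge form: zero loss once the negative is at least
    # `margin` further from the anchor than the positive is.
    return max(0.0, d_pos - d_neg + margin)


# Toy example with 2-D embeddings (values are made up).
loss_easy = text_scaled_triplet_loss(
    [0.0, 0.0], [0.0, 1.0], [3.0, 4.0],
    text_anchor=[0.0, 0.0], text_negative=[0.0, 2.0])
loss_hard = text_scaled_triplet_loss(
    [0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
    text_anchor=[0.0, 0.0], text_negative=[0.0, 2.0])
```

With a fixed margin, every negative is treated alike; a text-dependent margin is one way in which textual similarity could shape the visual embedding at a finer scale, which is the effect the abstract describes.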

Paper Citation


in Harvard Style

Rádli R., Vörösházi Z. and Czúni L. (2024). Word and Image Embeddings in Pill Recognition. In Proceedings of the 19th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 3: VISAPP; ISBN 978-989-758-679-8, SciTePress, pages 729-736. DOI: 10.5220/0012460800003660


in Bibtex Style

@conference{visapp24,
author={Richárd Rádli and Zsolt Vörösházi and László Czúni},
title={Word and Image Embeddings in Pill Recognition},
booktitle={Proceedings of the 19th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 3: VISAPP},
year={2024},
pages={729-736},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0012460800003660},
isbn={978-989-758-679-8},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 19th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 3: VISAPP
TI - Word and Image Embeddings in Pill Recognition
SN - 978-989-758-679-8
AU - Rádli R.
AU - Vörösházi Z.
AU - Czúni L.
PY - 2024
SP - 729
EP - 736
DO - 10.5220/0012460800003660
PB - SciTePress