Figure 7: First row: A query image and the corresponding
wrongly recognized item from the OGYEI dataset. The pill
on the right has similar, but not identical, printed text on
its hidden side. Second row: The same kind of examples
from the CURE dataset.
ACKNOWLEDGEMENTS
This work has been partly supported by the
2020-1.1.2-PIACI-KFI-2021-00296 project of the
National Research, Development and Innovation
Fund. We also acknowledge the financial support
of the Hungarian Scientific Research Fund grant
OTKA K-135729. We are grateful to the NVIDIA
Corporation for supporting our research with GPUs
obtained by the NVIDIA Hardware Grant Program.
REFERENCES
Busta, M., Neumann, L., and Matas, J. (2017).
Deep TextSpotter: An end-to-end trainable scene
text localization and recognition framework. In
Proceedings of the IEEE International Conference on
Computer Vision, pages 2204–2212.
Cimpoi, M., Maji, S., Kokkinos, I., Mohamed, S., and
Vedaldi, A. (2014). Describing textures in the wild.
In Proceedings of the IEEE Conf. on Computer Vision
and Pattern Recognition (CVPR).
Cronenwett, L. R., Bootman, J. L., Wolcott, J., Aspden, P.,
et al. (2007). Preventing medication errors. National
Academies Press.
Hassan, T. and Khan, H. A. (2015). Handwritten Bangla
numeral recognition using local binary pattern.
In 2015 International Conference on Electrical
Engineering and Information Communication
Technology (ICEEICT), pages 1–4. IEEE.
Heo, J., Kang, Y., Lee, S., Jeong, D.-H., and Kim, K.-M.
(2023). An accurate deep learning–based system for
automatic pill identification: Model development and
validation. J. Med. Internet Res., 25:e41043.
Ling, S., Pastor, A., Li, J., Che, Z., Wang, J., Kim, J.,
and Callet, P. L. (2020). Few-shot pill recognition.
In Proceedings of the IEEE/CVF Conference on
Computer Vision and Pattern Recognition, pages
9789–9798.
Liu, L., Zhang, H., Feng, A., Wan, X., and Guo, J.
(2010). Simplified local binary pattern descriptor
for character recognition of vehicle license plate. In
2010 Seventh International Conference on Computer
Graphics, Imaging and Visualization, pages 157–161.
IEEE.
Nguyen, A. D., Nguyen, T. D., Pham, H. H., Nguyen, T. H.,
and Nguyen, P. L. (2022). Image-based contextual pill
recognition with medical knowledge graph assistance.
In Asian Conference on Intelligent Information and
Database Systems, pages 354–369. Springer.
Ojala, T., Pietikäinen, M., and Harwood, D. (1994).
Performance evaluation of texture measures with
classification based on Kullback discrimination of
distributions. In Proceedings of 12th International
Conference on Pattern Recognition, volume 1, pages
582–585. IEEE.
Ronneberger, O., Fischer, P., and Brox, T. (2015).
U-Net: Convolutional networks for biomedical image
segmentation. In Medical Image Computing and
Computer-Assisted Intervention–MICCAI 2015: 18th
International Conference, Munich, Germany, October
5-9, 2015, Proceedings, Part III 18, pages 234–241.
Springer.
Schroff, F., Kalenichenko, D., and Philbin, J. (2015).
FaceNet: A unified embedding for face recognition
and clustering. In Proceedings of the IEEE
Conference on Computer Vision and Pattern
Recognition, pages 815–823.
Tan, L., Huangfu, T., Wu, L., and Chen, W. (2021).
Comparison of RetinaNet, SSD, and YOLOv3 for
real-time pill identification. BMC Medical Informatics
and Decision Making, 21:1–11.
Tan, M. and Le, Q. (2019). EfficientNet: Rethinking
model scaling for convolutional neural networks.
In International Conference on Machine Learning,
pages 6105–6114. PMLR.
Tan, M. and Le, Q. (2021). EfficientNetV2: Smaller models
and faster training. In International Conference on
Machine Learning, pages 10096–10106. PMLR.
VAIPE (2008). VAIPE-Pill: A Large-scale, Annotated
Benchmark Dataset for Visual Pill Identification.
https://vaipe.org/. [Online; accessed 1-July-2023].
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones,
L., Gomez, A. N., Kaiser, Ł., and Polosukhin, I.
(2017). Attention is all you need. Advances in Neural
Information Processing Systems, 30.
Wang, C.-Y., Bochkovskiy, A., and Liao, H.-Y. M.
(2023). YOLOv7: Trainable bag-of-freebies sets
Pill Metrics Learning with Multihead Attention