CAMMA: A Deep Learning-Based Approach for Cascaded Multi-Task Medical Vision Question Answering

Teodora-Alexandra Toader, Alexandru Manole, Gabriela Czibula

2025

Abstract

Medical Visual Question Answering is a multi-modal problem that combines visual and language information to answer medical questions, with potential benefits for computer-aided diagnosis and medical education. Deep Learning has proven effective in this area; however, the scarcity of data remains an issue for this data-hungry approach. To tackle this, we propose CAMMA, a cascaded multi-task architecture for Medical Visual Question Answering, which achieves state-of-the-art results on the OVQA dataset with 71.45% accuracy. The model inherits the advantages of multi-task networks, reducing overfitting and increasing data efficiency by exploiting the additional output information available for each input sample. To test the adaptability of our model, we apply the same method to the VQA-Med 2019 dataset. We also experiment with the choice of objectives included in the multi-task framework and with the weighting between them.
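The abstract mentions weighting several objectives inside one multi-task framework. A common way to combine such objectives is a weighted sum of per-task losses, L = Σ_k w_k · L_k. The sketch below illustrates this in plain Python under stated assumptions: each task is treated as single-label classification with a cross-entropy loss, and the task names ("answer", "aux") and weights are hypothetical placeholders. They are not the objectives, weights, or cascaded structure actually used in CAMMA.

```python
import math
from typing import Dict, List


def cross_entropy(logits: List[float], target: int) -> float:
    """Cross-entropy for one sample: -log softmax(logits)[target]."""
    # Subtract the max logit before exponentiating for numerical stability.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def multi_task_loss(
    logits_per_task: Dict[str, List[float]],
    targets: Dict[str, int],
    weights: Dict[str, float],
) -> float:
    """Weighted sum of per-task losses: L = sum_k w_k * L_k.

    Task names and weights are hypothetical; they only illustrate the
    weighting idea and do not reproduce CAMMA's configuration.
    """
    return sum(
        weights[task] * cross_entropy(logits, targets[task])
        for task, logits in logits_per_task.items()
    )


if __name__ == "__main__":
    # One sample with a main answer head and one auxiliary head (hypothetical).
    logits = {"answer": [2.0, 0.5, -1.0], "aux": [0.0, 0.0]}
    targets = {"answer": 0, "aux": 1}
    weights = {"answer": 1.0, "aux": 0.5}
    print(round(multi_task_loss(logits, targets, weights), 4))
```

Changing the entries of `weights` shifts how much each auxiliary objective influences training relative to the main answering task. Setting an auxiliary weight to zero reduces the setup to single-task training.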



Paper Citation


in Harvard Style

Toader T., Manole A. and Czibula G. (2025). CAMMA: A Deep Learning-Based Approach for Cascaded Multi-Task Medical Vision Question Answering. In Proceedings of the 17th International Conference on Agents and Artificial Intelligence - Volume 3: ICAART; ISBN 978-989-758-737-5, SciTePress, pages 193-200. DOI: 10.5220/0013109100003890


in Bibtex Style

@conference{icaart25,
author={Teodora-Alexandra Toader and Alexandru Manole and Gabriela Czibula},
title={CAMMA: A Deep Learning-Based Approach for Cascaded Multi-Task Medical Vision Question Answering},
booktitle={Proceedings of the 17th International Conference on Agents and Artificial Intelligence - Volume 3: ICAART},
year={2025},
pages={193-200},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0013109100003890},
isbn={978-989-758-737-5},
}


in EndNote Style

TY  - CONF
JO  - Proceedings of the 17th International Conference on Agents and Artificial Intelligence - Volume 3: ICAART
TI  - CAMMA: A Deep Learning-Based Approach for Cascaded Multi-Task Medical Vision Question Answering
SN  - 978-989-758-737-5
AU  - Toader T.
AU  - Manole A.
AU  - Czibula G.
PY  - 2025
SP  - 193
EP  - 200
DO  - 10.5220/0013109100003890
PB  - SciTePress
ER  - 