Visual Question Answering Analysis: Datasets, Methods, and Image Featurization Techniques

Vijay Kumari; Abhimanyu Sethi; Yashvardhan Sharma; Lavika Goel

Topics: Deep Learning and Neural Networks; Feature Selection and Extraction; Image and Video Analysis and Understanding; Machine Learning Methods; Natural Language Processing

In Proceedings of the 12th International Conference on Pattern Recognition Applications and Methods - ICPRAM, Volume 1, pages 281-288, 2023, Lisbon, Portugal

Authors: Vijay Kumari¹, Abhimanyu Sethi¹, Yashvardhan Sharma¹ and Lavika Goel²

Affiliations: ¹ Birla Institute of Technology and Science, Pilani, Rajasthan, India; ² Malaviya National Institute of Technology, Jaipur, Rajasthan, India

Keyword(s): Computer Vision, Natural Language Processing (NLP), Visual Question Answering (VQA), Attention Mechanism, Convolutional Neural Networks.

Abstract: Holistic scene understanding is a long-standing objective and one of the core tenets of Artificial Intelligence (AI). Multimodal tasks that aim to combine capabilities spanning multiple domains, such as vision and language, into intelligent systems are thus a desideratum for the next step in AI. Visual Question Answering (VQA), in which a system answers natural language questions about an image by integrating Computer Vision and Natural Language Processing, is one such task. Deep Learning techniques are needed that can move such systems beyond the language biases of real-world priors, which presently prevent them from serving as a veritable touchstone for holistic scene understanding. Furthermore, the effectiveness of the Transformer architecture in the image featurization pipeline of VQA systems remains untested. Hence, an exhaustive study of the performance of various model architectures under varied training conditions on VQA datasets such as VizWiz and VQA v2 is needed to further this area of research. This study explores architectures that use image and question co-attention for VQA, together with several CNN architectures, including ResNet, VGG, EfficientNet, and DenseNet. The Vision Transformer architecture is also explored for image featurization, and several loss functions, including cross-entropy, focal loss, and UniLoss, are employed to train the models. Finally, the trained model is deployed using Flask, with a GUI that lets users input an image and accompanying questions about it and returns a generated answer.
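The abstract names cross-entropy and focal loss among the training objectives. As a rough illustration only, and assuming VQA is framed as classification over a fixed set of candidate answers (an assumption, not a detail stated here), the sketch below shows how focal loss rescales cross-entropy by a factor (1 - p_t)^gamma. The function names, the gamma default, and the example logits are hypothetical; this is not the authors' implementation.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of answer logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Standard cross-entropy: -log p_t for the ground-truth answer index."""
    return -math.log(softmax(logits)[target])


def focal_loss(logits: Sequence[float], target: int, gamma: float = 2.0) -> float:
    """Focal loss: -(1 - p_t)^gamma * log p_t.

    gamma = 0 recovers cross-entropy; a larger gamma down-weights
    examples whose correct answer is already predicted confidently.
    """
    p_t = softmax(logits)[target]
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


if __name__ == "__main__":
    # Hypothetical logits over four candidate answers; index 0 is correct.
    easy = [4.0, 0.0, 0.0, 0.0]  # confident, correct prediction
    hard = [0.5, 0.4, 0.3, 0.2]  # uncertain prediction
    for name, logits in [("easy", easy), ("hard", hard)]:
        ce = cross_entropy(logits, 0)
        fl = focal_loss(logits, 0)
        print(name, round(ce, 4), round(fl, 4))
```

In this toy run the confident "easy" example keeps only a tiny fraction of its cross-entropy under focal loss, while the uncertain "hard" example keeps a much larger share, which is the intended effect of focusing training on hard examples.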

CC BY-NC-ND 4.0

Paper citation in several formats:

Kumari, V., Sethi, A., Sharma, Y. and Goel, L. (2023). Visual Question Answering Analysis: Datasets, Methods, and Image Featurization Techniques. In Proceedings of the 12th International Conference on Pattern Recognition Applications and Methods - ICPRAM; ISBN 978-989-758-626-2; ISSN 2184-4313, SciTePress, pages 281-288. DOI: 10.5220/0011655900003411

@conference{icpram23,
author={Vijay Kumari and Abhimanyu Sethi and Yashvardhan Sharma and Lavika Goel},
title={Visual Question Answering Analysis: Datasets, Methods, and Image Featurization Techniques},
booktitle={Proceedings of the 12th International Conference on Pattern Recognition Applications and Methods - ICPRAM},
year={2023},
pages={281-288},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0011655900003411},
isbn={978-989-758-626-2},
issn={2184-4313},
}

TY - CONF

JO - Proceedings of the 12th International Conference on Pattern Recognition Applications and Methods - ICPRAM
TI - Visual Question Answering Analysis: Datasets, Methods, and Image Featurization Techniques
SN - 978-989-758-626-2
IS - 2184-4313
AU - Kumari, V.
AU - Sethi, A.
AU - Sharma, Y.
AU - Goel, L.
PY - 2023
SP - 281
EP - 288
DO - 10.5220/0011655900003411
PB - SciTePress