A Multimodal Approach to Research Paper Summarization

Pranav Bookanakere, Syeda Saniya, Syed Munzer Nouman, S. Pramath, Jayashree Rangareddy

2025

Abstract

As the volume of academic research in the medical field grows exponentially, understanding and extracting important information from research papers has become increasingly challenging. Researchers, students, and professionals often find it hard to navigate medical research papers that contain complex images and textual information. Most existing summarization tools have limited effectiveness and cannot handle the multimodal nature of complex research papers. This paper addresses the need for a comprehensive approach that generates summaries from key information in both the text and the complex images present in research papers. Our approach generates section-wise summaries of the text as well as context-based image descriptions with high levels of accuracy. By combining advanced Natural Language Processing (NLP) and multimodal models (T5, LLaVA), the system generates comprehensive and concise summaries of complex research papers. This work demonstrates the potential of multimodal AI models to improve research comprehension and provide a deeper understanding of complex subjects in the medical field.
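The section-wise summarization and context-based image description outlined in the abstract can be pictured with a minimal Python sketch. This is an illustration under stated assumptions, not the authors' implementation: the heading list, the function names (`split_sections`, `lead_sentence`, `summarize_paper`, `figure_context`), and the first-sentence placeholder summarizer are all hypothetical. In a real system the placeholder would presumably be replaced by a call to a T5 model, and the sentences returned by `figure_context` would be passed to a LLaVA-style model together with the image.

```python
import re
from typing import Callable, Dict, List

# Hypothetical heading list; the abstract does not say how sections are detected.
SECTION_HEADINGS = (
    "abstract", "introduction", "methods", "results", "discussion", "conclusion",
)

_SENTENCE_BREAK = re.compile(r"(?<=[.!?])\s+")


def split_sections(text: str) -> Dict[str, str]:
    """Split plain paper text into sections keyed by lower-cased heading."""
    sections: Dict[str, List[str]] = {}
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.lower() in SECTION_HEADINGS:
            current = stripped.lower()
            sections[current] = []
        elif current is not None and stripped:
            sections[current].append(stripped)
    return {name: " ".join(lines) for name, lines in sections.items()}


def lead_sentence(text: str) -> str:
    """Placeholder summarizer (assumption): keep only the first sentence."""
    return _SENTENCE_BREAK.split(text.strip(), maxsplit=1)[0]


def summarize_paper(
    text: str, summarize: Callable[[str], str] = lead_sentence
) -> Dict[str, str]:
    """Apply a pluggable summarizer to each detected section."""
    return {name: summarize(body) for name, body in split_sections(text).items()}


def figure_context(text: str, number: int) -> List[str]:
    """Collect sentences that mention 'Figure <number>' as context for an image model."""
    pattern = re.compile(rf"\bFigure\s+{number}\b")
    sentences: List[str] = []
    for body in split_sections(text).values():
        sentences.extend(_SENTENCE_BREAK.split(body))
    return [s for s in sentences if pattern.search(s)]
```

Passing the summarizer as a parameter keeps the section-splitting logic independent of any particular model, so a learned summarizer can be swapped in without touching the rest of the sketch.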


Paper Citation


in Harvard Style

Bookanakere P., Saniya S., Nouman S., Pramath S. and Rangareddy J. (2025). A Multimodal Approach to Research Paper Summarization. In Proceedings of the 20th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 2: VISAPP; ISBN 978-989-758-728-3, SciTePress, pages 950-957. DOI: 10.5220/0013277700003912


in Bibtex Style

@conference{visapp25,
author={Pranav Bookanakere and Syeda Saniya and Syed Nouman and S. Pramath and Jayashree Rangareddy},
title={A Multimodal Approach to Research Paper Summarization},
booktitle={Proceedings of the 20th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 2: VISAPP},
year={2025},
pages={950-957},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0013277700003912},
isbn={978-989-758-728-3},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 20th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 2: VISAPP
TI - A Multimodal Approach to Research Paper Summarization
SN - 978-989-758-728-3
AU - Bookanakere P.
AU - Saniya S.
AU - Nouman S.
AU - Pramath S.
AU - Rangareddy J.
PY - 2025
SP - 950
EP - 957
DO - 10.5220/0013277700003912
PB - SciTePress