
is the process that converts an image of text into a
machine-readable text format.
4. CNN - A Convolutional Neural Network (CNN) is a type of deep learning algorithm that is particularly well suited to image recognition and processing tasks. It is made up of multiple layers, including convolutional layers, pooling layers, and fully connected layers.
5. RNN - A recurrent neural network (RNN) is a deep neural network trained on sequential or time-series data. It carries a hidden state from one time step to the next, so the resulting machine learning (ML) model can make predictions or draw conclusions that depend on the order of its inputs.
6. FNN - A feedforward neural network (FNN) is a type of neural network in which information flows in one direction, from the input layer to the output layer, without cycles or loops.
7. LSTM - Long Short-Term Memory (LSTM) is a recurrent neural network (RNN) architecture widely used in deep learning. Its gated memory cell lets it capture long-term dependencies that a plain RNN struggles to learn.
8. ROUGE - ROUGE (Recall-Oriented Understudy for Gisting Evaluation) is a set of metrics and a software package designed for evaluating automatic summarization, although it can also be used for machine translation. It scores a system output by its overlap with one or more human-written reference texts.
9. BLEU - BLEU (Bilingual Evaluation Understudy) is a metric for measuring the quality of machine translations by comparing them with human reference translations.
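The CNN entry (item 4) names convolutional and pooling layers without showing what they compute. The following is a minimal sketch in pure Python, assuming a single-channel image, stride 1, no padding, and one hand-picked edge-detecting kernel; the image, the kernel, and the helper names `conv2d_valid`, `relu`, and `max_pool2x2` are hypothetical illustrations, not the architecture used in this paper. A real CNN learns many kernels per layer and ends with fully connected layers.

```python
from typing import List

Matrix = List[List[float]]


def conv2d_valid(image: Matrix, kernel: Matrix) -> Matrix:
    """Slide the kernel over the image (stride 1, no padding).

    As in most deep learning libraries, this is cross-correlation:
    the kernel is not flipped.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = 0.0
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        out.append(row)
    return out


def relu(x: Matrix) -> Matrix:
    """Element-wise non-linearity applied after the convolution."""
    return [[max(0.0, v) for v in row] for row in x]


def max_pool2x2(x: Matrix) -> Matrix:
    """Non-overlapping 2x2 max pooling; an odd trailing row/column is dropped."""
    return [
        [max(x[i][j], x[i][j + 1], x[i + 1][j], x[i + 1][j + 1])
         for j in range(0, len(x[0]) - 1, 2)]
        for i in range(0, len(x) - 1, 2)
    ]


# Hypothetical 5x5 image: dark on the left, bright on the right.
image = [[0.0, 0.0, 1.0, 1.0, 1.0] for _ in range(5)]
# Hand-picked vertical-edge kernel (a trained CNN would learn its kernels).
edge_kernel = [[-1.0, 1.0], [-1.0, 1.0]]

feature_map = max_pool2x2(relu(conv2d_valid(image, edge_kernel)))
print(feature_map)  # [[2.0, 0.0], [2.0, 0.0]]
```

The pooled map keeps a strong response only where the dark-to-bright edge sits, while halving each spatial dimension; this is the usual motivation for alternating convolution and pooling.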
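The RNN entry (item 5) depends on the idea that the model's output at each step is conditioned on earlier inputs. A minimal sketch of an Elman-style recurrence, h_t = tanh(W_x x_t + W_h h_{t-1} + b), is given below, assuming hand-set weights (input size 1, hidden size 2); the weights and the helper names `matvec` and `rnn_forward` are hypothetical, and no training is shown.

```python
import math
from typing import List, Sequence

Vector = List[float]


def matvec(m: Sequence[Sequence[float]], v: Sequence[float]) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def rnn_forward(inputs, w_x, w_h, b) -> List[Vector]:
    """Run an Elman-style RNN over a sequence, returning every hidden state.

    h_t = tanh(W_x x_t + W_h h_{t-1} + b), starting from h_0 = 0.
    The same weights are reused at every time step.
    """
    h = [0.0] * len(b)
    states = []
    for x_t in inputs:
        pre = [u + v + c for u, v, c in zip(matvec(w_x, x_t), matvec(w_h, h), b)]
        h = [math.tanh(p) for p in pre]
        states.append(h)
    return states


# Hypothetical weights: input size 1, hidden size 2.
w_x = [[0.5], [-0.3]]
w_h = [[0.1, 0.2], [0.0, 0.4]]
b = [0.0, 0.1]

forward = rnn_forward([[1.0], [0.0], [2.0]], w_x, w_h, b)
backward = rnn_forward([[2.0], [0.0], [1.0]], w_x, w_h, b)
# The same inputs in a different order give a different final state,
# because the hidden state carries information between steps.
print(forward[-1], backward[-1])
```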
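By contrast, the feedforward network of item 6 has no state: each layer is applied once, in order, from input to output. The sketch below assumes a tiny 2-2-1 network with ReLU hidden units and a linear output, using hypothetical hand-set weights that happen to compute XOR; the names `dense` and `feedforward` are illustrative only.

```python
from typing import Callable, List, Sequence, Tuple

Layer = Tuple[List[List[float]], List[float]]  # (weights[out][in], biases[out])


def relu(v: float) -> float:
    return max(0.0, v)


def identity(v: float) -> float:
    return v


def dense(x: Sequence[float], layer: Layer, act: Callable[[float], float]) -> List[float]:
    """Fully connected layer: act(W x + b)."""
    weights, biases = layer
    return [act(sum(w * xi for w, xi in zip(row, x)) + bias)
            for row, bias in zip(weights, biases)]


def feedforward(x: Sequence[float], layers: List[Layer]) -> List[float]:
    """Pass x through each layer exactly once, input to output, with no loops back.

    Hidden layers use ReLU; the last layer is linear.
    """
    h = list(x)
    for k, layer in enumerate(layers):
        h = dense(h, layer, identity if k == len(layers) - 1 else relu)
    return h


# Hypothetical hand-set weights that make a 2-2-1 network compute XOR.
xor_net: List[Layer] = [
    ([[1.0, 1.0], [1.0, 1.0]], [0.0, -1.0]),  # h1 = relu(a + b), h2 = relu(a + b - 1)
    ([[1.0, -2.0]], [0.0]),                   # y = h1 - 2 * h2
]

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, feedforward([a, b], xor_net))
```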
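Item 7 credits the LSTM with capturing long-term dependencies. The mechanism is the gated cell update c_t = f ⊙ c_{t-1} + i ⊙ g and h_t = o ⊙ tanh(c_t), sketched below in pure Python under the assumption of hand-set weights (input size 1, hidden size 1). The parameter values and the names `affine` and `lstm_step` are hypothetical; they are chosen only to show that a forget gate held near 1 lets the cell retain a value across many steps.

```python
import math
from typing import Dict, List, Tuple

Params = Dict[str, Tuple[List[List[float]], List[float]]]  # gate -> (W, b)


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def affine(w, b, v):
    """Compute W v + b for a weight matrix W (list of rows) and bias b."""
    return [sum(a * x for a, x in zip(row, v)) + bias for row, bias in zip(w, b)]


def lstm_step(x_t, h_prev, c_prev, params: Params):
    """One LSTM time step; each gate's W acts on the concatenation [x_t, h_prev]."""
    z = list(x_t) + list(h_prev)
    i = [sigmoid(v) for v in affine(*params["input"], z)]        # how much to write
    f = [sigmoid(v) for v in affine(*params["forget"], z)]       # how much to keep
    o = [sigmoid(v) for v in affine(*params["output"], z)]       # how much to expose
    g = [math.tanh(v) for v in affine(*params["candidate"], z)]  # proposed content
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c


# Hypothetical hand-set weights (input size 1, hidden size 1): the input gate
# opens only when x = 1, and the forget gate is biased to stay almost fully open.
params: Params = {
    "input": ([[20.0, 0.0]], [-10.0]),
    "forget": ([[0.0, 0.0]], [10.0]),
    "output": ([[0.0, 0.0]], [0.0]),
    "candidate": ([[5.0, 0.0]], [0.0]),
}

h, c = [0.0], [0.0]
for x in [[1.0]] + [[0.0]] * 50:  # one signal, then 50 empty steps
    h, c = lstm_step(x, h, c, params)
print(c[0])  # the value written at step 1 is still close to 1.0
```

In a plain RNN the same information would have to survive 50 repeated tanh squashings of the hidden state; the additive cell path is what makes the difference.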
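As a concrete instance of the overlap that ROUGE (item 8) measures, ROUGE-N compares the n-grams of a system summary with those of a reference, clipping each n-gram's match count to the smaller of its two counts. The sketch below assumes a single reference and simple lowercase whitespace tokenization; the helper names `ngrams` and `rouge_n` are illustrative, and established ROUGE implementations apply their own tokenization and optional stemming, so their scores can differ.

```python
from collections import Counter
from typing import Dict, List


def ngrams(tokens: List[str], n: int) -> Counter:
    """Multiset of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate: str, reference: str, n: int = 1) -> Dict[str, float]:
    """ROUGE-N against a single reference, using lowercase whitespace tokens."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    overlap = sum((cand & ref).values())  # '&' keeps the minimum count per n-gram
    recall = overlap / max(sum(ref.values()), 1)
    precision = overlap / max(sum(cand.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return {"recall": recall, "precision": precision, "f1": f1}


reference = "the cat sat on the mat"
summary = "the cat was on the mat"
print(rouge_n(summary, reference, n=1))  # recall = 5/6
print(rouge_n(summary, reference, n=2))  # recall = 3/5
```

Recall (the "R" in ROUGE) divides by the reference's n-gram count, so it rewards summaries that cover the reference content; precision and F1 are reported alongside because recall alone favors long outputs.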
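BLEU (item 9) works in the opposite direction from ROUGE: it combines clipped n-gram precisions (n = 1 to 4) through a geometric mean and multiplies by a brevity penalty for outputs shorter than the reference. The following is a minimal sentence-level sketch, assuming one reference, whitespace tokenization, and no smoothing; `sentence_bleu` is a hypothetical helper name. BLEU is normally computed at corpus level, and published scores are usually produced with a standard tool so that tokenization is consistent.

```python
import math
from collections import Counter
from typing import List


def ngrams(tokens: List[str], n: int) -> Counter:
    """Multiset of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate: str, reference: str, max_n: int = 4) -> float:
    """Unsmoothed BLEU for one candidate and one reference (whitespace tokens)."""
    cand, ref = candidate.lower().split(), reference.lower().split()
    if not cand:
        return 0.0
    log_p_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        total = sum(cand_counts.values())
        clipped = sum((cand_counts & ngrams(ref, n)).values())  # modified precision
        if clipped == 0:
            return 0.0  # without smoothing, any zero precision makes BLEU zero
        log_p_sum += math.log(clipped / total)
    # Brevity penalty: penalize candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(log_p_sum / max_n)


print(sentence_bleu("the cat sat on the rug", "the cat sat on the mat"))
```

For this pair the four precisions are 5/6, 4/5, 3/4, and 2/3, whose product is 1/3, so the score is (1/3)^(1/4), about 0.76.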
A Multimodal Approach to Research Paper Summarization
957