Model           GPT-4 score   Comment
Flan-T5-base    6.5           Better in accuracy than the small model, but lacking in completeness.
Flan-T5-large   6.8           Good in accuracy but lacking in completeness.
GPT-3.5         8.9           Excellent in accuracy, coherence, and completeness.
Pegasus-large   7.2           Good in accuracy and coherence, but occasionally adds extra information.
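For reference, the summaries from the open models above can be reproduced with the Hugging Face transformers library. The following is a minimal sketch, not the paper's exact setup; the checkpoint names google/flan-t5-large and google/pegasus-large, the prompt wording, and the generation settings are assumptions.

    from transformers import pipeline

    ARTICLE = "..."  # placeholder for a source article to be summarized

    # Pegasus is trained for summarization directly, so the article is
    # passed as-is to the summarization pipeline.
    pegasus = pipeline("summarization", model="google/pegasus-large")
    print(pegasus(ARTICLE, max_length=64, truncation=True)[0]["summary_text"])

    # Flan-T5 is instruction-tuned, so summarization is requested via a prompt.
    flan = pipeline("text2text-generation", model="google/flan-t5-large")
    prompt = "Summarize the following article:\n" + ARTICLE
    print(flan(prompt, max_length=64)[0]["generated_text"])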
4 CONCLUSION
This study compares the performance of several large language models on text summarization tasks, evaluated through extensive experiments. The results show that model size has a significant impact on summarization performance: larger models generally perform better. On the ROUGE and BERTScore metrics, Flan-T5-large performed best, indicating that its summaries are the most lexically similar and semantically relevant to the reference summaries; a sketch of how such scores can be computed follows below.
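As a concrete illustration, both metric families can be computed with the Hugging Face evaluate package. This is a minimal sketch, assuming that tooling rather than the paper's exact pipeline, with placeholder strings standing in for real model outputs and references.

    # pip install evaluate rouge_score bert_score
    import evaluate

    predictions = ["the cat sat on the mat"]        # model summaries (placeholders)
    references = ["a cat was sitting on the mat"]   # reference summaries (placeholders)

    # ROUGE measures n-gram overlap between prediction and reference.
    rouge = evaluate.load("rouge")
    print(rouge.compute(predictions=predictions, references=references))

    # BERTScore measures semantic similarity via contextual embeddings.
    bertscore = evaluate.load("bertscore")
    result = bertscore.compute(predictions=predictions, references=references, lang="en")
    print(sum(result["f1"]) / len(result["f1"]))    # mean BERTScore F1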
GPT-3.5 performed second best, suggesting that it also generates fairly similar and semantically relevant summaries. Pegasus-large performed worst, deviating significantly from the reference summaries, which may be related to the training methods and datasets used by the models. According to the GPT-4-based evaluation (sketched below), GPT-3.5 performs excellently in terms of accuracy, completeness, and readability, with only minor compromises in conciseness.
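The GPT-4-based evaluation can be reproduced along the following lines. This is a minimal sketch assuming the official openai Python client (v1 interface); the rubric wording and the 10-point scale are assumptions, not the paper's exact prompt.

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    def judge_summary(article: str, summary: str) -> str:
        """Ask GPT-4 to score a summary (assumed rubric, not the paper's prompt)."""
        prompt = (
            "Rate the following summary of the article on a 1-10 scale for "
            "accuracy, completeness, readability, and conciseness. Give an "
            "overall score and a one-sentence comment.\n\n"
            f"Article:\n{article}\n\nSummary:\n{summary}"
        )
        response = client.chat.completions.create(
            model="gpt-4",
            messages=[{"role": "user", "content": prompt}],
            temperature=0,  # deterministic scoring
        )
        return response.choices[0].message.content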
The text summarization performance of Pegasus-large is average, and the Flan-T5 models perform relatively poorly, especially in terms of accuracy and completeness. Therefore, GPT-3.5 is the model of choice when the most accurate and complete summary is needed, while Pegasus-large and Flan-T5-base are suitable options when a shorter summary is needed but key information should be retained. Flan-T5-large or Flan-T5-small can be chosen when the shortest summary is desired at the expense of some information completeness. In future work, other mainstream large language models such as GPT-4 and Llama 2 will be considered, along with further linguistic analysis of the textual features of the different models' outputs.
REFERENCES
Mridha, M. F., et al., "A survey of automatic text summarization: Progress, process, and challenges," IEEE Access, vol. 9, 2021, pp. 156043-156070.
El-Kassas, W. S., et al., "Automatic text summarization: A comprehensive survey," Expert Systems with Applications, vol. 165, 2021, p. 113679.
Luhn, H. P., "The automatic creation of literature abstracts," IBM Journal of Research and Development, vol. 2, 1958, pp. 159-165.
Edmundson, H. P., "New methods in automatic extracting," Journal of the ACM (JACM), vol. 16, 1969, pp. 264-285.
Kupiec, J., J. Pedersen, F. Chen, "A trainable document summarizer," Proceedings of the 18th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 1995.
Conroy, J. M., D. P. O'Leary, "Text summarization via hidden Markov models," Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 2001.
Rush, A. M., S. Chopra, J. Weston, "A neural attention model for abstractive sentence summarization," Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2015.
Nallapati, R., et al., "Abstractive text summarization using sequence-to-sequence RNNs and beyond," Proceedings of the 20th SIGNLL Conference on Computational Natural Language Learning (CoNLL), 2016.
OpenAI, "GPT-4 technical report," arXiv preprint arXiv:2303.08774, 2023.
Zhang, J., et al., "PEGASUS: Pre-training with extracted gap-sentences for abstractive summarization," International Conference on Machine Learning (ICML), PMLR, 2020.
Hermann, K. M., et al., "Teaching machines to read and comprehend," Advances in Neural Information Processing Systems, vol. 28, 2015.