As shown in Table 2, the Multi-Layer attention achieves better ROUGE-1 scores, while the Convolutional-Layer mechanism, which applies a convolution in both the encoder and the decoder attention, achieves better ROUGE-2 and ROUGE-L scores than the linear layers.
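To make the comparison concrete, the following is a minimal sketch, not the authors' code, of how the linear projection inside an attention layer can be swapped for a 1x1 convolution; the module name, dimensions, and flag are illustrative assumptions.

import torch
import torch.nn as nn

class ProjectedAttentionScores(nn.Module):
    """Projects encoder states with either a linear layer or a 1x1 conv."""

    def __init__(self, hidden_size: int, use_conv: bool = False):
        super().__init__()
        if use_conv:
            # A Conv1d with kernel_size=1 acts position-wise over the
            # channel axis, playing the same role as a per-step linear layer.
            self.proj = nn.Conv1d(hidden_size, hidden_size, kernel_size=1)
        else:
            self.proj = nn.Linear(hidden_size, hidden_size)
        self.use_conv = use_conv

    def forward(self, encoder_states: torch.Tensor) -> torch.Tensor:
        # encoder_states: (batch, seq_len, hidden_size)
        if self.use_conv:
            # Conv1d expects (batch, channels, seq_len).
            return self.proj(encoder_states.transpose(1, 2)).transpose(1, 2)
        return self.proj(encoder_states)

# Both variants map (batch, seq_len, hidden) -> (batch, seq_len, hidden).
states = torch.randn(2, 10, 256)
print(ProjectedAttentionScores(256, use_conv=True)(states).shape)
print(ProjectedAttentionScores(256, use_conv=False)(states).shape)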
The Multi-Layer attention performs well on ROUGE-1 on both the validation and the test datasets, and the improvement is clearly visible. At the same time, the training time of the Multi-Layer attention model hardly increases, so the method remains practical for real applications.
4 CONCLUSION
This paper proposes a new Multi-Layer attention model for Chinese text summarization. The parameter count increases only slightly and the training time remains almost unchanged, yet the model shows a visible improvement in performance, obtaining a 39.51 ROUGE-1 score and a 37.25 ROUGE-L score on the LCSTS validation dataset. The attention in each layer can learn a different task over the input sequence while sharing the same weights on the decoder's hidden state. The author also proposes an optional alternative that replaces the linear layers in the encoder and decoder attention with a 1x1 convolutional kernel; this makes the model smaller, and according to the experiments the performance hardly decreases.
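The following is a minimal, hypothetical sketch of the multi-layer idea described above: each encoder layer gets its own projection of the encoder states, so the per-layer attention can focus on different aspects of the input, while a single shared projection is applied to the decoder hidden state. Module names, the combination by averaging, and all dimensions are assumptions rather than the paper's implementation.

import torch
import torch.nn as nn

class MultiLayerAttention(nn.Module):
    def __init__(self, hidden_size: int, num_layers: int):
        super().__init__()
        # One projection per encoder layer: each layer can attend differently.
        self.enc_projs = nn.ModuleList(
            nn.Linear(hidden_size, hidden_size) for _ in range(num_layers)
        )
        # Shared projection of the decoder hidden state across all layers.
        self.dec_proj = nn.Linear(hidden_size, hidden_size)
        self.score = nn.Linear(hidden_size, 1)

    def forward(self, enc_states_per_layer, dec_hidden):
        # enc_states_per_layer: list of (batch, seq_len, hidden) tensors
        # dec_hidden: (batch, hidden)
        dec = self.dec_proj(dec_hidden).unsqueeze(1)          # (batch, 1, hidden)
        contexts = []
        for proj, enc in zip(self.enc_projs, enc_states_per_layer):
            scores = self.score(torch.tanh(proj(enc) + dec))  # (batch, seq_len, 1)
            weights = torch.softmax(scores, dim=1)
            contexts.append((weights * enc).sum(dim=1))       # (batch, hidden)
        # Combine the per-layer context vectors (here, by averaging).
        return torch.stack(contexts, dim=0).mean(dim=0)

attn = MultiLayerAttention(hidden_size=256, num_layers=2)
enc_layers = [torch.randn(2, 10, 256) for _ in range(2)]
print(attn(enc_layers, torch.randn(2, 256)).shape)  # torch.Size([2, 256])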