Automatic Document Summarization based on Statistical Information

Aigerim Mussina, Sanzhar Aubakirov, Paulo Trigo



This paper presents a comparative study of automatic text summarization algorithms. The main contribution is the implementation of well-known algorithms and the comparison of different summarization techniques on corpora of news articles parsed from the web. The work compares three summarization techniques based on the TextRank algorithm, namely General TextRank, BM25 and LongestCommonSubstring. For the experiments, we used corpora of news articles written in Russian and Kazakh. We implemented and experimented with well-known algorithms, but we evaluated them differently from previous work on summary evaluation: we propose a summary evaluation method based on keywords extracted from the corpora. We describe how statistical information is applied, present the results of the summarization processes and compare them.
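The abstract names the three TextRank variants but does not show how they differ. All three build a graph of sentences and rank it with weighted PageRank; only the edge-weight (similarity) function changes. The following is a minimal Python sketch under stated assumptions, not the authors' implementation: it uses the original TextRank word-overlap similarity, a BM25 score in which one sentence acts as the query and another as the document, and the length of the longest common run of words. All function names and parameter values (k1 = 1.2, b = 0.75, damping d = 0.85, 50 iterations, a non-negative IDF variant, word-level rather than character-level LCS) are illustrative assumptions. The keyword-based evaluation method the paper proposes is not reproduced here.

```python
import math
import re
from collections import Counter
from difflib import SequenceMatcher


def tokenize(sentence):
    # Lower-cased word tokens; in Python 3, \w also matches Cyrillic letters,
    # so the same tokenizer works for Russian and Kazakh text.
    return re.findall(r"\w+", sentence.lower())


def general_similarity(s1, s2):
    # Original TextRank overlap: number of shared words / (log|S1| + log|S2|).
    if not s1 or not s2:
        return 0.0
    denom = math.log(len(s1)) + math.log(len(s2))
    if denom <= 0:  # both sentences are single words
        return 0.0
    return len(set(s1) & set(s2)) / denom


def lcs_similarity(s1, s2):
    # Length of the longest common contiguous run of words (assumed word-level).
    matcher = SequenceMatcher(None, s1, s2, autojunk=False)
    return float(matcher.find_longest_match(0, len(s1), 0, len(s2)).size)


def make_bm25(sentences, k1=1.2, b=0.75):
    # Treat every sentence as a "document"; statistics come from this text only.
    n = len(sentences)
    avgdl = sum(len(s) for s in sentences) / max(n, 1) or 1.0
    df = Counter(w for s in sentences for w in set(s))
    # Non-negative IDF variant (assumption): log(1 + (N - df + 0.5) / (df + 0.5)).
    idf = {w: math.log(1 + (n - f + 0.5) / (f + 0.5)) for w, f in df.items()}

    def bm25(query, doc):
        # Asymmetric: words of `query` scored against term frequencies in `doc`.
        tf = Counter(doc)
        norm = k1 * (1 - b + b * len(doc) / avgdl)
        return sum(idf.get(w, 0.0) * tf[w] * (k1 + 1) / (tf[w] + norm) for w in query)

    return bm25


def textrank(token_lists, sim, d=0.85, iterations=50):
    # Weighted PageRank over the sentence graph; w[j][i] is the edge j -> i.
    n = len(token_lists)
    w = [[sim(token_lists[i], token_lists[j]) if i != j else 0.0 for j in range(n)]
         for i in range(n)]
    out = [sum(row) for row in w]
    scores = [1.0] * n
    for _ in range(iterations):  # fixed iteration count instead of a tolerance check
        scores = [(1 - d) + d * sum(w[j][i] / out[j] * scores[j]
                                    for j in range(n) if out[j] > 0)
                  for i in range(n)]
    return scores


def summarize(sentences, method="general", k=3):
    # method is one of "general", "lcs", "bm25" (hypothetical labels).
    tokens = [tokenize(s) for s in sentences]
    sims = {"general": general_similarity, "lcs": lcs_similarity,
            "bm25": make_bm25(tokens)}
    scores = textrank(tokens, sims[method])
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]  # keep original sentence order
```

For example, on the toy input `["a b", "a c", "b d"]` the first sentence is the only one that shares a word with both others, so `summarize(..., k=1)` selects `"a b"` under all three similarity functions. On real news text the variants diverge: the general overlap counts shared words, BM25 additionally weights them by rarity and length, and LCS rewards shared phrases.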


Paper Citation

in Harvard Style

Mussina A., Aubakirov S. and Trigo P. (2018). Automatic Document Summarization based on Statistical Information. In Proceedings of the 7th International Conference on Data Science, Technology and Applications - Volume 1: DATA, ISBN 978-989-758-318-6, pages 71-76. DOI: 10.5220/0006888400710076

in Bibtex Style

author={Aigerim Mussina and Sanzhar Aubakirov and Paulo Trigo},
title={Automatic Document Summarization based on Statistical Information},
booktitle={Proceedings of the 7th International Conference on Data Science, Technology and Applications - Volume 1: DATA,},

in EndNote Style


JO - Proceedings of the 7th International Conference on Data Science, Technology and Applications - Volume 1: DATA,
TI - Automatic Document Summarization based on Statistical Information
SN - 978-989-758-318-6
AU - Mussina A.
AU - Aubakirov S.
AU - Trigo P.
PY - 2018
SP - 71
EP - 76
DO - 10.5220/0006888400710076