Automatic Document Summarization based on Statistical Information
Aigerim Mussina, Sanzhar Aubakirov, Paulo Trigo
2018
Abstract
This paper presents a comparative perspective on automatic text summarization algorithms. The main contribution is the implementation of well-known algorithms and the comparison of different summarization techniques on corpora of news articles parsed from the web. The work compares three summarization techniques based on the TextRank algorithm, namely General TextRank, BM25, and LongestCommonSubstring. For the experiments, we used corpora of news articles written in Russian and Kazakh. We implemented and experimented with well-known algorithms, but we evaluated them differently from previous work on summary evaluation. In this research, we propose a summary evaluation method based on keywords extracted from the corpora. We describe the application of statistical information, show the results of the summarization processes, and compare them.
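The abstract names TextRank with a LongestCommonSubstring similarity but gives no implementation details, so the following is only a minimal sketch of how such a ranker might look, not the authors' code. It assumes a character-level longest common substring normalised by the length of the shorter sentence, a damping factor of 0.85, and a fixed number of power iterations; the names `lcs_similarity`, `textrank`, and `summarize` are hypothetical.

```python
from difflib import SequenceMatcher


def lcs_similarity(a: str, b: str) -> float:
    """Longest common substring length divided by the shorter sentence length.

    Assumption: character-level matching; the paper may define this differently.
    """
    if not a or not b:
        return 0.0
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    match = matcher.find_longest_match(0, len(a), 0, len(b))
    return match.size / min(len(a), len(b))


def textrank(sentences, similarity=lcs_similarity, damping=0.85, iterations=50):
    """Score sentences with weighted PageRank over a sentence-similarity graph."""
    n = len(sentences)
    if n == 0:
        return []
    # Edge weights between every pair of distinct sentences.
    weights = [
        [0.0 if i == j else similarity(sentences[i], sentences[j]) for j in range(n)]
        for i in range(n)
    ]
    out_sum = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        updated = []
        for i in range(n):
            # Each neighbour j passes on its score in proportion to the edge weight.
            rank = sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n)
                if out_sum[j] > 0
            )
            updated.append((1 - damping) / n + damping * rank)
        scores = updated
    return scores


def summarize(sentences, k=2, **kwargs):
    """Return the k highest-scoring sentences in their original order."""
    scores = textrank(sentences, **kwargs)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

Passing a different `similarity` function, for example a BM25-based score, would change only the edge weights while the ranking loop stays the same.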
Paper Citation
in Harvard Style
Mussina A., Aubakirov S. and Trigo P. (2018). Automatic Document Summarization based on Statistical Information. In Proceedings of the 7th International Conference on Data Science, Technology and Applications - Volume 1: DATA, ISBN 978-989-758-318-6, pages 71-76. DOI: 10.5220/0006888400710076
in Bibtex Style
@conference{data18,
author={Aigerim Mussina and Sanzhar Aubakirov and Paulo Trigo},
title={Automatic Document Summarization based on Statistical Information},
booktitle={Proceedings of the 7th International Conference on Data Science, Technology and Applications - Volume 1: DATA},
year={2018},
pages={71-76},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0006888400710076},
isbn={978-989-758-318-6},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 7th International Conference on Data Science, Technology and Applications - Volume 1: DATA
TI - Automatic Document Summarization based on Statistical Information
SN - 978-989-758-318-6
AU - Mussina A.
AU - Aubakirov S.
AU - Trigo P.
PY - 2018
SP - 71
EP - 76
DO - 10.5220/0006888400710076