Polish Texts Topic Classification Evaluation
Tomasz Walkowiak, Piotr Malak
2018
Abstract
The paper presents the preparation, conduct and results of an evaluation of the efficiency of text classification (TC) methods for Polish. Polish is an inflectional language with complex morphology, so proper text preprocessing is needed to guarantee reliable TC. Based on the authors' practical experience from earlier TC, IR and general NLP experiments, a set of preprocessing rules was applied. The feature-document matrix was also designed with respect to the most promising features selected. About 216 experiments were conducted on an exemplar corpus in a subject (topic) classification task, with different preprocessing, weighting and filtering (for dimensionality reduction) schemes and classifiers. The results show no substantial increase in accuracy from most classical preprocessing steps when the corpus is large (at least 1000 exemplars per class). The greatest impact the authors were able to obtain concerned the system costs of the TC process, not TC accuracy.
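The experimental design described in the abstract, a grid over preprocessing, weighting and filtering choices that each produce a feature-document matrix, can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the option lists, the helper functions `tokenize`, `build_matrix` and `enumerate_configs`, and the toy Polish sentences are hypothetical and are not taken from the paper, which does not list its actual schemes in the abstract. The classifier dimension of the grid is left out to keep the sketch dependency-free.

```python
from collections import Counter
from itertools import product
import math

# Hypothetical option lists; the abstract does not name the schemes actually used.
PREPROCESSING = ["raw", "lowercase"]
WEIGHTING = ["tf", "tfidf"]
MIN_DOC_FREQ = [1, 2]  # filtering threshold used for dimensionality reduction


def tokenize(text, preprocessing):
    """Split on whitespace and optionally lowercase (a stand-in for real preprocessing)."""
    tokens = text.split()
    if preprocessing == "lowercase":
        tokens = [t.lower() for t in tokens]
    return tokens


def build_matrix(docs, preprocessing, weighting, min_df):
    """Return (vocabulary, rows) of a feature-document matrix for one configuration."""
    tokenized = [tokenize(d, preprocessing) for d in docs]
    # Document frequency: number of documents in which each term occurs.
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    # Filtering step: keep only terms that occur in at least min_df documents.
    vocab = sorted(t for t, df in doc_freq.items() if df >= min_df)
    n_docs = len(docs)
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        row = []
        for term in vocab:
            weight = float(counts[term])
            if weighting == "tfidf":
                # One common TF-IDF variant; others exist.
                weight *= math.log(n_docs / doc_freq[term]) + 1.0
            row.append(weight)
        rows.append(row)
    return vocab, rows


def enumerate_configs():
    """Cartesian product of all option lists, one tuple per experiment."""
    return list(product(PREPROCESSING, WEIGHTING, MIN_DOC_FREQ))


# Toy documents, invented for this sketch.
docs = ["Sejm uchwalił ustawę", "sejm odrzucił ustawę", "Mecz wygrała Legia"]

for config in enumerate_configs():
    vocab, rows = build_matrix(docs, *config)
    print(config, "vocabulary size:", len(vocab))
```

In a setup of this kind, each configuration's matrix would then be passed to every classifier under test, and the number of experiments is the product of the sizes of all option lists. The sketch also makes the cost argument visible: lowercasing and the document-frequency filter shrink the vocabulary, and therefore the matrix, whether or not accuracy changes.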
Paper Citation
in Harvard Style
Walkowiak T. and Malak P. (2018). Polish Texts Topic Classification Evaluation. In Proceedings of the 10th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART, ISBN 978-989-758-275-2, pages 515-522. DOI: 10.5220/0006601605150522
in Bibtex Style
@conference{icaart18,
author={Tomasz Walkowiak and Piotr Malak},
title={Polish Texts Topic Classification Evaluation},
booktitle={Proceedings of the 10th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART},
year={2018},
pages={515-522},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0006601605150522},
isbn={978-989-758-275-2},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 10th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART
TI - Polish Texts Topic Classification Evaluation
SN - 978-989-758-275-2
AU - Walkowiak T.
AU - Malak P.
PY - 2018
SP - 515
EP - 522
DO - 10.5220/0006601605150522