Text Classification for Monolingual Political Manifestos with Words Out of Vocabulary
Arsenii Rasov, Ilya Obabkov, Eckehard Olbrich, Ivan Yamshchikov
2020
Abstract
In this position paper, we implement an automatic coding algorithm for electoral programs from the Manifesto Project Database. We propose a new approach for handling words that are outside the training vocabulary: each such word is replaced with the word from the training vocabulary that is its closest neighbor in the space of word embeddings. A set of simulations demonstrates that the proposed algorithm achieves classification accuracy comparable to state-of-the-art benchmarks for monolingual multi-label classification. The agreement levels for the algorithm are comparable with those of manual labeling. The results for a broad set of model hyperparameters are compared to each other.
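The out-of-vocabulary replacement step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the embedding model or the distance measure, so cosine similarity, the toy two-dimensional vectors, and the names `cosine` and `replace_oov` are all assumptions made for illustration, not the authors' implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def replace_oov(tokens, train_vocab, embeddings):
    """Replace each token outside train_vocab with its nearest in-vocabulary
    neighbor in embedding space. Tokens without an embedding are kept as-is
    (a hypothetical fallback; the abstract does not cover this case)."""
    # Only training words that have an embedding can serve as replacements.
    # Sorting makes tie-breaking deterministic.
    candidates = sorted(w for w in train_vocab if w in embeddings)
    result = []
    for tok in tokens:
        if tok in train_vocab or tok not in embeddings or not candidates:
            result.append(tok)
            continue
        nearest = max(candidates, key=lambda w: cosine(embeddings[tok], embeddings[w]))
        result.append(nearest)
    return result


# Toy embeddings, invented for this example only.
EMBEDDINGS = {
    "tax": [1.0, 0.1],
    "school": [0.1, 1.0],
    "levy": [0.9, 0.2],
    "university": [0.2, 0.9],
}
TRAIN_VOCAB = {"tax", "school"}

replaced = replace_oov(["levy", "tax", "university", "zzz"], TRAIN_VOCAB, EMBEDDINGS)
print(replaced)
```

In this toy setup, "levy" and "university" are mapped to their nearest training words, while "zzz" has no embedding and is left unchanged. A classifier trained only on the training vocabulary would then see the replaced sequence.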
Paper Citation
in Harvard Style
Rasov A., Obabkov I., Olbrich E. and Yamshchikov I. (2020). Text Classification for Monolingual Political Manifestos with Words Out of Vocabulary. In Proceedings of the 5th International Conference on Complexity, Future Information Systems and Risk - Volume 1: COMPLEXIS, ISBN 978-989-758-427-5, pages 149-154. DOI: 10.5220/0009792101490154
in Bibtex Style
@conference{complexis20,
author={Arsenii Rasov and Ilya Obabkov and Eckehard Olbrich and Ivan Yamshchikov},
title={Text Classification for Monolingual Political Manifestos with Words Out of Vocabulary},
booktitle={Proceedings of the 5th International Conference on Complexity, Future Information Systems and Risk - Volume 1: COMPLEXIS},
year={2020},
pages={149-154},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0009792101490154},
isbn={978-989-758-427-5},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 5th International Conference on Complexity, Future Information Systems and Risk - Volume 1: COMPLEXIS
TI - Text Classification for Monolingual Political Manifestos with Words Out of Vocabulary
SN - 978-989-758-427-5
AU - Rasov A.
AU - Obabkov I.
AU - Olbrich E.
AU - Yamshchikov I.
PY - 2020
SP - 149
EP - 154
DO - 10.5220/0009792101490154