(with more than 90% accuracy at the top 3 docu-
ments) and 4% in the answer selection module of the
AQA question answering system for the Czech lan-
guage with the final Mean Average Precision of 72%.
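For readers less familiar with the metric, the following is a minimal sketch of how Mean Average Precision is commonly computed over ranked answer candidates. It follows the standard textbook definition and is not taken from the AQA evaluation code; the function names and the example rankings are hypothetical and serve only as illustration.

```python
from typing import Sequence


def average_precision(relevance: Sequence[bool]) -> float:
    """Average precision for one question.

    relevance[i] is True when the candidate at rank i + 1
    (best candidate first) is a correct answer.
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            # Precision at this rank, counted only at correct candidates.
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(rankings: Sequence[Sequence[bool]]) -> float:
    """Mean of the per-question average precision values."""
    if not rankings:
        return 0.0
    return sum(average_precision(r) for r in rankings) / len(rankings)


# Hypothetical rankings for three questions (illustrative data only).
example = [
    [True, False, False],   # correct answer ranked first
    [False, True, False],   # correct answer ranked second
    [False, False, True],   # correct answer ranked third
]
map_score = mean_average_precision(example)
```

When each question has exactly one correct candidate, as in the hypothetical data above, this value coincides with the Mean Reciprocal Rank.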
We have also introduced the latest version of
the SQAD question answering benchmark dataset,
which now offers more than 13,000 richly annotated
question-answer pairs. Evaluation of the system on
this enlarged dataset indicates that the larger training
set allows the approach to be more specific in
identifying the correct answer: the current best
accuracy reaches almost 79% with SQAD 3.0.
In future work, development will focus on
analysing the broader context of the answers, with
evaluation based both on the preprocessing steps and
on employing the new transformer-based networks.
The results of the detailed error analysis
also direct future improvements towards processing
particular question and answer types with specifically
adapted parameter values.
ACKNOWLEDGEMENTS
This work has been partly supported by the Czech
Science Foundation under the project GA18-23891S.
Access to computing and storage facilities owned
by parties and projects contributing to the National
Grid Infrastructure MetaCentrum provided under the
programme "Projects of Large Research, Develop-
ment, and Innovations Infrastructures" (CESNET
LM2015042), is greatly appreciated.
REFERENCES
Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K.
(2019). BERT: Pre-training of Deep Bidirectional
Transformers for Language Understanding. In Pro-
ceedings of the NAACL 2019, Volume 1 (Long and
Short Papers), pages 4171–4186.
Horák, A. and Medved’, M. (2014). SQAD: Simple Ques-
tion Answering Database. In Eighth Workshop on Re-
cent Advances in Slavonic Natural Language Process-
ing, RASLAN 2014, pages 121–128, Brno. Tribun EU.
Jakubíček, M., Kilgarriff, A., Kovář, V., Rychlý, P., and
Suchomel, V. (2013). The TenTen corpus family. 7th
International Corpus Linguistics Conference CL 2013.
Lai, G., Xie, Q., Liu, H., Yang, Y., and Hovy, E. (2017).
RACE: Large-scale ReAding Comprehension Dataset
From Examinations. In Proceedings of EMNLP 2017,
pages 785–794.
Medved’, M. and Horák, A. (2016). AQA: Automatic Ques-
tion Answering System for Czech. In Sojka, P. et al.,
editors, Text, Speech, and Dialogue, TSD 2016, pages
270–278, Switzerland. Springer.
Medved’, M., Horák, A., and Kušniráková, D. (2019).
Question and answer classification in Czech question
answering benchmark dataset. In Proceedings of the
11th International Conference on Agents and Artifi-
cial Intelligence, Volume 2, pages 701–706, Prague,
Czech Republic. SCITEPRESS.
Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E.,
DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., and
Lerer, A. (2017). Automatic differentiation in Py-
Torch. In NIPS Autodiff Workshop.
Rajpurkar, P., Jia, R., and Liang, P. (2018). Know What You
Don’t Know: Unanswerable Questions for SQuAD.
In Proceedings of the ACL 2018 (Volume 2: Short
Papers), pages 784–789, Melbourne, Australia.
Association for Computational Linguistics.
Ran, Q., Li, P., Hu, W., and Zhou, J. (2019). Option com-
parison network for multiple-choice reading compre-
hension. arXiv preprint arXiv:1903.03033.
Řehůřek, R. and Sojka, P. (2010). Software Framework
for Topic Modelling with Large Corpora. In Proceed-
ings of the LREC 2010 Workshop on New Challenges
for NLP Frameworks, pages 45–50, Valletta, Malta.
ELRA.
Sabol, R., Medved’, M., and Horák, A. (2018). Recurrent
networks in AQA answer selection. In Horák, A.,
Rychlý, P., and Rambousek, A., editors, Proceedings of the
Twelfth Workshop on Recent Advances in Slavonic
Natural Language Processing, RASLAN 2018, pages
53–62, Brno. Tribun EU.
Santos, C. d., Tan, M., Xiang, B., and Zhou, B.
(2016). Attentive pooling networks. arXiv preprint
arXiv:1602.03609.
Šmerk, P. (2009). Fast Morphological Analysis of Czech. In
Proceedings of Recent Advances in Slavonic Natural
Language Processing, RASLAN 2009, pages 13–16.
Šulganová, T., Medved’, M., and Horák, A. (2017). En-
largement of the Czech Question-Answering Dataset
to SQAD v2.0. In Proceedings of Recent Advances
in Slavonic Natural Language Processing, RASLAN
2017, pages 79–84.
Šmerk, P. (2010). K počítačové morfologické analýze
češtiny (in Czech, Towards Computational Morphological
Analysis of Czech). PhD thesis, Faculty of
Informatics, Masaryk University.
Improving RNN-based Answer Selection for Morphologically Rich Languages