of active learning algorithms. Journal of Machine
Learning Research, 5(Mar):255–291.
Bishop, C. M. (2006). Pattern recognition and machine
learning. Springer.
Blundell, C., Cornebise, J., Kavukcuoglu, K., and Wier-
stra, D. (2015). Weight uncertainty in neural net-
work. In International Conference on Machine Learn-
ing, pages 1613–1622. PMLR.
Brier, G. W. (1950). Verification of forecasts expressed
in terms of probability. Monthly Weather Review,
78(1):1–3.
Cer, D., Yang, Y., Kong, S.-y., Hua, N., Limtiaco, N., John,
R. S., Constant, N., Guajardo-Cespedes, M., Yuan, S.,
Tar, C., et al. (2018). Universal sentence encoder for
English. In Proceedings of the 2018 Conference on
Empirical Methods in Natural Language Processing:
System Demonstrations, pages 169–174.
Chauhan, N. K. and Singh, K. (2018). A review on con-
ventional machine learning vs deep learning. In Inter-
national Conference on Computing, Power and Com-
munication Technologies (GUCON), pages 347–352.
IEEE.
Corazza, M., Menini, S., Cabrio, E., Tonelli, S., and Vil-
lata, S. (2020). A multilingual evaluation for online
hate speech detection. ACM Transactions on Internet
Technology (TOIT), 20(2):1–22.
Cortes, C., DeSalvo, G., and Mohri, M. (2016). Learning
with rejection. In International Conference on Algo-
rithmic Learning Theory, pages 67–82. Springer.
Davidson, T., Warmsley, D., Macy, M., and Weber, I.
(2017). Automated hate speech detection and the
problem of offensive language. In Proceedings of
the International AAAI Conference on Web and Social
Media, volume 11.
Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K.
(2018). BERT: Pre-training of deep bidirectional
transformers for language understanding. arXiv preprint
arXiv:1810.04805.
Dudley, J. J. and Kristensson, P. O. (2018). A review of
user interface design for interactive machine learning.
ACM Transactions on Interactive Intelligent Systems
(TiiS), 8(2):1–37.
Gal, Y. and Ghahramani, Z. (2016). Dropout as a Bayesian
approximation: Representing model uncertainty in
deep learning. In International Conference on Machine
Learning, pages 1050–1059. PMLR.
Geifman, Y. and El-Yaniv, R. (2017). Selective classifica-
tion for deep neural networks. In Proceedings of the
31st International Conference on Neural Information
Processing Systems, pages 4885–4894.
Green, B. and Chen, Y. (2019). Disparate interactions: An
algorithm-in-the-loop analysis of fairness in risk as-
sessments. In Proceedings of the Conference on Fair-
ness, Accountability, and Transparency, pages 90–99.
Haering, M., Andersen, J. S., Biemann, C., Loosen, W.,
Milde, B., Pietz, T., Stoecker, C., Wiedemann, G.,
Zukunft, O., and Maalej, W. (2021). Forum 4.0: An
open-source user comment analysis framework. In
Proceedings of the 16th Conference of the European
Chapter of the Association for Computational Lin-
guistics: System Demonstrations, pages 63–70.
Hastie, T., Tibshirani, R., and Friedman, J. (2009). The el-
ements of statistical learning: data mining, inference,
and prediction. Springer Science & Business Media.
He, J., Zhang, X., Lei, S., Chen, Z., Chen, F., Alhamadani,
A., Xiao, B., and Lu, C. (2020). Towards more
accurate uncertainty estimation in text classification.
In Proceedings of the 2020 Conference on Empirical
Methods in Natural Language Processing (EMNLP),
pages 8362–8372.
Hendrycks, D. and Gimpel, K. (2016). A baseline for de-
tecting misclassified and out-of-distribution examples
in neural networks. arXiv preprint arXiv:1610.02136.
Hernández-Lobato, J. M. and Adams, R. (2015).
Probabilistic backpropagation for scalable learning of
Bayesian neural networks. In International Conference on
Machine Learning, pages 1861–1869. PMLR.
Holzinger, A. (2016). Interactive machine learning for
health informatics: when do we need the human-in-
the-loop? Brain Informatics, 3(2):119–131.
Karmakharm, T., Aletras, N., and Bontcheva, K. (2019).
Journalist-in-the-loop: Continuous learning as a ser-
vice for rumour analysis. In Proceedings of the 2019
Conference on Empirical Methods in Natural Lan-
guage Processing and the 9th International Joint Con-
ference on Natural Language Processing (EMNLP-
IJCNLP): System Demonstrations, pages 115–120.
Kendall, A. and Gal, Y. (2017). What uncertainties do
we need in Bayesian deep learning for computer
vision? In Proceedings of the 31st International
Conference on Neural Information Processing Systems,
pages 5580–5590.
Lai, C.-C. and Tsai, M.-C. (2004). An empirical perfor-
mance comparison of machine learning methods for
spam e-mail categorization. In Fourth International
Conference on Hybrid Intelligent Systems (HIS’04),
pages 44–48. IEEE.
LeCun, Y., Bengio, Y., and Hinton, G. (2015). Deep
learning. Nature, 521(7553):436–444.
Lewis, D. D. and Gale, W. A. (1994). A sequential algo-
rithm for training text classifiers. In SIGIR’94, pages
3–12. Springer.
Lewis, D. D., Yang, Y., Rose, T. G., and Li, F. (2004).
RCV1: A new benchmark collection for text
categorization research. Journal of Machine Learning
Research, 5(Apr):361–397.
Liu, Z. and Chen, H. (2017). A predictive performance
comparison of machine learning models for judicial
cases. In IEEE Symposium Series on Computational
Intelligence (SSCI), pages 1–6. IEEE.
Luu, S. T., Nguyen, H. P., Van Nguyen, K., and Nguyen,
N. L.-T. (2020). Comparison between traditional ma-
chine learning models and neural network models for
Vietnamese hate speech detection. In International
Conference on Computing and Communication Tech-
nologies (RIVF), pages 1–6. IEEE.
Maalej, W., Kurtanović, Z., Nabil, H., and Stanik, C.
(2016). On the automatic classification of app reviews.
Requirements Engineering, 21(3):311–331.
Towards More Reliable Text Classification on Edge Devices via a Human-in-the-Loop