AutoML cannot beat humans in situations in which
extraordinary results are required. This is underlined
by our finding that, in the cases in which AutoML
outperforms humans, the margin is rather small and,
in most of these cases, only one or two AutoML tools
manage to do so (although the average is 2.5, which
is attributable to the first task, where all four AutoML
tools beat human performance).
Finally, we want to address the usability of the
considered AutoML tools and the AutoML benchmark.
Occasionally, substantial human intervention is
required to make them work properly. This is
understandable, as AutoML in general is a relatively
new field and some of the tools are still at an early
stage of development. However, it contradicts the idea
of automated machine learning, and we see great
potential for improvement regarding stability,
reliability and range of functionality.
5 CONCLUSION AND OUTLOOK
The present work contributes to the state of
knowledge concerning AutoML performance in text
classification. Our research interest was two-fold: a
comparison of performance among AutoML tools,
and a confrontation with human performance. The
results show that, in most cases, AutoML is not able
to outperform humans in text classification. However,
there are text classification tasks that AutoML tools
can solve equally well or better. With automated
approaches becoming increasingly sophisticated, we
expect this disparity to shrink in the future.
We see great potential in the future development of
specific text classification modules within AutoML
tools. Such modules would further facilitate the use of
machine learning by beginners and the establishment
of a baseline for advanced users.
In the future, we will focus on investigating the
impact of different pre-processing techniques for text
(including more embedding types) on their use in
AutoML tools. Evidently, there are further AutoML
tools that should be evaluated as well. Furthermore,
testing AutoML on other NLP tasks such as named
entity recognition is an interesting topic for future
research. Additionally, we will analyse the
performance of commercial cloud services that come
with ready-to-use text classification functionality.
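The kind of pipeline such an investigation would vary can be illustrated with a minimal sketch. Note that this is an illustration only, not the setup of any specific AutoML tool evaluated here: the toy corpus, the TF-IDF vectorizer, and the small parameter grid (standing in for the much larger search spaces AutoML optimizers explore) are all assumptions made for the example.

```python
# Minimal sketch: text pre-processing (TF-IDF) feeding a small
# hyperparameter search that stands in for an AutoML optimizer.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Toy corpus; real experiments would use datasets such as SMS Spam.
texts = [
    "win a free prize now", "claim your free reward today",
    "urgent offer, reply to win cash", "free entry in a prize draw",
    "meeting moved to three pm", "see you at lunch tomorrow",
    "can you send the report", "thanks for the update yesterday",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = spam, 0 = ham

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# A toy search space: an AutoML tool would explore pre-processing
# choices and model hyperparameters on a far larger scale.
param_grid = {
    "tfidf__ngram_range": [(1, 1), (1, 2)],
    "clf__C": [0.1, 1.0, 10.0],
}

search = GridSearchCV(pipeline, param_grid, cv=2)
search.fit(texts, labels)
print(search.best_params_)
print(search.predict(["free cash prize waiting"]))
```

Swapping the TF-IDF step for different embedding types, while keeping the search fixed, is one way to isolate the effect of pre-processing on the final model.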