trained models, since they are highly effective in the
field of NLP and the tasks in the domain are recurrent
and repetitive, dealing with one homogeneous entity:
language. This means that hyperparameters found to be
optimal for a large model are very likely to yield
similar results on smaller tasks of the same nature.
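As a minimal sketch of this idea (illustrative values only, not recommendations taken from any of the surveyed studies), the snippet below uses Hyperopt (Bergstra et al., 2015) to restrict the search for a smaller task to a narrow range centred on hyperparameter values commonly reported for large pre-trained models; the objective function is a hypothetical placeholder standing in for training and validating the downstream model.

```python
# Sketch: warm-starting HPO for a small task from values reported for a
# large pre-trained model, using Hyperopt's TPE algorithm.
from hyperopt import fmin, tpe, hp, Trials


def dummy_validation_loss(params):
    # Placeholder so the sketch runs end to end; in practice this would be
    # the validation loss of the fine-tuned downstream model.
    return abs(params["learning_rate"] - 3e-5) * 1e4


def objective(params):
    # Hypothetical objective: train with `params` and return validation loss.
    return dummy_validation_loss(params)


# Narrow ranges centred on values typically reported for fine-tuning large
# transformer models (learning rates roughly 1e-5 to 1e-4, small batches,
# few epochs). These numbers are illustrative assumptions.
space = {
    "learning_rate": hp.loguniform("learning_rate", -11.5, -9.2),
    "batch_size": hp.choice("batch_size", [16, 32]),
    "epochs": hp.choice("epochs", [2, 3, 4]),
}

best = fmin(fn=objective, space=space, algo=tpe.suggest,
            max_evals=20, trials=Trials())
# Note: for hp.choice parameters, `best` contains indices into the option lists.
print(best)
```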
6 CONCLUSIONS
This paper provides insight into good practices for
hyperparameter optimization in tasks related to natural
language processing. We found that there are common
traits in the hyperparameter optimization process and
that particular HPO techniques work well with certain
tasks. Moreover, the values reported here from
individual studies can be reproduced in similar tasks.
Recent developments in transformer architectures have
paved the way for better models down the line by means
of transfer learning, which ultimately benefits
hyperparameter optimization in NLP.
REFERENCES
Aghaebrahimian, Ahmad, and Mark Cieliebak. 2019.
“Hyperparameter Tuning for Deep Learning in Natural
Language Processing,” 7.
Bergstra, James, and Yoshua Bengio. 2012. “Random
Search for Hyper-Parameter Optimization.” Journal of
Machine Learning Research 13: 281–305.
Bergstra, James, Brent Komer, Chris Eliasmith, Dan
Yamins, and David D Cox. 2015. “Hyperopt: A Python
Library for Model Selection and Hyperparameter
Optimization.” Computational Science & Discovery 8
(1): 014008. https://doi.org/10.1088/1749-4699/8/1/014008.
Caselles-Dupré, Hugo, Florian Lesaint, and Jimena Royo-
Letelier. 2018. “Word2Vec Applied to
Recommendation: Hyperparameters Matter.”
ArXiv:1804.04212 [Cs, Stat], August.
http://arxiv.org/abs/1804.04212.
Claesen, Marc, and Bart De Moor. 2015. “Hyperparameter
Search in Machine Learning.” ArXiv:1502.02127 [Cs,
Stat], April. http://arxiv.org/abs/1502.02127.
Costa, Victor O., and Cesar R. Rodrigues. 2018.
“Hierarchical Ant Colony for Simultaneous Classifier
Selection and Hyperparameter Optimization.” In 2018
IEEE Congress on Evolutionary Computation (CEC),
1–8. Rio de Janeiro: IEEE.
https://doi.org/10.1109/CEC.2018.8477834.
Dernoncourt, Franck, and Ji Young Lee. 2016. “Optimizing
Neural Network Hyperparameters with Gaussian
Processes for Dialog Act Classification.” In 2016 IEEE
Spoken Language Technology Workshop (SLT), 406–
13. San Diego, CA: IEEE.
https://doi.org/10.1109/SLT.2016.7846296.
Devlin, Jacob, Ming-Wei Chang, Kenton Lee, and Kristina
Toutanova. 2019. “BERT: Pre-Training of Deep
Bidirectional Transformers for Language
Understanding.” ArXiv:1810.04805 [Cs], May.
http://arxiv.org/abs/1810.04805.
Feurer, Matthias, Aaron Klein, Katharina Eggensperger,
Jost Tobias Springenberg, Manuel Blum, and Frank
Hutter. 2015. “Efficient and Robust Automated
Machine Learning.” In Advances in Neural Information
Processing Systems 28.
Golovin, Daniel, Benjamin Solnik, Subhodeep Moitra,
Greg Kochanski, John Karro, and D. Sculley. 2017.
“Google Vizier: A Service for Black-Box
Optimization.” In Proceedings of the 23rd ACM
SIGKDD International Conference on Knowledge
Discovery and Data Mining - KDD ’17, 1487–95.
Halifax, NS, Canada: ACM Press.
https://doi.org/10.1145/3097983.3098043.
Hinton, Geoffrey E. 2012. “A Practical Guide to Training
Restricted Boltzmann Machines.” In Neural Networks:
Tricks of the Trade, edited by Grégoire Montavon,
Geneviève B. Orr, and Klaus-Robert Müller, 7700:599–
619. Lecture Notes in Computer Science. Berlin,
Heidelberg: Springer Berlin Heidelberg.
https://doi.org/10.1007/978-3-642-35289-8_32.
Klein, Aaron, Stefan Falkner, Simon Bartels, Philipp
Hennig, and Frank Hutter. 2017. “Fast Bayesian
Hyperparameter Optimization on Large Datasets.”
Electronic Journal of Statistics 11 (2): 4945–68.
https://doi.org/10.1214/17-EJS1335SI.
Komninos, Alexandros, and Suresh Manandhar. 2016.
“Dependency Based Embeddings for Sentence
Classification Tasks.” In Proceedings of the 2016
Conference of the North American Chapter of the
Association for Computational Linguistics: Human
Language Technologies, 1490–1500. San Diego,
California: Association for Computational Linguistics.
https://doi.org/10.18653/v1/N16-1175.
Melis, Gábor, Chris Dyer, and Phil Blunsom. 2017. “On the
State of the Art of Evaluation in Neural Language
Models.” ArXiv:1707.05589 [Cs], November.
http://arxiv.org/abs/1707.05589.
Pedregosa, Fabian, Gael Varoquaux, Alexandre Gramfort,
Vincent Michel, Bertrand Thirion, Olivier Grisel,
Mathieu Blondel, et al. 2011. “Scikit-Learn: Machine
Learning in Python.” Journal of Machine Learning
Research 12: 2825–30.
Peters, Matthew E., Mark Neumann, Mohit Iyyer, Matt
Gardner, Christopher Clark, Kenton Lee, and Luke
Zettlemoyer. 2018. “Deep Contextualized Word
Representations.” ArXiv:1802.05365 [Cs], March.
http://arxiv.org/abs/1802.05365.
Reimers, Nils, and Iryna Gurevych. 2017. “Optimal
Hyperparameters for Deep LSTM-Networks for