This work focuses particularly on one of the core
components of the architecture: the model training
module. To technologically instantiate this module,
we first analyzed the characteristics of five open-
source AutoML tools (Auto-Keras, Auto-Sklearn,
Auto-Weka, H2O AutoML and TransmogrifAI).
Then, we performed an experimental benchmark study
with the two tools that offer distributed ML
capabilities: H2O AutoML and TransmogrifAI. The
experiments were conducted using three real-world
datasets provided by the software company (churn,
event forecasting and fraud detection). The obtained
results allowed us to evaluate the potential of both Au-
toML technologies for the model training module of
the proposed architecture.
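The core step the benchmarked tools automate is training several candidate models and ranking them on a validation metric, yielding a "leaderboard" of algorithms. The following is a minimal sketch of that model-selection loop; it uses scikit-learn and a synthetic dataset purely for illustration (the candidate models and dataset are hypothetical, not those from the study), since H2O AutoML requires a running H2O cluster but exposes an analogous train-and-rank workflow.

```python
# Illustrative AutoML-style model-selection loop: train several candidate
# models, score each with cross-validation, and rank them by AUC,
# mimicking the leaderboard produced by tools such as H2O AutoML.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic binary classification data standing in for a task such as churn.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
}

# Score each candidate with 5-fold cross-validated AUC and sort descending.
leaderboard = sorted(
    ((name, cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean())
     for name, model in candidates.items()),
    key=lambda item: item[1],
    reverse=True,
)

for name, auc in leaderboard:
    print(f"{name}: AUC = {auc:.3f}")
```

In H2O AutoML the equivalent workflow is a single `H2OAutoML(...).train(...)` call followed by inspection of its leaderboard, with the search over algorithms and hyperparameters handled internally.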
Overall, the proposed framework received positive
feedback from the software company, which opted to
select the H2O AutoML tool for its model training
module. In future work, additional telecommunications
datasets will be addressed in order to further
benchmark the AutoML tools. In particular, we wish
to extend the framework's ML capabilities to handle
more ML tasks (e.g., ordinal classification,
multi-target regression). Moreover, we intend to
focus development on the remaining components of
the architecture, in order to select the best
technologies to be used (e.g., for handling missing data).
ACKNOWLEDGEMENTS
This work was executed under the project IRMDA
- Intelligent Risk Management for the Digital Age,
Individual Project, NUP: POCI-01-0247-FEDER-
038526, co-funded by the Incentive System for Re-
search and Technological Development, from the
Thematic Operational Program Competitiveness of
the national framework program - Portugal2020.
An Automated and Distributed Machine Learning Framework for Telecommunications Risk Management