tion environments, thus justifying the need to introduce new techniques that can be exploited effectively.
From the results presented above, it can be noticed that no single ML technique outperforms all the others in every situation. Moreover, even when restricting the analysis to a single type of scenario, there is no single winner. On the contrary, a slight change in the composition of the training and test sets (i.e., considering different cases) may change which technique performs best. For example, in the data extrapolation scenario for K-means the best technique is SVR for C1, C3, C4, and C7, LR for C2, NN for C5, and DT for C6.
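Since no technique dominates, model selection can be automated by fitting every candidate on a scenario's training set and keeping the one with the lowest MAPE on held-out data. The following is a minimal sketch with scikit-learn; the estimator choices mirror the four techniques above, but the hyperparameters are illustrative placeholders, not the configurations used in our experiments:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neural_network import MLPRegressor
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

def mape(y_true, y_pred):
    """Mean Absolute Percentage Error, in percent."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return 100.0 * np.mean(np.abs(y_true - y_pred) / y_true)

# Candidate techniques; hyperparameters are placeholders.
CANDIDATES = {
    "LR": LinearRegression(),
    "SVR": SVR(kernel="rbf", C=10.0),
    "NN": MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000),
    "DT": DecisionTreeRegressor(max_depth=8),
}

def best_model(X_train, y_train, X_test, y_test):
    """Fit every candidate on one scenario/case; keep the lowest-MAPE one."""
    scores = {}
    for name, model in CANDIDATES.items():
        model.fit(X_train, y_train)
        scores[name] = mape(y_test, model.predict(X_test))
    winner = min(scores, key=scores.get)
    return winner, scores
```

Repeating this selection per case reproduces the pattern observed above, where the winning technique changes from one case to the next.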
The comparison between the best gray box models and the reference Ernest model leads to two different situations. In scenarios where applications are characterized by regularity (i.e., Query 26 and SparkDL with fixed data size), Ernest yields very good results, with MAPE values smaller than 11%, whereas the best gray box model generally achieves worse performance (the MAPE of the best models is in the range 3.84-42.29%). Yet, in the remaining scenarios, which are characterized by a larger variability in the application execution times, the best gray box model outperforms the Ernest model by a large margin: the MAPE range of the latter is 36.81-187.0%, while the largest error of the best gray box model is only 31.59% (C of core interpolation with K-means). However, recall that gray box models rely on DAG-related features which are not available for the test instance at prediction time (a priori); hence, as presented, they can be used only to assess the performance of the analyzed applications after execution.
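For context, Ernest predicts execution time with a fixed closed-form expression in the data size s and the number of cores c, fitted with non-negative least squares. The sketch below assumes the feature map described in the original Ernest paper (intercept, s/c, log c, c); variable and function names are ours, not the authors' implementation:

```python
import numpy as np
from scipy.optimize import nnls

def ernest_features(s, c):
    """Ernest-style feature map: intercept, serial term s/c, log-cost term, core count."""
    s, c = np.asarray(s, dtype=float), np.asarray(c, dtype=float)
    return np.column_stack([np.ones_like(s), s / c, np.log(c), c])

def fit_ernest(s, c, t):
    """Fit t ~ theta . features(s, c) with non-negative least squares."""
    theta, _ = nnls(ernest_features(s, c), np.asarray(t, dtype=float))
    return theta

def predict_ernest(theta, s, c):
    return ernest_features(s, c) @ theta
```

Because the functional form is fixed, such a model tracks regular scaling behavior well but cannot follow the larger variability of the remaining scenarios, which is where the richer feature set of the gray box models pays off.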
6 CONCLUSIONS AND FUTURE WORK
In this paper, the accuracy of alternative supervised machine learning techniques for assessing the performance of Spark applications has been analysed. Our aim is to train models able to identify perturbations which affect the execution time of production applications. Experimental results on a rich set of different scenarios demonstrated that the proposed gray box models achieve relevant accuracy across different workloads. Moreover, in complex scenarios (i.e., data extrapolation in complex applications) where the Ernest reference model fails (error up to 187%), the largest error of the best gray box model is 31.59%. However, results show that there is no ML technique which always outperforms the others; hence, different techniques have to be evaluated in each scenario to choose the best model. As future work, we plan to study the performance of Spark applications running on GPU-based clusters. Moreover, the use of the models to identify resource contention on production systems will also be considered.
ACKNOWLEDGEMENTS
This work has been partially supported by the project
ATMOSPHERE (https://atmosphere-eubrazil.eu),
funded by the Brazilian Ministry of Science, Technol-
ogy and Innovation (Project 51119 - MCTI/RNP 4th
Coordinated Call) and by the European Commission
under the Cooperation Programme, Horizon 2020
(grant agreement no 777154).