6 CONCLUSION
In this paper, we conducted a thorough requirements engineering process to assess the state of holistic platforms for interactive data science on distributed resources. We found that, while such platforms are available for the cloud, they are clearly lacking for on-premises infrastructure. The solutions currently used on premises are rather primitive and fall short in terms of scalability, isolation, and user-friendliness.
Based on the data gathered during the requirements engineering process, we specified the requirements that such a platform should fulfill. Next, we proposed an architecture for such a platform. The proposed solution offers strong isolation between users and is scalable, supporting even a large number of simultaneous users. It provides interactive notebooks with Spark integrated out of the box. Furthermore, its centralized Web UI offers kernel management, which lets users create their own environments in a user-friendly way, and provides an overview of the current cluster state.
We then evaluated the proposed solution. First, all specified requirements are fully satisfied. Second, a usability evaluation yielded very high scores, indicating that the intended functionality is indeed present and well implemented. Finally, a comparison with the previously used solution showed that the proposed platform outperforms it in terms of scalability, isolation, and user-friendliness.
As a next step, a more comprehensive user-management system should be implemented, including an authentication system such as Kerberos¹². The current one is rather simplified and distinguishes only two groups of users: normal users and administrators.
Notwithstanding the future work stated above, the proposed solution is already fully usable and provides a user-friendly platform for data scientists working with on-premises infrastructure. Since it is released under the MIT open-source license, it is free to use for commercial purposes and can be further adapted to the on-premises infrastructure at hand.
12 https://tools.ietf.org/html/rfc4120#section-1.6
CLOSER 2021 - 11th International Conference on Cloud Computing and Services Science