
Fleckenstein, M. and Fellows, L. (2018). Overview of
data management frameworks. In Modern Data Strat-
egy, pages 55–59. Springer International Publishing,
Cham.
Flink (2024). Apache flink project. https://flink.apache.org/
[Accessed: 2024].
Fowler, M. and Highsmith, J. (2000). The agile manifesto.
9.
Gorton, I., Bener, A. B., and Mockus, A. (2016). Soft-
ware engineering for big data systems. IEEE Software,
33(2):32–35.
Gorton, I. and Klein, J. (2015). Distribution, data, deploy-
ment: Software architecture convergence in big data
systems. IEEE Software, 32(3):78–85.
Grady, N. W., Payne, J. A., and Parker, H. (2017). Agile
big data analytics: Analyticsops for data science. In
2017 IEEE International Conference on Big Data (Big
Data), pages 2331–2339.
Grolinger, K., Higashino, W. A., Tiwari, A., and Capretz,
M. A. (2013). Data management in cloud environ-
ments: Nosql and newsql data stores. Journal of
Cloud Computing: Advances, Systems and Applica-
tions, 2(1).
Hashem, I. A. T., Yaqoob, I., Anuar, N. B., Mokhtar, S.,
Gani, A., and Ullah Khan, S. (2015). The rise of “big
data” on cloud computing: Review and open research
issues. Information Systems, 47:98–115.
Hoda, R., Salleh, N., and Grundy, J. (2018). The rise and
evolution of agile software development. IEEE Soft-
ware, 35(5):58–63.
HPCC (2024). Hpcc systems. https://www.hpccsystems.
com [Accessed: 2024].
Hummel, O., Eichelberger, H., Giloj, A., Werle, D., and
Schmid, K. (2018). A collection of software engi-
neering challenges for big data system development.
In 2018 44th Euromicro Conference on Software En-
gineering and Advanced Applications (SEAA), pages
362–369.
Irizarry, R. A. (2020). The Role of Academia in Data Sci-
ence Education. Harvard Data Science Review, 2(1).
https://hdsr.mitpress.mit.edu/pub/gg6swfqh.
Gao, J., Koronios, A., and Selle, S. (2015). Towards a process
view on critical success factors in big data analytics
projects. Core.ac.uk.
Kallio, H., Pietilä, A.-M., Johnson, M., and Kangasniemi,
M. (2016). Systematic methodological review:
developing a framework for a qualitative
semi-structured interview guide. Journal of Advanced
Nursing, 72(12):2954–2965.
Laigner, R., Kalinowski, M., Lifschitz, S., Salvador Mon-
teiro, R., and de Oliveira, D. (2018). A systematic
mapping of software engineering approaches to de-
velop big data systems. In 2018 44th Euromicro Con-
ference on Software Engineering and Advanced Ap-
plications (SEAA), pages 446–453.
Marz, N. and Warren, J. (2015). Big Data: Principles and
best practices of scalable realtime data systems. Man-
ning Publications Co., USA, 1st edition.
Meng, X., Bradley, J., Yavuz, B., Sparks, E., Venkatara-
man, S., Liu, D., Freeman, J., Tsai, D., Amde, M.,
Owen, S., Xin, D., Xin, R., Franklin, M. J., Zadeh,
R., Zaharia, M., and Talwalkar, A. (2016). Mllib: ma-
chine learning in apache spark. J. Mach. Learn. Res.,
17(1):1235–1241.
Microsoft (2024). What is the team data sci-
ence process? - azure architecture center.
Available at: https://learn.microsoft.com/en-
us/azure/architecture/data-science-process/overview.
Nagashima, H. and Kato, Y. (2019). Aprep-dm: a frame-
work for automating the pre-processing of a sensor
data analysis based on crisp-dm. In 2019 IEEE In-
ternational Conference on Pervasive Computing and
Communications Workshops (PerCom Workshops),
pages 555–560.
Basili, V. R., Caldiera, G., and Rombach, H. D. (1994). The goal
question metric approach. Encyclopedia of Software
Engineering, 1:528–532.
Saltz, J., Sutherland, A., and Hotz, N. (2022). Achieving
lean data science agility via data driven scrum.
Saltz, J. S. and Krasteva, I. (2022). Current approaches for
executing big data science projects - a systematic lit-
erature review. PeerJ Computer Science, 8.
Saltz, J. S. and Shamshurin, I. (2016). Big data team pro-
cess methodologies: A literature review and the iden-
tification of key factors for a project’s success. In
2016 IEEE International Conference on Big Data (Big
Data), pages 2872–2879.
Schröer, C., Kruse, F., and Gómez, J. M. (2021). A
systematic literature review on applying crisp-dm
process model. Procedia Computer Science, 181:526–534.
CENTERIS 2020 - International Conference on
ENTERprise Information Systems / ProjMAN 2020
- International Conference on Project MANagement /
HCist 2020 - International Conference on Health and
Social Care Information Systems and Technologies
2020, CENTERIS/ProjMAN/HCist 2020.
Schwaber, K. and Beedle, M. (2002). Agile Software De-
velopment with Scrum. Prentice Hall, Upper Saddle
River, New Jersey.
Sharma, S., Kumar, D., and Fayad, M. (2021). An impact
assessment of agile ceremonies on sprint velocity un-
der agile software development. In 2021 9th Interna-
tional Conference on Reliability, Infocom Technolo-
gies and Optimization (Trends and Future Directions)
(ICRITO), pages 1–5.
Shvachko, K., Kuang, H., Radia, S., and Chansler, R.
(2010). The hadoop distributed file system. In 2010
IEEE 26th Symposium on Mass Storage Systems and
Technologies (MSST), pages 1–10.
Spark (2024). Apache spark project. https://spark.apache.
org/ [Accessed: 2024].
Sterling, T. L., Savarese, D., Becker, D. J., Dorband, J. E.,
Ranawake, U. A., and Packer, C. V. (1995). Beowulf:
A parallel workstation for scientific computation. In
International Conference on Parallel Processing.
Zaharia, M., Chowdhury, M., Franklin, M. J., Shenker, S.,
and Stoica, I. (2010). Spark: cluster computing with
working sets. In Proceedings of the 2nd USENIX
Conference on Hot Topics in Cloud Computing, Hot-
Cloud’10, page 10, USA. USENIX Association.
Zhu, Y. and Xiong, Y. (2015). Defining data science.
IoTBDS 2025 - 10th International Conference on Internet of Things, Big Data and Security