
vacy concerns forced us to abandon the LOGGIT software, which was a cornerstone of our pipeline architecture. We did our best to comply with this requirement as soon as possible.
The aim of our contribution is to provide a real-world use case that may be useful to other companies dealing with similar issues.
We refactored our pipeline architecture over several iterations to balance our reliability requirements against the sometimes convoluted rules of the Data Protection Authority. We discussed each iteration and the choices we made. Our pipeline has been up and running since the first iteration, and the switch caused no data loss.
While we still lack the visualization features of the previous tool (LOGGIT), we have provided a basic monitoring facility. To restore the previous web interface, our plan is to index the data stream into an Elastic stack (e.g., OpenSearch + OpenSearch Dashboards) and replicate the LOGGIT visualizations. We consider the addition of a monitoring and visualization interface our next goal.
What to Do when Privacy Issues Screw It Up: Ingestion Refactoring in a Big-Data Pipeline