5.1 Future Work
Since our objective is to identify all relevant processing
of personal data in source code, reducing false
negatives is our next priority. False positives, by
contrast, are not a major concern in our case. Because
personal data processing is subtle, determining
relevance without human assistance is particularly
challenging. Restricting the analysis to specific
patterns would ease manual analysis, which calls for
the adoption of a privacy taxonomy.
Using Ethyca’s taxonomy (Ethyca, 2022) as an example,
we may modify our labels so that the technique
aligns with the taxonomy.
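One way to picture such an alignment is a lookup table from the technique's labels to taxonomy categories, together with a report of labels that have no counterpart yet. The sketch below is only an illustration under stated assumptions: the label names, the `map_labels` helper, and the `example.*` category keys are hypothetical placeholders and are not taken from the Fides language or from our rule set.

```python
# Hypothetical label-to-taxonomy table. The keys on the right are
# placeholders for illustration, NOT actual Fides taxonomy categories.
LABEL_TO_TAXONOMY = {
    "email": "example.contact.email",
    "phone": "example.contact.phone",
    "location": "example.location",
}


def map_labels(labels, table):
    """Translate labels to taxonomy keys and collect the ones with no mapping."""
    mapped = {}
    unmapped = []
    for label in labels:
        if label in table:
            mapped[label] = table[label]
        else:
            unmapped.append(label)
    return mapped, unmapped


if __name__ == "__main__":
    mapped, unmapped = map_labels(["email", "nickname"], LABEL_TO_TAXONOMY)
    print("mapped:", mapped)
    print("needs a taxonomy entry:", unmapped)
```

Listing the unmapped labels separately keeps the human reviewer in the loop: a label without a taxonomy entry is surfaced rather than silently dropped.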
As an extension of this article, we propose, in an
unpublished manuscript currently under review
(Tang et al., 2023), an automated mapping of personal
data to assist developers and code reviewers in
identifying privacy-related code. Based on static analysis,
the mapping automatically detects personal data and
the code that processes it, and provides semantics
for personal data flows.
6 CONCLUSIONS
This short paper presented ongoing work on a novel,
customizable approach to identify personal data pro-
cessing for code review. This three-phase technique
first uses Semgrep to match patterns in the code based
on rules for sources and sinks, then associates code
snippets generated from pattern matching with a set of
behavioral labels, and finally groups results to reduce
code reviewer workload. Our demonstration shows
the utility and feasibility of this method for gathering
and presenting code snippets related to personal data
processing from a codebase.
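The second and third phases can be sketched as follows. This is a minimal illustration, not our implementation: the findings are hard-coded and only loosely shaped like pattern-matching results, and the rule identifiers, the labels, and the grouping key (file path plus label) are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical pattern-matching results (phase 1 output). Rule ids, paths,
# and snippets are invented for illustration.
FINDINGS = [
    {"check_id": "source.user-email", "path": "app/models.py", "line": 12,
     "snippet": "email = request.form['email']"},
    {"check_id": "sink.db-write", "path": "app/models.py", "line": 40,
     "snippet": "db.session.add(user)"},
    {"check_id": "sink.db-write", "path": "app/models.py", "line": 55,
     "snippet": "db.session.add(profile)"},
    {"check_id": "sink.log", "path": "app/views.py", "line": 7,
     "snippet": "logger.info(user.email)"},
]

# Hypothetical rule-to-label table (phase 2); the label names are assumptions.
RULE_LABELS = {
    "source.user-email": "collection",
    "sink.db-write": "storage",
    "sink.log": "logging",
}


def label_findings(findings, rule_labels):
    """Attach a behavioral label to each finding based on the rule it matched."""
    return [
        {**finding, "label": rule_labels.get(finding["check_id"], "unlabeled")}
        for finding in findings
    ]


def group_findings(labeled):
    """Group labeled findings by (file path, label) to shorten the review list."""
    groups = defaultdict(list)
    for finding in labeled:
        groups[(finding["path"], finding["label"])].append(finding)
    return dict(groups)


if __name__ == "__main__":
    grouped = group_findings(label_findings(FINDINGS, RULE_LABELS))
    for (path, label), items in sorted(grouped.items()):
        lines = [item["line"] for item in items]
        print(f"{path} [{label}]: lines {lines}")
```

Grouping by file and label is only one possible criterion; any other key (for example, the personal data type involved) could be substituted in `group_findings` without changing the labeling step.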
Along with the continued development of the
architecture of the approach (refined rules for sources
and sinks, more meaningful labels, and additional criteria
for grouping), future work will focus on expanding the
case study to include a larger set of open-source
software from various domains and conducting a
thorough user evaluation of the resulting platform.
ACKNOWLEDGEMENTS
This work is part of the Privacy Matters (PriMa)
project. The PriMa project has received funding from
the European Union’s Horizon 2020 research and
innovation program under the Marie Skłodowska-Curie
grant agreement No. 860315.
REFERENCES
Bambauer, D. E. (2013). Privacy versus security. J. Crim.
L. & Criminology, 103:667.
Berčič, B. and George, C. (2009). Identifying personal data
using relational database design principles. Interna-
tional Journal of Law and Information Technology,
17(3):233–251.
Bertino, E. (2016). Data security and privacy: Concepts, ap-
proaches, and research directions. In 2016 IEEE 40th
Annual Computer Software and Applications Confer-
ence (COMPSAC), volume 1, pages 400–407. IEEE.
Blume, P. (2016). Impact of the EU General Data Protec-
tion Regulation on the public sector. Journal of Data
Protection & Privacy, 1(1):53–63.
Braz, L. and Bacchelli, A. (2022). Software security dur-
ing modern code review: The developer’s perspective.
arXiv preprint arXiv:2208.04261.
Buse, R. P. and Zimmermann, T. (2012). Information
needs for software development analytics. In 2012
34th International Conference on Software Engineer-
ing (ICSE), pages 987–996. IEEE.
Ethyca (2022). Fides language. https://ethyca.github.io/fideslang/.
(Accessed on 11/15/2022).
Finck, M. and Pallas, F. (2020). They who must
not be identified—distinguishing personal from non-
personal data under the GDPR. International Data
Privacy Law, 10(1):11–36.
Fugkeaw, S., Chaturasrivilai, A., Tasungnoen, P., and
Techaudomthaworn, W. (2021). AP2I: Adaptive PII
scanning and consent discovery system. In 2021 13th
International Conference on Knowledge and Smart
Technology (KST), pages 231–236. IEEE.
Hadar, I., Hasson, T., Ayalon, O., Toch, E., Birnhack, M.,
Sherman, S., and Balissa, A. (2018). Privacy by de-
signers: software developers’ privacy mindset. Em-
pirical Software Engineering, 23(1):259–289.
Jain, P., Gyanchandani, M., and Khare, N. (2016). Big data
privacy: a technological perspective and review. Jour-
nal of Big Data, 3(1):1–25.
Johnson, B., Song, Y., Murphy-Hill, E., and Bowdidge, R.
(2013). Why don’t software developers use static anal-
ysis tools to find bugs? In 2013 35th International
Conference on Software Engineering (ICSE), pages
672–681. IEEE.
Lenhard, J., Fritsch, L., and Herold, S. (2017). A literature
study on privacy patterns research. In 2017 43rd Eu-
romicro Conference on Software Engineering and Ad-
vanced Applications (SEAA), pages 194–201. IEEE.
McGraw, G. (2008). Automated code review tools for secu-
rity. Computer, 41(12):108–111.
McIntosh, S., Kamei, Y., Adams, B., and Hassan, A. E.
(2014). The impact of code review coverage and code
review participation on software quality: A case study
of the Qt, VTK, and ITK projects. In Proceedings of the
11th Working Conference on Mining Software Repositories,
pages 192–201.
Notario, N., Crespo, A., Martín, Y.-S., Del Alamo, J. M.,
Le Métayer, D., Antignac, T., Kung, A., Kroener, I.,
and Wright, D. (2015). PRIPARE: integrating privacy
ICISSP 2023 - 9th International Conference on Information Systems Security and Privacy