5.1 Future Work
Since our objective is to identify all relevant processing
of personal data in source code, reducing false negatives
is our next priority. False positives, by contrast, are not
a major concern in our case. Because personal data
processing is often subtle, determining relevance without
human assistance is particularly challenging. Restricting
the analysis to specific patterns would ease manual
analysis, which calls for the implementation of a privacy
taxonomy. Using Ethyca’s taxonomy (Ethyca, 2022) as an
example, we could adapt our labels so that the technique
aligns with the taxonomy.
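As a rough illustration of what aligning labels with a taxonomy could look like, the sketch below maps behavioral labels to taxonomy keys and reports labels that have no counterpart yet. The label names, the dotted keys, and the helper `align_labels` are all hypothetical placeholders; they are not taken from Ethyca's taxonomy or from our implementation.

```python
# Hypothetical mapping from behavioral labels to taxonomy keys.
# Both sides are illustrative placeholders, not real taxonomy entries.
LABEL_TO_TAXONOMY = {
    "collect-email": "example.contact.email",
    "store-address": "example.contact.address",
    "log-identifier": "example.identifier",
}


def align_labels(labels, mapping):
    """Split labels into taxonomy keys and labels still lacking a mapping."""
    mapped = sorted({mapping[label] for label in labels if label in mapping})
    unmapped = sorted({label for label in labels if label not in mapping})
    return mapped, unmapped


mapped, unmapped = align_labels(
    ["collect-email", "log-identifier", "share-location"], LABEL_TO_TAXONOMY
)
print("taxonomy keys:", mapped)
print("needs manual review:", unmapped)
```

Keeping the unmapped labels visible, rather than dropping them, is a deliberate choice in this sketch: a label without a taxonomy entry is left for a human to resolve, which matches the priority on avoiding false negatives.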
As an extension of this article, we propose an automated
mapping of personal data in a manuscript currently under
review (Tang et al., 2023), intended to assist developers
and code reviewers in identifying privacy-related code.
The mapping, based on static analysis,
automatically detects personal data and the code that
processes it, and we offer semantics of personal data
This short paper presented ongoing work on a novel,
customizable approach to identify personal data pro-
cessing for code review. This three-phase technique
first uses Semgrep to match patterns in the code based
on rules for sources and sinks, then associates code
snippets generated from pattern matching with a set of
behavioral labels, and finally groups results to reduce
code reviewer workload. Our demonstration shows
the utility and feasibility of this method for gathering
and presenting code snippets related to personal data
processing from a codebase.
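The three phases summarized above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not our implementation: the findings list stands in for the output of the pattern-matching phase (in practice produced by Semgrep), and the rule ids, labels, field names, and the helpers `label_findings` and `group_findings` are hypothetical.

```python
from collections import defaultdict

# Hypothetical rule-to-label table; rule ids and labels are illustrative only.
RULE_LABELS = {
    "source.user-email": ["collection"],
    "sink.logging": ["logging", "disclosure"],
    "sink.database-write": ["storage"],
}


def label_findings(findings, rule_labels):
    """Phase 2: attach behavioral labels to each matched snippet."""
    return [
        {**finding, "labels": rule_labels.get(finding["check_id"], ["unlabeled"])}
        for finding in findings
    ]


def group_findings(labeled):
    """Phase 3: group snippets by (file, label) to reduce reviewer workload."""
    groups = defaultdict(list)
    for finding in labeled:
        for label in finding["labels"]:
            groups[(finding["path"], label)].append(finding)
    return dict(groups)


# Phase 1 stand-in: matches for source and sink rules, written by hand here.
findings = [
    {"check_id": "source.user-email", "path": "app/signup.py", "line": 12},
    {"check_id": "sink.logging", "path": "app/signup.py", "line": 15},
    {"check_id": "sink.database-write", "path": "app/models.py", "line": 40},
]

groups = group_findings(label_findings(findings, RULE_LABELS))
for (path, label), items in sorted(groups.items()):
    print(path, label, [item["line"] for item in items])
```

Grouping by file and label is only one possible criterion; the same structure would accommodate other keys, such as the matched rule or the kind of personal data involved.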
Along with the continued development of the approach's
architecture (refined rules for sources and sinks, more
meaningful labels, and additional grouping criteria),
future work will focus on expanding the case study to a
larger set of open-source software from various domains
and on conducting a thorough user evaluation of the
resulting platform.
This work is part of the Privacy Matters (PriMa)
project. The PriMa project has received funding from
the European Union’s Horizon 2020 research and
innovation program under the Marie Skłodowska-Curie
grant agreement No. 860315.
