high as 95% for one repository (#14). Multilingual
analysis improves taint coverage by 76% on average,
showing that it opens up a significant additional amount
of suspicious code for inspection.
6 CONCLUSIONS
This research has addressed the issue of taint analy-
sis in multilingual software systems. Although it is
increasingly common for software designers to select
different languages for different components of a system
(JavaScript for the front end, C/C++ for numerical
processing, etc.), taint analysis tools are primarily
language dependent. In contrast to other research
on multilingual static analysis, our approach
does not privilege any language in the codebase above
any other. Our proposed solution to extending taint
analysis to multilingual systems includes monolin-
gual analysis and cross-language foreign function in-
terfaces (FFIs) as modular components.
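The modular idea can be illustrated with a minimal sketch. This is not the MTA implementation: the toy call graph, the function names, and the `follow_ffi` switch are all hypothetical, and taint is approximated here by simple reachability from a taint source. Call edges that cross a language boundary stand in for FFI calls.

```python
from collections import deque

# Hypothetical toy program: each function has a language and a list of callees.
# An edge between functions of different languages stands in for an FFI call.
FUNCTIONS = {
    "py_read_input": {"lang": "python", "calls": ["py_process"]},
    "py_process": {"lang": "python", "calls": ["cpp_transform"]},  # FFI edge
    "cpp_transform": {"lang": "cpp", "calls": ["cpp_write"]},
    "cpp_write": {"lang": "cpp", "calls": []},
    "py_unrelated": {"lang": "python", "calls": []},
}


def tainted_functions(functions, source, follow_ffi):
    """Return the set of functions reachable from the taint source.

    With follow_ffi=False, propagation stops at language boundaries
    (a stand-in for monolingual analysis). With follow_ffi=True,
    cross-language edges are followed as well.
    """
    seen = {source}
    queue = deque([source])
    while queue:
        caller = queue.popleft()
        for callee in functions[caller]["calls"]:
            crosses = functions[callee]["lang"] != functions[caller]["lang"]
            if crosses and not follow_ffi:
                continue  # monolingual view: the FFI edge is invisible
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return seen


mono = tainted_functions(FUNCTIONS, "py_read_input", follow_ffi=False)
multi = tainted_functions(FUNCTIONS, "py_read_input", follow_ffi=True)
print(sorted(mono))
print(sorted(multi))
```

In this toy graph the monolingual view reaches only the two Python functions, while following the FFI edge also exposes the two C++ functions to inspection. No language is treated as the host: the same traversal applies whichever side of the boundary contains the taint source.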
The novel contributions of this paper include
the proposed Multilingual Taint Analysis algorithm
(MTA), an analysis of its complexity and a detailed
example, and performance results from our initial im-
plementation of MTA on a codebase of 20 reposito-
ries totalling 1872 lines of code. A coverage improve-
ment metric is introduced that reveals how much more
source code is opened to inspection for taint using
MTA. On average, coverage improvement was 76%
for MTA over monolingual taint analysis. The aver-
age repository size in this study was relatively small,
and this allowed detailed manual checking. However,
even with these small repositories, the improvement
in coverage is significant. While we believe our results
will transfer to larger repositories, that important
step is left to future work. Furthermore, the results
hold only for the C++/Python FFIs covered by the
implementation. While we argue that other FFIs can
be added with no change to our algorithm design or
complexity calculations, supporting evidence for this
is also a matter of future work.
The decision to concentrate on C/C++ and Python
was made based on the trend to embed C/C++ func-
tions in Python for speed of numerical processing.
While MTA clearly adds functionality in this case,
we believe that it holds even greater promise for
analysis of web repositories which include HTML
and JavaScript front ends communicating with back-
ends that may be built in Python. Both Python and
JavaScript leverage libraries that are written in other
languages. In future work we plan to address this
problem using the framework developed here.
ACKNOWLEDGEMENTS
The authors wish to acknowledge the contributions of
graduate students Kevin Johns, Courtney King, Ar-
lind Stafaj, Taylor Termine and Benjamin Vecchio.
REFERENCES
Alashjaee, A. M., Duraibi, S., and Song, J. (2019). Dynamic
taint analysis tools: A review. International Journal
of Computer Science and Security, 13.
Arzt, S. (2014). FlowDroid: Precise context, flow, field,
object-sensitive and lifecycle-aware taint analysis for
Android apps. ACM SIGPLAN Conf. on Prog. Lang.
Des. and Impl.
Boxler, D. and Walcott, K. (2018). Static taint analysis tools
to detect information flows. Int. Conf. Soft. Eng. Re-
search & Practice.
Furr, M. and Foster, J. (2005). Checking type safety of foreign
function calls. ACM SIGPLAN Conf. on Prog.
Lang. Des. and Impl.
Grimmer, M., Schatz, R., Seaton, C., Würthinger, T., and
Luján, M. (2018). Cross-language interoperability in
a multi-language runtime. ACM Trans. Prog. Lang. &
Sys., 40(2).
Kreindl, J., Bonetta, D., and Mössenböck, H. (2019). Towards
efficient, multi-language dynamic taint analysis.
MPLR 2019, pages 85–94, New York, NY, USA.
Association for Computing Machinery.
Kreindl, J., Bonetta, D., Stadler, L., Leopoldseder, D., and
Mössenböck, H. (2020). Multi-language dynamic
taint analysis in a polyglot virtual machine. MPLR
2020, pages 15–29, New York, NY, USA. Association
for Computing Machinery.
Lee, S., Dolby, J., and Ryu, S. (2016). HybriDroid: Static
analysis framework for Android hybrid applications.
31st IEEE/ACM Int. Conf. on Aut. Soft. Eng.
Lee, S., Lee, H., and Ryu, S. (2020). Broadening horizons
of multilingual static analysis: Semantic summary
extraction from C code for JNI program analysis. 35th
IEEE/ACM International Conference on Automated
Software Engineering.
Lyons, D., Bogar, A.-M., and Baird, D. (2018). Lightweight
call-graph construction for multilingual software anal-
ysis. 13th Int. Conf. on Software Technologies.
Lyons, D. and Zahra, S. (2020). Using taint analysis and
reinforcement learning (TARL) to repair autonomous robot
software. IEEE Workshop on Assrd. Aut. Sys.
Lyons, D., Zahra, S., and Marshall, T. (2019). Towards
Lakosian multilingual software design principles. 14th
Int. Conf. on Software Technologies.
Madsen, M., Livshits, B., and Fanning, M. (2013). Practical
static analysis of JavaScript applications in the presence
of frameworks and libraries. 9th Joint Meeting
on Foundations of Software Engineering.
Mayer, P., Kirsch, M., and Le, M.-A. (2017). On multi-
language software development, cross-language links
ICSOFT 2021 - 16th International Conference on Software Technologies