for BEK (Hooimeijer et al., 2011) programs such as
equivalence, idempotency and commutativity, which
are included in our tool as well.
BEK (Hooimeijer et al., 2011) is a language that
can be used to develop sanitizers and analyse their
correctness. However, it cannot be used to reason
about the correctness of existing sanitizers without
reimplementing them in BEK.
Botinčan and Babić (Botinčan and Babić, 2013)
present Sigma*, a technique that learns symbolic
lookback transducers from programs. This model can
represent more sanitizers than the SFTs that we use.
However, they use a white-box learning technique,
meaning that they need access to the source code,
whereas we only need to be able to observe the input
and output of the program. Extending the algorithm
that we present in this paper to the symbolic lookback
transducers that Botinčan and Babić use is a topic for
future work.
There exist several other methods to reason about
sanitizers' correctness, most of which focus on detect-
ing vulnerabilities (Balzarotti et al., 2008; Moham-
madi et al., 2015; Shar and Tan, 2012). Our approach
can be used to detect vulnerabilities similarly to these
methods. However, we are also able to reason about
sanitizers' input-output behaviour in terms of, e.g.,
idempotency and commutativity.
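As an illustrative sketch of what such input-output reasoning means in practice, the check below tests idempotency and commutativity of two standard-library sanitizers on sample inputs. This is only a hypothetical property-based test, not the SFT-based analysis of this paper: checking finitely many samples can refute a property but never prove it.

```python
import html
import urllib.parse

def is_idempotent(sanitize, samples):
    """Check f(f(x)) == f(x) on sample inputs (refutes, never proves)."""
    return all(sanitize(sanitize(s)) == sanitize(s) for s in samples)

def commutes(f, g, samples):
    """Check f(g(x)) == g(f(x)) on sample inputs."""
    return all(f(g(s)) == g(f(s)) for s in samples)

samples = ["<script>", "a&b", "O'Reilly", "100%"]

# html.escape is not idempotent: "&" becomes "&amp;", then "&amp;amp;".
print(is_idempotent(html.escape, samples))                     # False
# Percent-encoding re-encodes "%" on every pass, so it is not idempotent either.
print(is_idempotent(urllib.parse.quote, samples))              # False
# The two sanitizers also fail to commute on inputs containing "<" or ">".
print(commutes(html.escape, urllib.parse.quote, samples))      # False
```

Such sample-based checks complement the symbolic comparison against a specification, which covers all inputs at once.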
Aside from correct implementation of sanitizers,
the placement of sanitizers also influences the correct-
ness of an application. If sanitizers are not placed
correctly, applications may still be vulnerable.
Several researchers have therefore focused on either
repairing the placement of sanitizers or placing
sanitizers automatically (Saxena et al., 2011; Weleare-
gai and Hammer, 2017; Yu et al., 2011). We consider
these approaches complementary to the ideas
discussed in this paper.
Aside from sanitization, there are also
sanitization-free defences. For example, Scholte
et al. (Scholte et al., 2012) show that automatically
validating input can be a good alternative to output
sanitization for preventing XSS and SQL injection
vulnerabilities. Similarly, Costa et al. (Costa et al.,
2007) have presented Bouncer, a tool that prevents
exploitation of software by generating input filters
that drop dangerous inputs.
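The sketch below contrasts input validation with output sanitization: instead of transforming dangerous output, a validator rejects any input that falls outside a whitelist. The username pattern is a hypothetical example, not a rule taken from Scholte et al. or from Bouncer.

```python
import re

# Hypothetical whitelist: accept only short alphanumeric usernames,
# rejecting (rather than rewriting) everything else.
USERNAME_RE = re.compile(r"[A-Za-z0-9_]{1,32}")

def validate_username(value: str) -> str:
    """Input validation: reject anything outside the whitelist unchanged."""
    if not USERNAME_RE.fullmatch(value):
        raise ValueError("input rejected by validation filter")
    return value

print(validate_username("alice_42"))   # accepted unchanged
try:
    validate_username("<script>alert(1)</script>")
except ValueError as err:
    print(err)                         # rejected, never reaches the application
```

Because validated input is passed through unchanged, properties such as idempotency are trivial for validators; the interesting questions shift to whether the whitelist is tight enough.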
8 CONCLUSION AND FUTURE
WORK
To conclude, we have presented a new approach to
reason about the correctness of sanitizers. First, we
developed a new learning algorithm, which uses
equivalence and membership queries, to automatically
derive SFTs of existing sanitizers. Such an SFT
describes how the sanitizer transforms an input into
its corresponding output. Second, we wrote a
specification of the sanitizer, in the form of an SFA
or SFT. This specification is compared to the learned
model of the sanitizer in order to find any discrepan-
cies between the models. With a case study, we have
shown that we can use our approach to automatically
reason about existing real-world sanitizers within a
few minutes.
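The discrepancy-finding step can be illustrated in miniature as follows. Here the "learned model" is simply `html.escape` itself, the specification is a hypothetical rule (escape only `&`, `<` and `>`, written as plain code rather than as an SFT), and random testing stands in for the symbolic equivalence check of the actual approach.

```python
import html
import random
import string

def spec(s):
    """Hypothetical specification: escape only '&', '<' and '>'."""
    return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")

def find_discrepancy(model, specification, trials=10000, maxlen=5):
    """Random testing as a stand-in for the equivalence query: search for
    an input on which the learned model and the specification disagree."""
    rng = random.Random(0)                      # fixed seed for reproducibility
    chars = string.ascii_letters + "<>&\"'"
    for _ in range(trials):
        w = "".join(rng.choice(chars) for _ in range(rng.randrange(maxlen + 1)))
        if model(w) != specification(w):
            return w                            # counterexample found
    return None

# html.escape also escapes quote characters by default, so the specification
# above disagrees with it on any input containing '"' or "'".
cex = find_discrepancy(html.escape, spec)
print(repr(cex), "->", html.escape(cex), "vs", spec(cex))
```

Unlike this random search, comparing two SFTs symbolically either proves equivalence or yields a counterexample for every input, which is what makes the approach conclusive.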
As future research, we think that extending the
learning algorithm to support epsilon transitions and
SFTs with lookahead, lookback or registers is most
important. This would allow us to reason about more
complex sanitizers. One could also improve the user
experience of the approach by letting users write
specifications in notations that are more familiar to
them, so that they do not need to understand how
SFTs work. Another option is to present users with
a minimised graphical representation of the learned
models for manual correctness inspection.
REFERENCES
Angluin, D. (1987). Learning regular sets from queries
and counterexamples. Information and Computation,
75(2):87–106.
Argyros, G., Stais, I., Kiayias, A., and Keromytis, A. D.
(2016). Back in black: towards formal, black box
analysis of sanitizers and filters. In 2016 IEEE Sym-
posium on Security and Privacy, pages 91–109. IEEE.
Balzarotti, D., Cova, M., Felmetsger, V., Jovanovic, N.,
Kirda, E., Kruegel, C., and Vigna, G. (2008). Saner:
Composing static and dynamic analysis to validate
sanitization in web applications. In 2008 IEEE Sym-
posium on Security and Privacy, pages 387–401.
IEEE.
Bjørner, N. and Veanes, M. (2011). Symbolic transduc-
ers. Technical Report MSR-TR-2011-3, Microsoft
Research.
Bohlin, T. and Jonsson, B. (2008). Regular inference
for communication protocol entities. Technical Re-
port 2008-024, Uppsala University, Computer Sys-
tems.
Botinčan, M. and Babić, D. (2013). Sigma*: symbolic
learning of input-output specifications. In ACM SIG-
PLAN Notices, volume 48, pages 443–456. ACM.
Bynens, M. (2018). he. https://github.com/mathiasbynens/he.
Accessed on: 19-12-2019.
Cassel, S., Howar, F., Jonsson, B., and Steffen, B. (2014).
Learning extended finite state machines. In Interna-
tional Conference on Software Engineering and For-
mal Methods, pages 250–264. Springer.
Costa, M., Castro, M., Zhou, L., Zhang, L., and Peinado, M.
(2007). Bouncer: Securing software by blocking bad
ForSE 2020 - 4th International Workshop on FORmal methods for Security Engineering