fixed input size, forcing us to restrict the size of mutated inputs. There is room to overcome this issue by experimenting with different NN architectures. Recurrent NNs are probably a suitable choice, but they face the challenge of modeling a variable-length policy.
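As a first step in that direction, the following minimal sketch illustrates how a recurrent network could turn a test case of any length into a heat-map, i.e., a probability distribution over mutation positions. The sketch uses PyTorch purely for illustration; the architecture and all names are our own assumptions, not part of Rainfuzz.

import torch
import torch.nn as nn

class RecurrentHeatMap(nn.Module):
    # Hypothetical recurrent policy head: one score per input byte,
    # independent of the test-case length.
    def __init__(self, embed_dim=32, hidden_dim=64):
        super().__init__()
        self.embed = nn.Embedding(256, embed_dim)  # one token per byte value
        self.rnn = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, 1)       # scalar score per position

    def forward(self, test_case):
        # test_case: LongTensor of shape (batch, length), any length
        hidden, _ = self.rnn(self.embed(test_case))  # (batch, length, hidden)
        scores = self.head(hidden).squeeze(-1)       # (batch, length)
        return torch.softmax(scores, dim=-1)         # heat-map over positions

# Example: heat-maps for a 100-byte and a 300-byte test case.
policy = RecurrentHeatMap()
for length in (100, 300):
    heat_map = policy(torch.randint(0, 256, (1, length)))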
An important component of Rainfuzz is the set of position-specific mutations (Section 4.2) corresponding to a single action. The mutations we use are inspired by the random-position mutations used within AFL++; it might be interesting to experiment with different sets of position-specific mutations and to study how they influence fuzzing performance depending on the input format of the PUT.
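As an illustration, a set of position-specific mutations in this spirit could look as follows. The operators below are hypothetical stand-ins loosely modeled on AFL++-style havoc mutators, not the exact set used by Rainfuzz (for that, see Section 4.2).

import random

INTERESTING_BYTES = [0x00, 0x01, 0x7F, 0x80, 0xFF]  # common boundary values

def flip_bit(data: bytearray, pos: int) -> None:
    # Flip one random bit of the byte at the chosen position.
    data[pos] ^= 1 << random.randrange(8)

def add_sub_byte(data: bytearray, pos: int) -> None:
    # Add or subtract a small random delta, wrapping around.
    data[pos] = (data[pos] + random.randint(-35, 35)) % 256

def set_interesting(data: bytearray, pos: int) -> None:
    # Overwrite the byte with a boundary value.
    data[pos] = random.choice(INTERESTING_BYTES)

POSITION_SPECIFIC_MUTATIONS = [flip_bit, add_sub_byte, set_interesting]

def apply_action(data: bytearray, pos: int) -> None:
    # A single action: one randomly chosen mutation applied at the
    # position selected through the heat-map.
    random.choice(POSITION_SPECIFIC_MUTATIONS)(data, pos)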
8 CONCLUSIONS
In this paper, we propose an innovative fuzzing approach that builds heat-maps using reinforcement learning, aiding the mutation strategy and overcoming the issue of alternating training phases with fuzzing phases. We implemented our approach in Rainfuzz and tuned it by trying different configurations (RQ2, RQ3). We tested the validity of our approach (RQ1) by comparing Rainfuzz against an equivalent fuzzer that uses a fully random policy, showing that Rainfuzz performs better both in terms of average reward per action and in terms of edge coverage. We tested Rainfuzz against a state-of-the-art fuzzer (AFL++), with poor results (RQ5); however, we showed that Rainfuzz and AFL++ running in a collaborative fuzzing setting obtain the best performance (RQ6). We confirmed the robustness of Rainfuzz by showing that the previous results still apply if the PUT changes (RQ7). Finally, we provided some ideas for extending and improving the proposed approach.
ACKNOWLEDGEMENTS
Lorenzo Binosi acknowledges support from TIM S.p.A. through a PhD scholarship.
REFERENCES
Barth-Maron, G., Hoffman, M. W., Budden, D., Dabney,
W., Horgan, D., TB, D., Muldal, A., Heess, N., and
Lillicrap, T. P. (2018). Distributed distributional de-
terministic policy gradients. CoRR, abs/1804.08617.
Böhme, M., Pham, V., and Roychoudhury, A. (2016). Coverage-based greybox fuzzing as Markov chain. In
Weippl, E. R., Katzenbeisser, S., Kruegel, C., Myers,
A. C., and Halevi, S., editors, Proceedings of the 2016
ACM SIGSAC Conference on Computer and Commu-
nications Security, Vienna, Austria, October 24-28,
2016, pages 1032–1043. ACM.
Böttinger, K., Godefroid, P., and Singh, R. (2018). Deep
reinforcement fuzzing. In 2018 IEEE Security and
Privacy Workshops, SP Workshops 2018, San Fran-
cisco, CA, USA, May 24, 2018, pages 116–122. IEEE
Computer Society.
Chen, C., Cui, B., Ma, J., Wu, R., Guo, J., and Liu, W.
(2018). A systematic review of fuzzing techniques.
Comput. Secur., 75:118–137.
Fioraldi, A., Maier, D., Eißfeldt, H., and Heuse, M. (2020).
AFL++: Combining incremental steps of fuzzing re-
search. In Yarom, Y. and Zennou, S., editors, 14th
USENIX Workshop on Offensive Technologies, WOOT
2020, August 11, 2020. USENIX Association.
Google (2016). HonggFuzz. https://honggfuzz.dev/.
Güler, E., Görz, P., Geretto, E., Jemmett, A., Österlund, S., Bos, H., Giuffrida, C., and Holz, T. (2020). Cupid:
Automatic fuzzer selection for collaborative fuzzing.
In ACSAC ’20: Annual Computer Security Applica-
tions Conference, Virtual Event / Austin, TX, USA, 7-
11 December, 2020, pages 360–372. ACM.
Lillicrap, T. P., Hunt, J. J., Pritzel, A., Heess, N., Erez, T.,
Tassa, Y., Silver, D., and Wierstra, D. (2016). Con-
tinuous control with deep reinforcement learning. In
Bengio, Y. and LeCun, Y., editors, 4th International
Conference on Learning Representations, ICLR 2016,
San Juan, Puerto Rico, May 2-4, 2016, Conference
Track Proceedings.
LLVM (2017). libFuzzer. http://llvm.org/docs/LibFuzzer.html.
Manès, V. J. M., Han, H., Han, C., Cha, S. K., Egele, M.,
Schwartz, E. J., and Woo, M. (2021). The art, science,
and engineering of fuzzing: A survey. IEEE Trans.
Software Eng., 47(11):2312–2331.
Mnih, V., Badia, A. P., Mirza, M., Graves, A., Lillicrap,
T. P., Harley, T., Silver, D., and Kavukcuoglu, K.
(2016). Asynchronous methods for deep reinforce-
ment learning. CoRR, abs/1602.01783.
Rajpal, M., Blum, W., and Singh, R. (2017). Not all
bytes are equal: Neural byte sieve for fuzzing. CoRR,
abs/1711.04596.
Schulman, J., Levine, S., Moritz, P., Jordan, M. I., and
Abbeel, P. (2015). Trust region policy optimization.
CoRR, abs/1502.05477.
Schulman, J., Wolski, F., Dhariwal, P., Radford, A., and
Klimov, O. (2017). Proximal policy optimization al-
gorithms. CoRR, abs/1707.06347.
She, D., Pei, K., Epstein, D., Yang, J., Ray, B., and Jana,
S. (2019). NEUZZ: efficient fuzzing with neural pro-
gram smoothing. In 2019 IEEE Symposium on Secu-
rity and Privacy, SP 2019, San Francisco, CA, USA,
May 19-23, 2019, pages 803–817. IEEE.
Wang, J., Duan, Y., Song, W., Yin, H., and Song, C. (2019a). Be sensitive and collaborative: Analyzing impact of coverage metrics in greybox fuzzing. In 22nd International Symposium on Research in Attacks, Intrusions and Defenses, RAID 2019, pages 1–15. USENIX Association.