tect new and unknown side-channels. We are also
interested in exploring IL detection when the generated dataset yields a multi-class classification problem with ≥ 3 classes, which requires extending the presented FET approach to multi-class classification. Finally, we would like to provide appropriate theoretical backing for our approaches using information theory.
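For reference, here is the two-class case that a multi-class extension would need to generalize. The following is a minimal Python sketch of a one-sided Fisher's exact test on a 2×2 confusion matrix. It assumes that the FET approach asks whether a classifier's true-positive count exceeds what independence of true and predicted labels would produce, with all row and column margins held fixed. The helper names `confusion_2x2`, `hypergeom_pmf`, and `fisher_exact_greater` are hypothetical and are not taken from the authors' implementation. With ≥ 3 classes the confusion matrix becomes k×k, and this single-cell hypergeometric tail no longer applies directly.

```python
from math import comb


def confusion_2x2(y_true, y_pred):
    """Build [[TP, FN], [FP, TN]] from binary labels (1 = positive class).

    Rows are true labels (1, then 0); columns are predicted labels (1, then 0).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return [[tp, fn], [fp, tn]]


def hypergeom_pmf(x, row1, col1, n):
    """Probability that the top-left cell equals x when all margins are fixed."""
    return comb(row1, x) * comb(n - row1, col1 - x) / comb(n, col1)


def fisher_exact_greater(table):
    """One-sided Fisher's exact test p-value for a 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same margins
    whose top-left count is at least the observed a (alternative: odds ratio > 1).
    """
    (a, b), (c, d) = table
    row1, col1, n = a + b, a + c, a + b + c + d
    return sum(hypergeom_pmf(x, row1, col1, n) for x in range(a, min(row1, col1) + 1))


if __name__ == "__main__":
    y_true = [1, 1, 1, 0, 0, 0]
    y_pred = [1, 1, 1, 0, 0, 0]  # a perfect classifier on six samples
    table = confusion_2x2(y_true, y_pred)
    print(table, fisher_exact_greater(table))
```

With six samples and perfect predictions, the table is `[[3, 0], [0, 3]]` and the p-value is 1/20 = 0.05. If SciPy is available, `scipy.stats.fisher_exact(table, alternative="greater")` can serve as a cross-check for the one-sided p-value.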
ACKNOWLEDGEMENTS
We would like to thank Björn Haddenhorst, Karlson Pfanschmidt, Vitalik Melnikov, and the anonymous reviewers for their valuable and helpful suggestions.
This work is supported by the Bundesministerium für Bildung und Forschung (BMBF) under the project 16KIS1190 (AutoSCA) and funded by the European Research Council (ERC-802823).
REFERENCES
Bengio, Y. and Grandvalet, Y. (2004). No unbiased estimator of the variance of k-fold cross-validation. Journal of Machine Learning Research, 5:1089–1105.
Bhattacharya, B. and Habtzghi, D. (2002). Median of the p value under the alternative hypothesis. The American Statistician, 56(3):202–206.
Camilli, G. (1995). The relationship between Fisher’s exact test and Pearson’s chi-square test: A Bayesian perspective. Psychometrika, 60(2):305–312.
Chatzikokolakis, K., Chothia, T., and Guha, A. (2010). Statistical measurement of information leakage. In Esparza, J. and Majumdar, R., editors, Tools and Algorithms for the Construction and Analysis of Systems, pages 390–404, Berlin, Heidelberg. Springer Berlin Heidelberg.
Chicco, D., Tötsch, N., and Jurman, G. (2021). The Matthews correlation coefficient (MCC) is more reliable than balanced accuracy, bookmaker informedness, and markedness in two-class confusion matrix evaluation. BioData Mining, 14(1):13.
Cybenko, G. (1989). Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals and Systems, 2(4):303–314.
Demšar, J. (2006). Statistical comparisons of classifiers over multiple data sets. Journal of Machine Learning Research, 7(1):1–30.
Drees, J. P., Gupta, P., Hüllermeier, E., Jager, T., Konze, A., Priesterjahn, C., Ramaswamy, A., and Somorovsky, J. (2021). Automated detection of side channels in cryptographic protocols: Drown the robots! In Proceedings of the 14th ACM Workshop on Artificial Intelligence and Security, AISec ’21, pages 169–180, New York, NY, USA. Association for Computing Machinery.
Fisher, R. A. (1922). On the interpretation of χ² from contingency tables, and the calculation of P. Journal of the Royal Statistical Society, 85(1):87–94.
Geurts, P., Ernst, D., and Wehenkel, L. (2006). Extremely randomized trees. Machine Learning, 63(1):3–42.
Hashemi, M. and Karimi, H. (2018). Weighted machine learning. Statistics, Optimization and Information Computing, 6(4):497–525.
Head, T., Kumar, M., Nahrstaedt, H., Louppe, G., and Shcherbatyi, I. (2021). scikit-optimize/scikit-optimize.
Hettwer, B., Gehrer, S., and Güneysu, T. (2020). Applications of machine learning techniques in side-channel attacks: a survey. Journal of Cryptographic Engineering, 10(2):135–162.
Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70.
Klíma, V., Pokorný, O., and Rosa, T. (2003). Attacking RSA-based sessions in SSL/TLS. In Walter, C. D., Koç, Ç. K., and Paar, C., editors, Cryptographic Hardware and Embedded Systems - CHES 2003, pages 426–440, Berlin, Heidelberg. Springer Berlin Heidelberg.
Kotsiantis, S. B., Zaharakis, I. D., and Pintelas, P. E. (2006). Machine learning: a review of classification and combining techniques. Artificial Intelligence Review, 26(3):159–190.
Koyejo, O., Ravikumar, P., Natarajan, N., and Dhillon, I. S. (2015). Consistent multilabel classification. In Proceedings of the 28th International Conference on Neural Information Processing Systems - Volume 2, NIPS’15, pages 3321–3329, Cambridge, MA, USA. MIT Press.
Moos, T., Wegener, F., and Moradi, A. (2021). DL-LA: Deep Learning Leakage Assessment: A modern roadmap for SCA evaluations. IACR Transactions on Cryptographic Hardware and Embedded Systems, 2021(3):552–598.
Mushtaq, M., Akram, A., Bhatti, M. K., Chaudhry, M., Lapotre, V., and Gogniat, G. (2018). Nights-watch: A cache-based side-channel intrusion detector using hardware performance counters. In Proceedings of the 7th International Workshop on Hardware and Architectural Support for Security and Privacy, HASP ’18, New York, NY, USA. Association for Computing Machinery.
Nadeau, C. and Bengio, Y. (2003). Inference for the generalization error. Machine Learning, 52(3):239–281.
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., et al. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830.
Perianin, T., Carré, S., Dyseryn, V., Facon, A., and Guilley, S. (2021). End-to-end automated cache-timing attack driven by machine learning. Journal of Cryptographic Engineering, 11(2):135–146.
Picek, S., Heuser, A., Jovic, A., Bhasin, S., and Regazzoni, F. (2018). The curse of class imbalance and conflicting
Automated Information Leakage Detection: A New Method Combining Machine Learning and Hypothesis Testing with an Application to
Side-channel Detection in Cryptographic Protocols