
Gopinath, D., Katz, G., Pasareanu, C. S., and Barrett, C. W. (2018). DeepSafe: A data-driven approach for assessing robustness of neural networks. In ATVA, pages 3–19.
Gorji, N. and Rubin, S. (2022). Sufficient reasons for classifier decisions in the presence of domain constraints. In AAAI, pages 5660–5667.
Han, S., Lin, C., Shen, C., Wang, Q., and Guan, X. (2023). Interpreting adversarial examples in deep learning: A review. ACM Computing Surveys.
Hein, M. and Andriushchenko, M. (2017). Formal guarantees on the robustness of a classifier against adversarial manipulation. In NeurIPS, pages 2266–2276.
Hendrycks, D., Basart, S., Mu, N., Kadavath, S., Wang, F., Dorundo, E., Desai, R., Zhu, T., Parajuli, S., Guo, M., Song, D., Steinhardt, J., and Gilmer, J. (2021). The many faces of robustness: A critical analysis of out-of-distribution generalization. In ICCV, pages 8320–8329.
Hendrycks, D., Liu, X., Wallace, E., Dziedzic, A., Krishnan, R., and Song, D. (2020). Pretrained transformers improve out-of-distribution robustness. In ACL, pages 2744–2751.
Huang, X., Kroening, D., Ruan, W., Sharp, J., Sun, Y., Thamo, E., Wu, M., and Yi, X. (2020). A survey of safety and trustworthiness of deep neural networks: Verification, testing, adversarial attack and defence, and interpretability. Comput. Sci. Rev., 37:100270.
Huang, X. and Marques-Silva, J. (2023). From robustness to explainability and back again. CoRR, abs/2306.03048.
Huang, X. and Marques-Silva, J. (2024). On the failings of Shapley values for explainability. Int. J. Approx. Reason., page 109112.
Huang, Y., Zhang, H., Shi, Y., Kolter, J. Z., and Anandkumar, A. (2021). Training certifiably robust neural networks with efficient local Lipschitz bounds. In NeurIPS, pages 22745–22757.
Hubara, I., Courbariaux, M., Soudry, D., El-Yaniv, R., and Bengio, Y. (2016). Binarized neural networks. In NeurIPS, pages 4107–4115.
Ignatiev, A. (2020). Towards trustable explainable AI. In IJCAI, pages 5154–5158.
Ignatiev, A., Narodytska, N., Asher, N., and Marques-Silva, J. (2020). From contrastive to abductive explanations and back again. In AIxIA, pages 335–355.
Ignatiev, A., Narodytska, N., and Marques-Silva, J. (2019). Abduction-based explanations for machine learning models. In AAAI, pages 1511–1519.
Izza, Y., Huang, X., Morgado, A., Planes, J., Ignatiev, A., and Marques-Silva, J. (2024). Distance-restricted explanations: Theoretical underpinnings & efficient implementation. In KR, pages 475–486.
Izza, Y., Ignatiev, A., and Marques-Silva, J. (2022). On tackling explanation redundancy in decision trees. J. Artif. Intell. Res., 75:261–321.
Izza, Y. and Marques-Silva, J. (2023). The pros and cons of adversarial robustness. CoRR, abs/2312.10911.
Izza, Y. and Marques-Silva, J. (2024). Efficient contrastive explanations on demand. CoRR.
Katz, G., Barrett, C. W., Dill, D. L., Julian, K., and Kochenderfer, M. J. (2017). Reluplex: An efficient SMT solver for verifying deep neural networks. In CAV, pages 97–117.
Katz, G., Huang, D. A., Ibeling, D., Julian, K., Lazarus, C., Lim, R., Shah, P., Thakoor, S., Wu, H., Zeljic, A., Dill, D. L., Kochenderfer, M. J., and Barrett, C. W. (2019). The Marabou framework for verification and analysis of deep neural networks. In CAV, pages 443–452.
Lécuyer, M., Atlidakis, V., Geambasu, R., Hsu, D., and Jana, S. (2019). Certified robustness to adversarial examples with differential privacy. In IEEE S&P, pages 656–672.
Leino, K., Wang, Z., and Fredrikson, M. (2021). Globally-robust neural networks. In ICML, pages 6212–6222.
Letoffe, O., Huang, X., and Marques-Silva, J. (2024). On correcting SHAP scores. In AAAI.
Liang, H., He, E., Zhao, Y., Jia, Z., and Li, H. (2022). Adversarial attack and defense: A survey. Electronics, 11(8):1283.
Liu, X., Han, X., Zhang, N., and Liu, Q. (2020). Certified monotonic neural networks. In NeurIPS.
Marques-Silva, J. (2023). Disproving XAI myths with formal methods - initial results. In ICECCS, pages 12–21.
Marques-Silva, J. (2024). Logic-based explainability: Past, present and future. In ISoLA, pages 181–204.
Marques-Silva, J. and Huang, X. (2024). Explainability is not a game. Commun. ACM, pages 66–75.
Marques-Silva, J. and Ignatiev, A. (2022). Delivering trustworthy AI through formal XAI. In AAAI, pages 12342–12350.
Marques-Silva, J. and Ignatiev, A. (2023). No silver bullet: interpretable ML models must be explained. Frontiers in Artificial Intelligence, 6:1128212.
Miller, T. (2019). Explanation in artificial intelligence: Insights from the social sciences. Artif. Intell., 267:1–38.
Müller, M. N., Brix, C., Bak, S., Liu, C., and Johnson, T. T. (2022). The third international verification of neural networks competition (VNN-COMP 2022): Summary and results. CoRR, abs/2212.10376.
Narodytska, N. (2018). Formal analysis of deep binarized neural networks. In IJCAI, pages 5692–5696.
Pedregosa, F. et al. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830.
Reiter, R. (1987). A theory of diagnosis from first principles. Artif. Intell., 32(1):57–95.
Rosenberg, I., Shabtai, A., Elovici, Y., and Rokach, L. (2022). Adversarial machine learning attacks and defense methods in the cyber security domain. ACM Comput. Surv., 54(5):108:1–108:36.
Rosenfeld, E., Winston, E., Ravikumar, P., and Kolter, J. Z. (2020). Certified robustness to label-flipping attacks via randomized smoothing. In ICML, pages 8230–8241.