the model accuracy. As can be seen, when the top 10% most important pixels are removed, there is a sharp drop in the mean average precision, to around 37%. As expected, the precision continues to drop as further pixels are removed in decreasing order of saliency.
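For reference, this deletion test can be expressed in a few lines. The sketch below is only an illustration of the procedure, not the code used in the experiments; the `detector` and `evaluate_map` names in the usage comment are hypothetical stand-ins for the face detector and the mAP computation. It blanks out the top-k% most salient pixels and re-runs the detector, which is how the curve above is obtained.

```python
import numpy as np

def deletion_step(image, saliency, fraction, fill_value=0):
    """Blank out the top `fraction` most salient pixels of `image`.

    `saliency` is an (H, W) map; `image` is an (H, W, C) array.
    Illustrative helper for the deletion test, not the exact paper code.
    """
    h, w = saliency.shape
    n_remove = int(fraction * h * w)
    # Flattened indices of the most salient pixels, in decreasing order.
    order = np.argsort(saliency, axis=None)[::-1][:n_remove]
    rows, cols = np.unravel_index(order, (h, w))
    perturbed = image.copy()
    perturbed[rows, cols, :] = fill_value
    return perturbed

# Example: mAP as increasingly large fractions of salient pixels are removed.
# `detector` and `evaluate_map` are hypothetical.
# for fraction in np.arange(0.1, 1.0, 0.1):
#     perturbed = deletion_step(image, saliency, fraction)
#     detections = detector(perturbed)
#     print(fraction, evaluate_map(detections, ground_truth))
```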
6 CONCLUSIONS
In this work, we applied an explainability method to analyze and understand object detection. A large number of masks is generated and applied to the input image. The model is then run on the masked images to obtain proposals. Finally, the saliency map is computed as a weighted sum of the masks, where the weights are pairwise similarities between the proposals and the target detection (a minimal sketch of this aggregation step is given below).
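The sketch below illustrates this aggregation under stated assumptions: the masks, the proposals obtained on each masked image, and the target detection are already available, and the similarity combines box IoU with the cosine similarity of the class-score vectors, in the spirit of D-RISE. Names such as `proposal_similarity` and the dictionary layout of detections are illustrative, not the paper's code.

```python
import numpy as np

def proposal_similarity(proposal, target):
    """D-RISE-style similarity between a proposal and the target detection.

    `proposal` and `target` are dicts with 'box' = (x1, y1, x2, y2)
    and 'scores' = class probability vector (assumed layout).
    """
    (ax1, ay1, ax2, ay2), (bx1, by1, bx2, by2) = proposal["box"], target["box"]
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    iou = inter / union if union > 0 else 0.0
    s_p, s_t = np.asarray(proposal["scores"]), np.asarray(target["scores"])
    cos = float(s_p @ s_t / (np.linalg.norm(s_p) * np.linalg.norm(s_t) + 1e-8))
    return iou * cos

def saliency_map(masks, proposals_per_mask, target):
    """Weighted sum of masks: each mask is weighted by the best similarity
    between the target detection and the proposals found on that masked image."""
    saliency = np.zeros_like(masks[0], dtype=float)
    for mask, proposals in zip(masks, proposals_per_mask):
        if proposals:
            weight = max(proposal_similarity(p, target) for p in proposals)
            saliency += weight * mask
    return saliency
```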
We have successfully demonstrated the application of the DRISE explainability method to our face detector models as a preliminary exploration on Tenebrism painting images. We have also shown its ability to locate relevant features and discussed the reasons for its failures. As a further improvement of this work, we plan to reduce the high processing time of the method.
REFERENCES
Ancona, M., Ceolini, E., Öztireli, C., and Gross, M. (2019).
Gradient-based attribution methods. In Explainable
AI: Interpreting, Explaining and Visualizing Deep
Learning, pages 169–191. Springer.
Brunke, L., Agrawal, P., and George, N. (2020). Evaluating
input perturbation methods for interpreting cnns and
saliency map comparison. In ECCV Workshops.
Cetinic, E., Lipic, T., and Grgic, S. (2019). A deep learning
perspective on beauty, sentiment, and remembrance of
art. IEEE Access, 7:73694–73710.
Chattopadhay, A., Sarkar, A., Howlader, P., and Balasub-
ramanian, V. N. (2018). Grad-cam++: Generalized
gradient-based visual explanations for deep convolu-
tional networks. In 2018 IEEE winter conference on
applications of computer vision (WACV), pages 839–
847. IEEE.
Gamra, S. B., Mzoughi, O., Bigand, A., and Zagrouba, E.
(2021). New challenges of face detection in paintings
based on deep learning. In VISIGRAPP.
Hogan, M., Aouf, N., Spencer, P., and Almond, J. (2022).
Explainable object detection for uncrewed aerial ve-
hicles using kernelshap. In 2022 IEEE International
Conference on Autonomous Robot Systems and Com-
petitions (ICARSC), pages 136–141. IEEE.
Ivanovs, M., Kadikis, R., and Ozols, K. (2021).
Perturbation-based methods for explaining deep neu-
ral networks: A survey. Pattern Recognition Letters,
150:228–234.
Kim, V., Cho, H., and Chung, S. (2021). One-step pixel-
level perturbation-based saliency detector. In BMVC
virtual conference.
Mzoughi, O., Bigand, A., and Renaud, C. (2018). Face
detection in painting using deep convolutional neural
networks. In ACIVS.
Nielsen, I. E., Dera, D., Rasool, G., Ramachandran, R. P.,
and Bouaynaya, N. C. (2022). Robust explainability:
A tutorial on gradient-based attribution methods for
deep neural networks. IEEE Signal Processing Mag-
azine, 39(4):73–84.
Omeiza, D., Speakman, S., Cintas, C., and Weldermariam,
K. (2019). Smooth grad-cam++: An enhanced in-
ference level visualization technique for deep con-
volutional neural network models. arXiv preprint
arXiv:1908.01224.
Padmanabhan, D. C. (2022). Dext: Detector explana-
tion toolkit for explaining multiple detections using
saliency methods.
Petsiuk, V., Das, A., and Saenko, K. (2018). Rise: Ran-
domized input sampling for explanation of black-box
models. arXiv preprint arXiv:1806.07421.
Petsiuk, V., Jain, R., Manjunatha, V., Morariu, V. I., Mehra,
A., Ordonez, V., and Saenko, K. (2021). Black-box
explanation of object detectors via saliency maps. In
Proceedings of the IEEE/CVF Conference on Com-
puter Vision and Pattern Recognition, pages 11443–
11452.
Pinciroli Vago, N. O., Milani, F., Fraternali, P., and
da Silva Torres, R. (2021). Comparing cam algo-
rithms for the identification of salient image features
in iconography artwork analysis. Journal of Imaging,
7(7):106.
Qiu, L., Yang, Y., Cao, C. C., Liu, J., Zheng, Y., Ngai, H.
H. T., Hsiao, J., and Chen, L. (2021). Resisting out-of-
distribution data problem in perturbation of xai. arXiv
preprint arXiv:2107.14000.
Ribeiro, M. T., Singh, S., and Guestrin, C. (2016). "Why should I trust you?": Explaining the predictions of any
classifier. In Proceedings of the 22nd ACM SIGKDD
international conference on knowledge discovery and
data mining, pages 1135–1144.
Selvaraju, R. R., Cogswell, M., Das, A., Vedantam, R.,
Parikh, D., and Batra, D. (2017). Grad-cam: Visual
explanations from deep networks via gradient-based
localization. In Proceedings of the IEEE international
conference on computer vision, pages 618–626.
Simonyan, K., Vedaldi, A., and Zisserman, A. (2014).
Deep inside convolutional networks: Visualising im-
age classification models and saliency maps. In Workshop at International Conference on Learning Representations.
Smilkov, D., Thorat, N., Kim, B., Viégas, F., and Watten-
berg, M. (2017). Smoothgrad: removing noise by
adding noise. arXiv preprint arXiv:1706.03825.
Springenberg, J. T., Dosovitskiy, A., Brox, T., and Ried-
miller, M. (2014). Striving for simplicity: The all con-
volutional net. arXiv preprint arXiv:1412.6806.
Surapaneni, S., Syed, S., and Lee, L. Y. (2020). Exploring
themes and bias in art using machine learning image