Choe, J. and Shim, H. (2019). Attention-based dropout
layer for weakly supervised object localization. In
Proceedings of the IEEE Conference on Computer Vi-
sion and Pattern Recognition (CVPR), pages 2219–
2228.
Christlein, V., Spranger, L., Seuret, M., Nicolaou, A., Král,
P., and Maier, A. (2019). Deep generalized max
pooling. In 2019 International Conference on Docu-
ment Analysis and Recognition (ICDAR), pages 1090–
1096.
DeVries, T. and Taylor, G. W. (2017). Improved regular-
ization of convolutional neural networks with cutout.
arXiv preprint arXiv:1708.04552.
Fu, R., Hu, Q., Dong, X., Guo, Y., Gao, Y., and Li, B.
(2020). Axiom-based grad-cam: Towards accurate vi-
sualization and explanation of cnns. arXiv preprint
arXiv:2008.02312.
Gao, Y., Beijbom, O., Zhang, N., and Darrell, T. (2016).
Compact bilinear pooling. In Proceedings of the IEEE
Conference on Computer Vision and Pattern Recogni-
tion (CVPR), pages 317–326.
He, K., Gkioxari, G., Dollár, P., and Girshick, R. (2017).
Mask r-cnn. In Proceedings of the IEEE International
Conference on Computer Vision (ICCV), pages 2961–
2969.
He, K., Zhang, X., Ren, S., and Sun, J. (2016). Deep resid-
ual learning for image recognition. In Proceedings of
the IEEE Conference on Computer Vision and Pattern
Recognition (CVPR), pages 770–778.
Ki, M., Uh, Y., Lee, W., and Byun, H. (2020). In-
sample contrastive learning and consistent attention
for weakly supervised object localization. In Proceed-
ings of the Asian Conference on Computer Vision.
Kim, J., Choe, J., Yun, S., and Kwak, N. (2021). Normaliza-
tion matters in weakly supervised object localization.
In Proceedings of the IEEE/CVF International Con-
ference on Computer Vision, pages 3427–3436.
Kirillov, A., Girshick, R., He, K., and Dollár, P. (2019).
Panoptic feature pyramid networks. In Proceedings of
the IEEE Conference on Computer Vision and Pattern
Recognition (CVPR), pages 6399–6408.
Lin, T.-Y., Dollár, P., Girshick, R., He, K., Hariharan, B.,
and Belongie, S. (2017a). Feature pyramid networks
for object detection. In Proceedings of the IEEE Con-
ference on Computer Vision and Pattern Recognition
(CVPR), pages 2117–2125.
Lin, T.-Y., Goyal, P., Girshick, R., He, K., and Dollár, P.
(2017b). Focal loss for dense object detection. In
Proceedings of the IEEE International Conference on
Computer Vision (ICCV), pages 2980–2988.
Muhammad, M. B. and Yeasin, M. (2020). Eigen-cam:
Class activation map using principal components. In
2020 International Joint Conference on Neural Net-
works (IJCNN), pages 1–7.
Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J.,
Chanan, G., Killeen, T., Lin, Z., Gimelshein, N.,
Antiga, L., Desmaison, A., Kopf, A., Yang, E., De-
Vito, Z., Raison, M., Tejani, A., Chilamkurthy, S.,
Steiner, B., Fang, L., Bai, J., and Chintala, S. (2019).
Pytorch: An imperative style, high-performance deep
learning library. In Wallach, H., Larochelle, H.,
Beygelzimer, A., d'Alché-Buc, F., Fox, E., and Garnett,
R., editors, Advances in Neural Information Processing
Systems 32, pages 8024–8035. Curran Associates, Inc.
Pinheiro, P. O. and Collobert, R. (2015). From image-level
to pixel-level labeling with convolutional networks. In
Proceedings of the IEEE Conference on Computer Vi-
sion and Pattern Recognition (CVPR), pages 1713–
1721.
Ramaswamy, H. G. et al. (2020). Ablation-cam: Vi-
sual explanations for deep convolutional network via
gradient-free localization. In Proceedings of the IEEE
Winter Conference on Applications of Computer Vi-
sion (WACV), pages 983–991.
Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S.,
Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bern-
stein, M., et al. (2015). Imagenet large scale visual
recognition challenge. International Journal of Com-
puter Vision, 115(3):211–252.
Selvaraju, R. R., Cogswell, M., Das, A., Vedantam, R.,
Parikh, D., and Batra, D. (2017). Grad-cam: Visual
explanations from deep networks via gradient-based
localization. In Proceedings of the IEEE International
Conference on Computer Vision (ICCV), pages 618–
626.
Simon, M., Gao, Y., Darrell, T., Denzler, J., and Rodner, E.
(2017). Generalized orderless pooling performs im-
plicit salient matching. In Proceedings of the IEEE
International Conference on Computer Vision (ICCV),
pages 4960–4969.
Singh, K. K. and Lee, Y. J. (2017). Hide-and-seek: Forc-
ing a network to be meticulous for weakly-supervised
object and action localization. In Proceedings of the
IEEE International Conference on Computer Vision
(ICCV), pages 3544–3553.
Wah, C., Branson, S., Welinder, P., Perona, P., and Be-
longie, S. (2011). The caltech-ucsd birds-200-2011
dataset.
Wang, H., Wang, Z., Du, M., Yang, F., Zhang, Z., Ding, S.,
Mardziel, P., and Hu, X. (2020). Score-cam: Score-
weighted visual explanations for convolutional neural
networks. In Proceedings of the IEEE Conference on
Computer Vision and Pattern Recognition Workshops
(CVPR-WS), pages 24–25.
Yun, S., Han, D., Oh, S. J., Chun, S., Choe, J., and Yoo,
Y. (2019). Cutmix: Regularization strategy to train
strong classifiers with localizable features. In Pro-
ceedings of the IEEE International Conference on
Computer Vision (ICCV), pages 6023–6032.
Zhang, B., Zhao, Q., Feng, W., and Lyu, S. (2018a). Al-
phamex: A smarter global pooling method for convo-
lutional neural networks. Neurocomputing, 321:36–
48.
Zhang, X., Wei, Y., Feng, J., Yang, Y., and Huang, T. S.
(2018b). Adversarial complementary learning for
weakly supervised object localization. In Proceedings
of the IEEE Conference on Computer Vision and Pat-
tern Recognition (CVPR), pages 1325–1334.
Zhang, X., Wei, Y., Kang, G., Yang, Y., and Huang,
VISAPP 2022 - 17th International Conference on Computer Vision Theory and Applications