Ioffe, S. and Szegedy, C. (2015). Batch normalization: Accelerating deep network training by reducing internal covariate shift. In Proceedings of the 32nd International Conference on Machine Learning, volume 37 of Proceedings of Machine Learning Research, pages 448–456, Lille, France. PMLR.
Kingma, D. P. and Welling, M. (2014). Auto-encoding
variational bayes. In 2nd International Conference
on Learning Representations, ICLR 2014, Banff, AB,
Canada, April 14-16, 2014, Conference Track Pro-
ceedings, volume abs/1312.6114.
Krizhevsky, A., Sutskever, I., and Hinton, G. E. (2012).
Imagenet classification with deep convolutional neu-
ral networks. In Pereira, F., Burges, C. J. C., Bottou,
L., and Weinberger, K. Q., editors, Advances in Neu-
ral Information Processing Systems, volume 25, pages
1097–1105. Curran Associates, Inc.
Liu, L., Jiang, H., He, P., Chen, W., Liu, X., Gao, J.,
and Han, J. (2020). On the variance of the adap-
tive learning rate and beyond. In International Con-
ference on Learning Representations, ICLR 2020,
https://openreview.net/forum?id=rkgz2aEKDr.
Ribeiro, M. T., Singh, S., and Guestrin, C. (2016). "Why
should I trust you?": Explaining the predictions of any
classifier. In Proceedings of the 22nd ACM SIGKDD
International Conference on Knowledge Discovery
and Data Mining, San Francisco, CA, USA, August
13-17, 2016, pages 1135–1144.
Schapire, R. E. (1999). A brief introduction to boosting.
In Proceedings of the 16th International Joint Confer-
ence on Artificial Intelligence - Volume 2, IJCAI’99,
pages 1401–1406, San Francisco, CA, USA. Morgan
Kaufmann Publishers Inc.
Selvaraju, R. R., Cogswell, M., Das, A., Vedantam, R.,
Parikh, D., and Batra, D. (2017). Grad-cam: Visual
explanations from deep networks via gradient-based
localization. In Proceedings of the IEEE International
Conference on Computer Vision (ICCV), pages 618–
626.
Simonyan, K. and Zisserman, A. (2014). Very deep con-
volutional networks for large-scale image recognition.
CoRR, abs/1409.1556.
Smilkov, D., Thorat, N., Kim, B., Viégas, F., and Wattenberg, M. (2017). SmoothGrad: removing noise by adding noise. ArXiv, abs/1706.03825.
Taga, S., Tomofumi, M., Takimoto, M., and Kambayashi, Y.
(2019). Multi-agent base evacuation support system
using MANET. Vietnam Journal of Computer Science,
06(02):177–191.
Tan, M. and Le, Q. (2019). EfficientNet: Rethinking model
scaling for convolutional neural networks. volume 97
of Proceedings of Machine Learning Research, pages
6105–6114, Long Beach, California, USA. PMLR.
Van Amersfoort, J., Smith, L., Teh, Y. W., and Gal, Y.
(2020). Uncertainty estimation using a single deep de-
terministic neural network. In III, H. D. and Singh, A.,
editors, Proceedings of the 37th International Con-
ference on Machine Learning, volume 119 of Pro-
ceedings of Machine Learning Research, pages 9690–
9700, Virtual. PMLR.
van den Oord, A., Vinyals, O., and kavukcuoglu, k. (2017).
Neural discrete representation learning. In Guyon,
I., Luxburg, U. V., Bengio, S., Wallach, H., Fer-
gus, R., Vishwanathan, S., and Garnett, R., editors,
Advances in Neural Information Processing Systems,
volume 30, pages 6306–6315. Curran Associates, Inc.
Asano, Y. M., Rupprecht, C., and Vedaldi, A. (2020). Self-labelling via simultaneous clustering and representation learning. In International Conference on Learning Representations, ICLR 2020, https://openreview.net/forum?id=Hyx-jyBFPr.
Zagoruyko, S. and Komodakis, N. (2016). Wide residual
networks. In Richard C. Wilson, E. R. H. and Smith,
W. A. P., editors, Proceedings of the British Ma-
chine Vision Conference (BMVC), pages 87.1–87.12.
BMVA Press.
Zeiler, M. D. and Fergus, R. (2014). Visualizing and under-
standing convolutional networks. In Fleet, D., Pajdla,
T., Schiele, B., and Tuytelaars, T., editors, Computer
Vision – ECCV 2014, pages 818–833, Cham. Springer
International Publishing.
Zhang, M., Lucas, J., Ba, J., and Hinton, G. E. (2019).
Lookahead optimizer: k steps forward, 1 step back.
In Wallach, H., Larochelle, H., Beygelzimer, A.,
d'Alché-Buc, F., Fox, E., and Garnett, R., editors,
Advances in Neural Information Processing Systems,
volume 32, pages 9597–9608. Curran Associates, Inc.
Zhao, H., Jia, J., and Koltun, V. (2020). Exploring self-
attention for image recognition. In Proceedings of the
IEEE/CVF Conference on Computer Vision and Pat-
tern Recognition (CVPR), pages 10076–10085.
APPENDIX
Architecture. Our classification model is built by stacking residual blocks, as in Wide ResNet (Zagoruyko and Komodakis, 2016), followed by vector quantization (VQ) and batch normalization (Ioffe and Szegedy, 2015), as shown in Table 3. We set N = 28 and k = 2, and the embedding space contains 256 embeddings.
Table 3: Classifier Architecture.

Group name   Output size   Block type
conv1        32 × 32       [3 × 3, 16]
conv2        32 × 32       [3 × 3, 16 × k; 3 × 3, 16 × k] × N
conv3        16 × 16       [3 × 3, 32 × k; 3 × 3, 32 × k] × N
conv4        8 × 8         [3 × 3, 64 × k; 3 × 3, 64 × k] × N
bn           8 × 8         [1 × 1]
vq           8 × 8         [1 × 1]
avg-pool     1 × 1         [8 × 8]
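The vq step above replaces each spatial vector of the 8 × 8 feature map with its nearest entry from the 256-embedding codebook. A minimal NumPy sketch of this lookup follows; the feature dimension (64) and the function names are our assumptions for illustration, since the paper specifies only the 8 × 8 grid and the 256-entry embedding space.

```python
import numpy as np

def vector_quantize(features, codebook):
    """Map each spatial feature vector to its nearest codebook embedding.

    features: (H, W, D) array, e.g. the 8x8 output of the last conv group.
    codebook: (K, D) array of K learned embeddings (K = 256 here).
    Returns the quantized map (H, W, D) and the chosen indices (H, W).
    """
    H, W, D = features.shape
    flat = features.reshape(-1, D)                                # (H*W, D)
    # Squared Euclidean distance from every vector to every embedding.
    dist = ((flat[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)  # (H*W, K)
    idx = dist.argmin(axis=1)                                     # nearest ids
    return codebook[idx].reshape(H, W, D), idx.reshape(H, W)

rng = np.random.default_rng(0)
feats = rng.normal(size=(8, 8, 64))   # hypothetical 64-dim conv features
codes = rng.normal(size=(256, 64))    # 256-entry embedding space
quantized, indices = vector_quantize(feats, codes)
```

In training, the gradient is typically passed straight through this non-differentiable lookup; the sketch shows only the forward quantization.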
In addition, we designed a decoder, symmetric to the classifier, that reconstructs the original image from the classifier's embedding (Table 4).
SDMIS 2021 - Special Session on Super Distributed and Multi-agent Intelligent Systems