To demonstrate the usefulness of CrossSiam, we compared it with ParaSiam, a network that simply applies k-fold cross validation to representation learning. We conducted two experiments for evaluation. The first was linear evaluation, which measures classification accuracy when the embedding space is fixed and only one fully connected layer is trained with supervision. The experimental results showed that the proposed method achieved higher accuracy than the baseline for both 2-fold and 5-fold cross validation. In particular, the accuracy of 2-fold CrossSiam was much higher than that of 2-fold ParaSiam.
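For concreteness, a minimal sketch of this linear evaluation protocol is given below, assuming a PyTorch setup; encoder, train_loader, feat_dim, and num_classes are hypothetical placeholders, not our exact training script.

```python
import torch
import torch.nn as nn

# Linear evaluation sketch: the encoder is frozen and only a single
# fully connected layer is trained with the supervised labels.
# `encoder`, `train_loader`, `feat_dim`, and `num_classes` are placeholders.
def linear_evaluation(encoder, train_loader, feat_dim, num_classes, epochs=90):
    encoder.eval()                      # freeze the embedding network
    for p in encoder.parameters():
        p.requires_grad = False

    head = nn.Linear(feat_dim, num_classes)
    opt = torch.optim.SGD(head.parameters(), lr=0.1, momentum=0.9)
    loss_fn = nn.CrossEntropyLoss()

    for _ in range(epochs):
        for x, y in train_loader:
            with torch.no_grad():       # embeddings stay fixed
                z = encoder(x)
            loss = loss_fn(head(z), y)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return head
```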
The second involved the Fréchet distance (FD), which measures the difference between the distributions of the embeddings of the training, validation, and test sets.
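For reference, the FD between two embedding sets can be computed from Gaussians fitted to each set, using the closed form of Dowson and Landau (1982): d^2 = ||mu_1 - mu_2||^2 + tr(S_1 + S_2 - 2 (S_1 S_2)^(1/2)). The sketch below is illustrative and is not our exact evaluation code.

```python
import numpy as np
from scipy import linalg

def frechet_distance(emb_a, emb_b):
    """Fréchet distance between Gaussians fitted to two embedding sets.

    Uses the closed form of Dowson and Landau (1982):
        d^2 = ||mu_a - mu_b||^2 + tr(S_a + S_b - 2 (S_a S_b)^{1/2})
    emb_a and emb_b are arrays of shape (num_samples, embedding_dim).
    """
    mu_a, mu_b = emb_a.mean(axis=0), emb_b.mean(axis=0)
    cov_a = np.cov(emb_a, rowvar=False)
    cov_b = np.cov(emb_b, rowvar=False)
    # Matrix square root of the covariance product; discard the small
    # imaginary parts that arise from numerical error.
    covmean = linalg.sqrtm(cov_a @ cov_b)
    if np.iscomplexobj(covmean):
        covmean = covmean.real
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a + cov_b - 2.0 * covmean))
```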
The experimental results show that the distance between the embeddings of the training data and the validation data is smaller, and the distance between the embeddings of the validation data and the test data is larger, for CrossSiam than for ParaSiam. This suggests that, undesirably, CrossSiam may leak validation data to the network to a greater extent than ParaSiam.
However, it is unclear whether such leakage actually occurs. In future work, we will conduct experiments to examine whether the validation data of each fold can be used to suppress overfitting. We will also show that CrossSiam can be trained on datasets with a high percentage of out-of-distribution samples, and we will conduct experiments to demonstrate the suitability of this approach for the autonomous control of multiple drones.
REFERENCES
Chen, T., Kornblith, S., Norouzi, M., and Hinton, G. (2020).
A simple framework for contrastive learning of visual
representations. In III, H. D. and Singh, A., editors,
Proceedings of the 37th International Conference on
Machine Learning, volume 119 of Proceedings of Ma-
chine Learning Research, pages 1597–1607. PMLR.
Chen, X., Yuan, Y., Zeng, G., and Wang, J. (2021). Semi-
supervised semantic segmentation with cross pseudo
supervision. In IEEE Conference on Computer Vision
and Pattern Recognition (CVPR).
Deng, W. and Zheng, L. (2021). Are labels always necessary for classifier accuracy evaluation? In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
Dowson, D. and Landau, B. (1982). The Fréchet distance between multivariate normal distributions. Journal of Multivariate Analysis, 12(3):450–455.
Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Doersch, C., Avila Pires, B., Guo, Z., Gheshlaghi Azar, M., Piot, B., Kavukcuoglu, K., Munos, R., and Valko, M. (2020). Bootstrap your own latent: A new approach to self-supervised learning. In Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M. F., and Lin, H., editors, Advances in Neural Information Processing Systems, volume 33, pages 21271–21284. Curran Associates, Inc.
He, K., Fan, H., Wu, Y., Xie, S., and Girshick, R. (2020).
Momentum contrast for unsupervised visual represen-
tation learning. In Proceedings of the IEEE/CVF Con-
ference on Computer Vision and Pattern Recognition
(CVPR).
He, K., Zhang, X., Ren, S., and Sun, J. (2016). Deep resid-
ual learning for image recognition. 2016 IEEE Con-
ference on Computer Vision and Pattern Recognition
(CVPR), pages 770–778.
Krizhevsky, A., Sutskever, I., and Hinton, G. E. (2012).
Imagenet classification with deep convolutional neu-
ral networks. In Pereira, F., Burges, C. J. C., Bottou,
L., and Weinberger, K. Q., editors, Advances in Neu-
ral Information Processing Systems, volume 25, pages
1097–1105. Curran Associates, Inc.
Loshchilov, I. and Hutter, F. (2017). SGDR: stochastic gra-
dient descent with warm restarts. In 5th International
Conference on Learning Representations, ICLR 2017,
Toulon, France, April 24-26, 2017, Conference Track
Proceedings. OpenReview.net.
Shamir, O. and Zhang, T. (2013). Stochastic gradient de-
scent for non-smooth optimization: Convergence re-
sults and optimal averaging schemes. In Dasgupta,
S. and McAllester, D., editors, Proceedings of the
30th International Conference on Machine Learning,
volume 28 of Proceedings of Machine Learning Re-
search, pages 71–79, Atlanta, Georgia, USA. PMLR.
Simonyan, K. and Zisserman, A. (2014). Very deep con-
volutional networks for large-scale image recognition.
CoRR, abs/1409.1556.
Suzuki, K., Matsuzawa, T., Takimoto, M., and Kambayashi,
Y. (2021). Vector quantization to visualize the detec-
tion process. In Rocha, A., Steels, L., and van den
Herik, J., editors, ICAART 2021 - Proceedings of the
13th International Conference on Agents and Artifi-
cial Intelligence, pages 553–561. SciTePress.
Tan, M. and Le, Q. (2019). EfficientNet: Rethinking model scaling for convolutional neural networks. In Chaudhuri, K. and Salakhutdinov, R., editors, Proceedings of the 36th International Conference on Machine Learning, volume 97 of Proceedings of Machine Learning Research, pages 6105–6114, Long Beach, California, USA. PMLR.
APPENDIX
Architecture
In this study, we use f_θ as the ResNet-18 encoder, i.e., ResNet-18 without the last fully connected layer (Table 4), the projector g_θ constructed as in Table 6, and the predictor h_θ constructed as in Table 5, as in BYOL (Grill et al., 2020).
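As a concrete illustration, the three components can be assembled as in the following PyTorch sketch; the hidden and output dimensions here are assumptions in the spirit of BYOL, and the exact configurations are those given in Tables 4–6.

```python
import torch.nn as nn
import torchvision

# Sketch of the encoder / projector / predictor stack, following BYOL.
# Hidden and output sizes are illustrative assumptions; see Tables 4-6
# for the exact configurations used in this study.
def make_encoder():
    resnet = torchvision.models.resnet18()
    resnet.fc = nn.Identity()           # drop the last fully connected layer
    return resnet                       # f_theta: outputs 512-d features

def make_mlp(in_dim, hidden_dim=1024, out_dim=256):
    # The same MLP shape is used for projector g_theta and predictor h_theta.
    return nn.Sequential(
        nn.Linear(in_dim, hidden_dim),
        nn.BatchNorm1d(hidden_dim),
        nn.ReLU(inplace=True),
        nn.Linear(hidden_dim, out_dim),
    )

f_theta = make_encoder()
g_theta = make_mlp(512)       # projector on the 512-d encoder features
h_theta = make_mlp(256)       # predictor operates on projected embeddings
```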
Training Details for Representation
Learning
We used the SGD optimizer (Shamir and Zhang, 2013) to train the model parameters θ in representation learning.
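A minimal sketch of this optimizer setup is shown below; the model, learning rate, momentum, weight decay, and the cosine schedule (Loshchilov and Hutter, 2017) are illustrative assumptions rather than the exact hyperparameters of this study.

```python
import torch
import torch.nn as nn

# Sketch of the SGD optimizer setup for representation learning.
# The stand-in model and all hyperparameter values below are
# illustrative assumptions, not those used in this study.
model = nn.Sequential(nn.Linear(512, 1024), nn.ReLU(), nn.Linear(1024, 256))
optimizer = torch.optim.SGD(model.parameters(), lr=0.03,
                            momentum=0.9, weight_decay=5e-4)
# Cosine learning-rate schedule in the spirit of SGDR
# (Loshchilov and Hutter, 2017), which this paper cites.
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=800)

for epoch in range(800):
    # ... one epoch of representation learning over theta ...
    scheduler.step()
```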