
REFERENCES
Andrew, G., Arora, R., Bilmes, J., and Livescu, K. (2013). Deep canonical correlation analysis. In Dasgupta, S. and McAllester, D., editors, Proceedings of the 30th International Conference on Machine Learning, volume 28 of Proceedings of Machine Learning Research, pages 1247–1255, Atlanta, Georgia, USA. PMLR.
Balestriero, R., Ibrahim, M., Sobal, V., Morcos, A., Shekhar, S., Goldstein, T., Bordes, F., Bardes, A., Mialon, G., Tian, Y., Schwarzschild, A., Wilson, A. G., Geiping, J., Garrido, Q., Fernandez, P., Bar, A., Pirsiavash, H., LeCun, Y., and Goldblum, M. (2023). A cookbook of self-supervised learning.
Bardes, A., Ponce, J., and LeCun, Y. (2021). Vicreg: Variance-invariance-covariance regularization for self-supervised learning. In International Conference on Learning Representations.
Bengio, Y., Lamblin, P., Popovici, D., and Larochelle, H. (2006). Greedy layer-wise training of deep networks. In Schölkopf, B., Platt, J., and Hoffman, T., editors, Advances in Neural Information Processing Systems, volume 19. MIT Press.
Bordes, F., Balestriero, R., and Vincent, P. (2023). Towards democratizing joint-embedding self-supervised learning.
Bossard, L., Guillaumin, M., and Van Gool, L. (2014). Food-101 – mining discriminative components with random forests. In European Conference on Computer Vision.
Cai, T. T., Frankle, J., Schwab, D. J., and Morcos, A. S. (2020). Are all negatives created equal in contrastive instance discrimination?
Caron, M., Misra, I., Mairal, J., Goyal, P., Bojanowski, P., and Joulin, A. (2020). Unsupervised learning of visual features by contrasting cluster assignments. In Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M., and Lin, H., editors, Advances in Neural Information Processing Systems, volume 33, pages 9912–9924.
Caron, M., Touvron, H., Misra, I., Jégou, H., Mairal, J., Bojanowski, P., and Joulin, A. (2021). Emerging properties in self-supervised vision transformers.
Chen, T., Kornblith, S., Norouzi, M., and Hinton, G. (2020a). A simple framework for contrastive learning of visual representations.
Chen, X., Fan, H., Girshick, R., and He, K. (2020b). Improved baselines with momentum contrastive learning.
Chen, X. and He, K. (2021). Exploring simple siamese representation learning. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2021, virtual, June 19-25, 2021, pages 15750–15758. Computer Vision Foundation / IEEE.
Chen, X., Xie, S., and He, K. (2021). An empirical study of training self-supervised vision transformers. In 2021 IEEE/CVF International Conference on Computer Vision (ICCV), pages 9620–9629, Los Alamitos, CA, USA. IEEE Computer Society.
Cimpoi, M., Maji, S., Kokkinos, I., Mohamed, S., and Vedaldi, A. (2014). Describing textures in the wild. In Proceedings of the IEEE Conf. on Computer Vision and Pattern Recognition (CVPR).
Cubuk, E. D., Zoph, B., Mane, D., Vasudevan, V., and Le, Q. V. (2019a). Autoaugment: Learning augmentation policies from data. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2019.
Cubuk, E. D., Zoph, B., Shlens, J., and Le, Q. V. (2019b). Randaugment: Practical automated data augmentation with a reduced search space.
Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., and Fei-Fei, L. (2009). Imagenet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 248–255.
Doersch, C., Gupta, A., and Efros, A. A. (2015). Unsupervised visual representation learning by context prediction. In International Conference on Computer Vision (ICCV).
Donahue, J., Krähenbühl, P., and Darrell, T. (2017). Adversarial feature learning. In 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings. OpenReview.net.
Dosovitskiy, A., Springenberg, J. T., Riedmiller, M., and Brox, T. (2014). Discriminative unsupervised feature learning with convolutional neural networks. In Ghahramani, Z., Welling, M., Cortes, C., Lawrence, N., and Weinberger, K., editors, Advances in Neural Information Processing Systems, volume 27. Curran Associates, Inc.
Everingham, M., Van Gool, L., Williams, C. K. I., Winn, J., and Zisserman, A. (2010). The PASCAL visual object classes (VOC) challenge. International Journal of Computer Vision, 88(2):303–338.
Gidaris, S., Singh, P., and Komodakis, N. (2018). Unsupervised representation learning by predicting image rotations. CoRR, abs/1803.07728.
Grill, J.-B., Strub, F., Altché, F., Tallec, C., Richemond, P. H., Buchatskaya, E., Doersch, C., Pires, B. A., Guo, Z. D., Azar, M. G., Piot, B., Kavukcuoglu, K., Munos, R., and Valko, M. (2020). Bootstrap your own latent: A new approach to self-supervised learning.
He, K., Chen, X., Xie, S., Li, Y., Dollár, P., and Girshick, R. (2022). Masked autoencoders are scalable vision learners. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022, pages 15979–15988. IEEE Computer Society.
He, K., Fan, H., Wu, Y., Xie, S., and Girshick, R. (2020). Momentum contrast for unsupervised visual representation learning.
Krizhevsky, A. (2009). Learning multiple layers of features from tiny images.