Convergence Rate. Lastly, we plot the cross entropy (CE) loss when training with CE alone and with the sample contrastive regularizer (CE + CL2) on the ResNet50 network and the CIFAR-100 dataset to compare their convergence rates. For CE + CL2, only the CE component is plotted. Figure 4 shows the two curves. Clearly, when training is regularized by the sample contrastive loss, the cross entropy loss converges faster than it does without regularization. Both runs, however, converge to roughly the same level after epoch 35. The same pattern is observed in all other experiments.
Figure 4: The cross entropy loss for CE and CE+CL2 on ResNet50 and the CIFAR-100 dataset. For CE+CL2, only the cross entropy loss component is plotted. With CL2 regularization, the cross entropy loss converges faster.
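The combined objective trained above can be sketched as follows. This is a minimal NumPy illustration that assumes a generic margin-based pairwise contrastive term; the weight `lam` and the `margin` value are illustrative placeholders, and the paper's exact CL2 formulation (defined earlier in the text) may differ from this generic form.

```python
import numpy as np

def cross_entropy(logits, labels):
    # Standard softmax cross entropy, averaged over the batch.
    z = logits - logits.max(axis=1, keepdims=True)  # for numerical stability
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def contrastive_l2(features, labels, margin=1.0):
    # Generic pairwise contrastive term over all pairs in a batch:
    # pull same-class features together (squared distance) and push
    # different-class features apart up to a margin (hinged term).
    # This is a sketch of the general idea, not necessarily the
    # paper's exact CL2 definition.
    n = len(labels)
    total, pairs = 0.0, 0
    for i in range(n):
        for j in range(i + 1, n):
            d = np.linalg.norm(features[i] - features[j])
            if labels[i] == labels[j]:
                total += d ** 2                       # positive pair
            else:
                total += max(0.0, margin - d) ** 2    # negative pair
            pairs += 1
    return total / max(pairs, 1)

def ce_cl2_loss(logits, features, labels, lam=0.1):
    # Combined objective: CE plus a weighted contrastive regularizer.
    return cross_entropy(logits, labels) + lam * contrastive_l2(features, labels)
```

Only the `cross_entropy` value would be what Figure 4 tracks; the contrastive term acts purely as a regularizer on the feature space during training.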
6 CONCLUSION
Deep networks have shown impressive performance on a number of computer vision tasks. However, deeper networks are more susceptible to overfitting, especially when the number of samples per class is small. In this work we introduced batch contrastive loss to regularize the network by comparing samples within a batch. Our experiments show that batch contrastive loss generalizes well, especially on deeper networks and on datasets with fewer samples per class. They also reveal a potential issue with the positive loss term for general classification tasks, which is a subject for future investigation. In the future, we plan to perform further evaluation to demonstrate that the technique generalizes to other datasets as well as tasks (e.g., video action classification). We will also look into the efficiency of the contrastive loss.
ACKNOWLEDGMENT
This work was supported by a FRGS grant
(FRGS/1/2018/ICT02/UTAR/02/03) from the
Ministry of Higher Education (MOHE) of Malaysia.