classes (VOC) challenge. International Journal of
Computer Vision, 88(2):303–338.
Fu, C.-Y., Liu, W., Ranga, A., Tyagi, A., and Berg, A. C.
(2017). DSSD: Deconvolutional Single Shot Detector.
arXiv preprint arXiv:1701.06659.
Gidaris, S. and Komodakis, N. (2016). Attend Refine
Repeat: Active Box Proposal. arXiv preprint
arXiv:1606.04446v1.
Girshick, R. (2015). Fast R-CNN. In Proceedings of the IEEE
international conference on computer vision, pages
1440–1448.
Girshick, R., Donahue, J., Darrell, T., and Malik, J. (2014).
Rich feature hierarchies for accurate object detection
and semantic segmentation. Proceedings of the IEEE
Computer Society Conference on Computer Vision
and Pattern Recognition, pages 580–587.
Guo, X., Liu, D., Jou, B., Zhu, M., Cai, A., and Chang, S. F.
(2013). Robust object co-detection. Proceedings of
the IEEE Computer Society Conference on Computer
Vision and Pattern Recognition, pages 3206–3213.
Hadsell, R., Chopra, S., and LeCun, Y. (2006). Dimension-
ality reduction by learning an invariant mapping. Pro-
ceedings of the IEEE Computer Society Conference
on Computer Vision and Pattern Recognition, 2:1735–
1742.
He, K., Fan, H., Wu, Y., Xie, S., and Girshick, R. (2020).
Momentum contrast for unsupervised visual represen-
tation learning. In Proceedings of the IEEE/CVF Con-
ference on Computer Vision and Pattern Recognition,
pages 9729–9738.
He, K., Gkioxari, G., Dollár, P., and Girshick, R. (2017).
Mask R-CNN. Proceedings of the IEEE International
Conference on Computer Vision, pages 2980–2988.
Hermans, A., Beyer, L., and Leibe, B. (2017). In Defense
of the Triplet Loss for Person Re-Identification. arXiv
preprint arXiv:1703.07737.
Huang, Y., Wang, Y., Tai, Y., Liu, X., Shen, P., Li, S., Li, J.,
and Huang, F. (2020). CurricularFace: Adaptive curriculum
learning loss for deep face recognition. In
Proceedings of the IEEE/CVF Conference on Com-
puter Vision and Pattern Recognition, pages 5901–
5910.
Jiang, S., Liang, S., Chen, C., Zhu, Y., and Li, X. (2019).
Class Agnostic Image Common Object Detection.
IEEE Transactions on Image Processing, 28(6):2836–
2846.
Joulin, A., Bach, F., and Ponce, J. (2010). Discriminative
clustering for image co-segmentation. Proceedings of
the IEEE Computer Society Conference on Computer
Vision and Pattern Recognition, pages 1943–1950.
Khosla, P., Teterwak, P., Wang, C., Sarna, A., Tian, Y.,
Isola, P., Maschinot, A., Liu, C., and Krishnan, D.
(2020). Supervised Contrastive Learning. arXiv
preprint arXiv:2004.11362, pages 1–18.
Law, H. and Deng, J. (2018). CornerNet: Detecting objects
as paired keypoints. Lecture Notes in Computer Sci-
ence, 11218 LNCS:765–781.
Le, H., Yu, C. P., Zelinsky, G., and Samaras, D. (2017). Co-
localization with Category-Consistent Features and
Geodesic Distance Propagation. Proceedings - 2017
IEEE International Conference on Computer Vision
Workshops, ICCVW 2017, pages 1103–1112.
Li, W., Hosseini Jafari, O., and Rother, C. (2019a). Deep
Object Co-segmentation. In Lecture Notes in Com-
puter Science, volume 11363 LNCS, pages 638–653.
Li, W., Jafari, H., and Rother, C. (2019b). Localizing Com-
mon Objects Using Common Component Activation
Map. pages 28–31.
Lin, T.-Y., Dollár, P., Girshick, R., He, K., Hariharan, B.,
and Belongie, S. (2017a). Feature pyramid networks
for object detection. In Proceedings of the IEEE con-
ference on computer vision and pattern recognition,
pages 2117–2125.
Lin, T.-Y., Goyal, P., Girshick, R., He, K., and Dollár,
P. (2017b). Focal Loss for Dense Object Detection.
IEEE Transactions on Pattern Analysis and Machine
Intelligence, 42(2):318–327.
Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P.,
Ramanan, D., Dollár, P., and Zitnick, C. L. (2014).
Microsoft COCO: Common objects in context. In Lecture
Notes in Computer Science, volume 8693 LNCS,
pages 740–755.
Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S.,
Fu, C.-Y., and Berg, A. C. (2016). SSD: Single shot
multibox detector. Lecture Notes in Computer Sci-
ence, 9905 LNCS:21–37.
Liu, W., Wen, Y., Yu, Z., Li, M., Raj, B., and Song, L.
(2017). SphereFace: Deep hypersphere embedding
for face recognition. Proceedings - 30th IEEE Con-
ference on Computer Vision and Pattern Recognition,
CVPR 2017, pages 6738–6746.
Merdassi, H., Barhoumi, W., and Zagrouba, E. (2019). A
Comprehensive Overview of Relevant Methods of Im-
age Cosegmentation. Expert Systems with Applica-
tions, 140:112901.
Qiao, S., Wang, H., Liu, C., Shen, W., and Yuille, A.
(2019). Weight Standardization. arXiv preprint
arXiv:1903.10520.
Quan, R., Han, J., Zhang, D., and Nie, F. (2016). Object
Co-segmentation via Graph Optimized-Flexible Man-
ifold Ranking. In 2016 IEEE Conference on Com-
puter Vision and Pattern Recognition (CVPR), pages
687–695.
Redmon, J. and Farhadi, A. (2017). YOLO9000: Better,
faster, stronger. Proceedings - 30th IEEE Conference
on Computer Vision and Pattern Recognition, CVPR
2017, pages 6517–6525.
Redmon, J. and Farhadi, A. (2018). YOLOv3:
An Incremental Improvement. arXiv preprint
arXiv:1804.02767.
Ren, S., He, K., Girshick, R., and Sun, J. (2015). Faster
R-CNN: Towards real-time object detection with region
proposal networks. In Advances in neural information
processing systems, pages 91–99.
Rezatofighi, H., Tsoi, N., Gwak, J., Sadeghian, A., Reid,
I., and Savarese, S. (2019). Generalized intersection
over union: A metric and a loss for bounding box re-
gression. In Proceedings of the IEEE Conference on
ICPRAM 2021 - 10th International Conference on Pattern Recognition Applications and Methods