PASCAL VOC2012 dataset. As shown in Table 5,
the accuracy without additional connections was
80.36%. Adding a cross cooperative connection at
Connection1 gave a gain of 0.16%. Adding a cross
connection at Connection2 improved accuracy by
1.15% over the proposed method without additional
connections. ASPP improves feature extraction by
applying several dilated convolutions and pooling, so
it produces useful feature maps. We therefore
consider that the cross connection at Connection1
benefits ASPP, and that the cross connection at
Connection2 benefits the decoding of the extracted
information. These results demonstrate the
effectiveness of the additional cross cooperative
connections.
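As a worked check of the figures above, the following minimal sketch computes the accuracies implied by the reported gains. It assumes each gain is an absolute difference from the 80.36% baseline. The variable and function names are hypothetical, and Table 5 remains the authoritative source for the measured values.

```python
# Figures quoted in the text above; all names here are illustrative.
BASELINE = 80.36  # accuracy (%) without additional connections
GAINS = {"Connection1": 0.16, "Connection2": 1.15}  # reported gains


def implied_accuracy(baseline, gain):
    """Return baseline + gain, assuming the gain is an absolute difference."""
    return round(baseline + gain, 2)


# Accuracy implied for each connection setting under the stated assumption.
implied = {name: implied_accuracy(BASELINE, gain) for name, gain in GAINS.items()}

for name, acc in implied.items():
    print(f"{name}: {acc:.2f}%")
```

Under this assumption the implied accuracies are 80.52% for Connection1 and 81.51% for Connection2.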
5 CONCLUSION
In this paper, we proposed a new cooperative learning
method that fuses the features of different backbone
networks for semantic segmentation. In particular, we
used cross cooperative learning with two different
backbones, and our method improved on
conventional cooperative learning. We confirmed
that our method improved segmentation accuracy
on the PASCAL VOC2012 dataset and the Cityscapes
dataset.
The proposed cross cooperative network requires
substantial computational resources because it needs
multiple backbone networks. Realizing cross
cooperative learning with lower computational cost
while keeping high accuracy is a subject for future
work.
ACKNOWLEDGEMENTS
This work was partially supported by JSPS KAKENHI
18K11382.
REFERENCES
Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet
classification with deep convolutional neural
networks. In: Advances in Neural Information
Processing Systems. pp. 1097–1105 (2012)
Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.,
Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich,
A.: Going deeper with convolutions. In: Proceedings of
the IEEE conference on Computer Vision and Pattern
Recognition. pp. 1–9 (2015)
Wang, F., Jiang, M., Qian, C., Yang, S., Li, C., Zhang, H.,
Wang, X., Tang, X.: Residual attention network for
image classification. In: Proceedings of the IEEE
Conference on Computer Vision and Pattern
Recognition. pp. 3156–3164 (2017)
Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only
look once: unified, real-time object detection. In:
Proceedings of the IEEE Conference on Computer
Vision and Pattern Recognition. pp. 779–788 (2016)
Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S.,
Fu, C.Y., Berg, A.C.: Ssd: Single shot multibox
detector. In: Proceedings of the European Conference
on Computer Vision. pp. 21–37. Springer (2016)
Cao, Z., Hidalgo, G., Simon, T., Wei, S.E., Sheikh, Y.:
Openpose: realtime multi-person 2d pose estimation
using part affinity fields. arXiv preprint
arXiv:1812.08008 (2018)
Cao, Z., Simon, T., Wei, S.E., Sheikh, Y.: Realtime multi-
person 2d pose estimation using part affinity fields. In:
Proceedings of the IEEE Conference on Computer
Vision and Pattern Recognition. pp. 7291–7299 (2017)
Isola, P., Zhu, J.Y., Zhou, T., Efros, A.A.: Image-to-image
translation with conditional adversarial networks. In:
Proceedings of the IEEE conference on Computer
Vision and Pattern Recognition. pp. 1125–1134 (2017)
Ronneberger, O., Fischer, P., Brox, T.: U-net:
Convolutional networks for biomedical image
segmentation. In: International Conference on Medical
Image Computing and Computer-Assisted
Intervention. pp. 234–241. Springer (2015)
Chen, L.C., Collins, M., Zhu, Y., Papandreou, G., Zoph, B.,
Schroff, F., Adam, H., Shlens, J.: Searching for efficient
multi-scale architectures for dense image prediction. In:
Advances in Neural Information Processing Systems.
pp. 8699–8710 (2018)
Yang, M., Yu, K., Zhang, C., Li, Z., Yang, K.: Denseaspp
for semantic segmentation in street scenes. In:
Proceedings of the IEEE Conference on Computer
Vision and Pattern Recognition. pp. 3684–3692 (2018)
Havaei, M., Davy, A., Warde-Farley, D., Biard, A.,
Courville, A., Bengio, Y., Pal, C., Jodoin, P.M.,
Larochelle, H.: Brain tumor segmentation with deep
neural networks. Medical image analysis 35, 18–31
(2017)
Ji, X., Li, Y., Cheng, J., Yu, Y., Wang, M.: Cell image
segmentation based on an improved watershed
algorithm. In: 2015 8th International Congress on
Image and Signal Processing. pp. 433–437 (2015)
Ryota, I., Kazuhiro, H.: Feature sharing cooperative
network for semantic segmentation. In: Proceedings
of the 16th International Joint Conference on Computer
Vision, Imaging and Computer Graphics Theory and
Applications. pp. 577–584 (2021)
Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler,
M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The
cityscapes dataset for semantic urban scene
understanding. In: Proceedings of the IEEE conference
on Computer Vision and Pattern Recognition. pp.
3213–3223 (2016)