Society Conference on Computer Vision and Pattern
Recognition, pages 2579–2586.
Fire, A. and Zhu, S.-C. (2015). Learning perceptual causality
from video. ACM Transactions on Intelligent Systems
and Technology, 7(2).
Fire, A. and Zhu, S.-C. (2017). Inferring hidden statuses
and actions in video by causal reasoning. In Proceed-
ings of the IEEE Conference on Computer Vision and
Pattern Recognition Workshops, pages 48–56.
Girshick, R. (2015). Fast R-CNN. In Proceedings of the IEEE
international conference on computer vision, pages
1440–1448.
He, K., Gkioxari, G., Dollár, P., and Girshick, R. (2017).
Mask R-CNN. In Proceedings of the IEEE international
conference on computer vision, pages 2961–2969.
He, K., Zhang, X., Ren, S., and Sun, J. (2016). Deep resid-
ual learning for image recognition. In Proceedings of
the IEEE conference on computer vision and pattern
recognition, pages 770–778.
Huang, G., Liu, Z., Van Der Maaten, L., and Weinberger,
K. Q. (2017). Densely connected convolutional net-
works. In Proceedings of the IEEE conference on
computer vision and pattern recognition, pages 4700–
4708.
Isola, P., Lim, J. J., and Adelson, E. H. (2015). Discovering
states and transformations in image collections. In Proceedings
of the IEEE Computer Society Conference
on Computer Vision and Pattern Recognition, pages
1383–1391.
Lin, T.-Y., Dollár, P., Girshick, R., He, K., Hariharan, B.,
and Belongie, S. (2017a). Feature pyramid networks
for object detection. In Proceedings of the IEEE con-
ference on computer vision and pattern recognition,
pages 2117–2125.
Lin, T.-Y., Goyal, P., Girshick, R., He, K., and Dollár, P.
(2017b). Focal loss for dense object detection. In
Proceedings of the IEEE international conference on
computer vision, pages 2980–2988.
Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P.,
Ramanan, D., Dollár, P., and Zitnick, C. L. (2014).
Microsoft COCO: Common objects in context. In European
conference on computer vision, pages 740–755.
Springer.
Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S.,
Fu, C.-Y., and Berg, A. C. (2016). SSD: Single shot
multibox detector. In European conference on com-
puter vision, pages 21–37. Springer.
Liu, Y., Wei, P., and Zhu, S.-C. (2017). Jointly recognizing
object fluents and tasks in egocentric videos.
In Proceedings of the IEEE International Conference on
Computer Vision, pages 2943–2951.
Mahdisoltani, F., Berger, G., Gharbieh, W., Fleet, D.,
and Memisevic, R. (2018). On the effectiveness of
task granularity for transfer learning. arXiv preprint
arXiv:1804.09235.
Padilla, R., Netto, S. L., and da Silva, E. A. (2020). A sur-
vey on performance metrics for object-detection algo-
rithms. In 2020 International Conference on Systems,
Signals and Image Processing (IWSSIP), pages 237–
242. IEEE.
Redmon, J. and Farhadi, A. (2017). YOLO9000: better, faster,
stronger. In Proceedings of the IEEE conference on
computer vision and pattern recognition, pages 7263–
7271.
Ren, S., He, K., Girshick, R., and Sun, J. (2015). Faster
R-CNN: Towards real-time object detection with region
proposal networks. Advances in neural information
processing systems, 28:91–99.
Wang, X., Farhadi, A., and Gupta, A. (2016). Actions ~
transformations. In Proceedings of the IEEE con-
ference on Computer Vision and Pattern Recognition,
pages 2658–2667.
VISAPP 2022 - 17th International Conference on Computer Vision Theory and Applications