VISAPP 2023 - 18th International Conference on Computer Vision Theory and Applications