
ing: Methods, datasets, and challenges. IEEE Intelligent Transportation Systems Magazine.
Harding, J., Powell, G., Yoon, R., Fikentscher, J., Doyle, C., Sade, D., Lukuc, M., Simons, J., and Wang, J. (2014). Vehicle-to-vehicle communications: Readiness of V2V technology for application. Technical Report DOT HS 812 014, United States National Highway Traffic Safety Administration.
Hu, Y., Lu, Y., Xu, R., Xie, W., Chen, S., and Wang, Y. (2023). Collaboration helps camera overtake lidar in 3D detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 9243–9252. IEEE.
Huang, T., Liu, J., Zhou, X., Nguyen, D. C., Azghadi, M. R., Xia, Y., Han, Q.-L., and Sun, S. (2023). V2X cooperative perception for autonomous driving: Recent advances and challenges. arXiv preprint arXiv:2310.03525.
Keen, H. E. and Berns, K. (2020). Generation of elevation maps for planning and navigation of vehicles in rough natural terrain. In Berns, K. and Görges, D., editors, Advances in Service and Industrial Robotics, pages 488–495, Cham. Springer International Publishing.
Keen, H. E. and Berns, K. (2023). Probabilistic fusion of surface and underwater maps in a shallow water environment. In Petrič, T., Ude, A., and Žlajpah, L., editors, Advances in Service and Industrial Robotics, pages 195–202, Cham. Springer Nature Switzerland.
Kenney, J. B. (2011). Dedicated short-range communications (DSRC) standards in the United States. Proceedings of the IEEE, 99(7):1162–1182.
Li, Y., Zhang, J., Ma, D., Wang, Y., and Feng, C. (2022). Multi-robot scene completion: Towards task-agnostic collaborative perception. In Conference on Robot Learning (CoRL). PMLR.
Liang, M., Yang, B., Zeng, W., Chen, Y., Hu, R., Casas, S., and Urtasun, R. (2020). PnPNet: End-to-end perception and prediction with tracking in the loop. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 11553–11562. IEEE.
Liu, H., Gu, Z., Wang, C., Wang, P., and Vukobratovic, D. (2023). A lidar semantic segmentation framework for the cooperative vehicle-infrastructure system. In Proceedings of the 2023 IEEE 98th Vehicular Technology Conference (VTC2023-Fall), pages 1–5. IEEE.
Liu, Y., Sun, B., Li, Y., Hu, Y., and Wang, F.-Y. (2024). HPL-ViT: A unified perception framework for heterogeneous parallel lidars in V2V. In Proceedings of the 2024 IEEE International Conference on Robotics and Automation (ICRA). IEEE.
Marez, D., Nans, L., and Borden, S. (2022). Bandwidth constrained cooperative object detection in images. In Artificial Intelligence and Machine Learning in Defense Applications IV, volume 12276, pages 128–140. SPIE.
Ochieng, W. and Sauer, K. (2002). Urban road transport navigation: Performance of the global positioning system after selective availability. Transportation Research Part C: Emerging Technologies, 10(3):171–187.
Peri, N., Luiten, J., Li, M., Osep, A., Leal-Taixé, L., and Ramanan, D. (2022). Forecasting from lidar via future object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 17202–17211. IEEE.
Triess, L. T., Dreissig, M., Rist, C. B., and Zöllner, J. M. (2021). A survey on deep domain adaptation for lidar perception. In 2021 IEEE Intelligent Vehicles Symposium Workshops (IV Workshops), pages 350–357. IEEE.
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., and Polosukhin, I. (2017). Attention is all you need. In Advances in Neural Information Processing Systems, volume 30.
Wang, B., Zhang, L., Wang, Z., Zhao, Y., and Zhou, T. (2023a). CoRe: Cooperative reconstruction for multi-agent perception. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 8676–8686. IEEE.
Wang, T., Chen, G., Chen, K., Liu, Z., Zhang, B., Knoll, A., and Jiang, C. (2023b). UMC: A unified bandwidth-efficient and multi-resolution based collaborative perception framework. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 8153–8162. IEEE.
Wang, T., Kim, S., Jiang, W., Xie, E., Ge, C., Chen, J., Li, Z., and Luo, P. (2024). DeepAccident: A motion and accident prediction benchmark for V2X autonomous driving. Proceedings of the AAAI Conference on Artificial Intelligence, 38(6):5599–5606.
Wang, T.-H., Manivasagam, S., Liang, M., Yang, B., Zeng, W., and Urtasun, R. (2020). V2VNet: Vehicle-to-vehicle communication for joint perception and prediction. In Vedaldi, A., Bischof, H., Brox, T., and Frahm, J.-M., editors, Computer Vision – ECCV 2020, volume 12347, pages 605–621. Springer International Publishing.
Xiang, L., Yin, J., Li, W., Xu, C.-Z., Yang, R., and Shen, J. (2023). DI-V2X: Learning domain-invariant representation for vehicle-infrastructure collaborative 3D object detection. In Proceedings of the AAAI Conference on Artificial Intelligence. AAAI.
Xu, Y., Chambon, L., Zablocki, É., Chen, M., Alahi, A., Cord, M., and Pérez, P. (2023). Towards motion forecasting with real-world perception inputs: Are end-to-end approaches competitive? In 2024 IEEE International Conference on Robotics and Automation (ICRA), pages 18428–18435. IEEE.
Yu, H., Yang, W., Ruan, H., Yang, Z., Tang, Y., Gao, X., Hao, X., Shi, Y., Pan, Y., Sun, N., Song, J., Yuan, J., Luo, P., and Nie, Z. (2023). V2X-Seq: A large-scale sequential dataset for vehicle-infrastructure cooperative perception and forecasting. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 5486–5495. IEEE.
Zhou, Y. and Tuzel, O. (2018). VoxelNet: End-to-end learning for point cloud based 3D object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 4490–4499. IEEE.
The Components of Collaborative Joint Perception and Prediction: A Conceptual Framework