
intelligent robots and systems (IROS), pages 922–928. IEEE.
Nguyen, A. D., Pham, H. H., Trung, H. T., Nguyen, Q. V. H., Truong, T. N., and Nguyen, P. L. (2023). High accurate and explainable multi-pill detection framework with graph neural network-assisted multimodal data fusion. PLOS ONE, 18(9):e0291865.
Ouardirhi, Z., Mahmoudi, S. A., Zbakh, M., El Ghmary, M., Benjelloun, M., Abdelali, H. A., and Derrouz, H. (2022). An efficient real-time Moroccan automatic license plate recognition system based on the YOLO object detector. In International Conference on Big Data and Internet of Things, pages 290–302. Springer.
Ouyang, W., Wang, X., Zeng, X., Qiu, S., Luo, P., Tian, Y., Li, H., Yang, S., Wang, Z., Loy, C.-C., et al. (2015). DeepID-Net: Deformable deep convolutional neural networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2403–2412.
Pandya, S., Srivastava, G., Jhaveri, R., Babu, M. R., Bhattacharya, S., Maddikunta, P. K. R., Mastorakis, S., Piran, M. J., and Gadekallu, T. R. (2023). Federated learning for smart cities: A comprehensive survey. Sustainable Energy Technologies and Assessments, 55:102987.
Qi, C. R., Yi, L., Su, H., and Guibas, L. J. (2017). PointNet++: Deep hierarchical feature learning on point sets in a metric space. Advances in Neural Information Processing Systems, 30.
Radosavovic, I., Kosaraju, R. P., Girshick, R., He, K., and Dollár, P. (2020). Designing network design spaces. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10428–10436.
Ramachandram, D. and Taylor, G. W. (2017). Deep multimodal learning: A survey on recent advances and trends. IEEE Signal Processing Magazine, 34(6):96–108.
Rodrigues, L. S., Sakiyama, K., Matsubara, E. T., Marcato Junior, J., and Gonçalves, W. N. Multimodal fusion based on arithmetic operations and attention mechanisms. Available at SSRN 4292754.
Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., and Chen, L.-C. (2018). MobileNetV2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4510–4520.
Sharma, P., Gupta, S., Vyas, S., and Shabaz, M. (2023). Retracted: Object detection and recognition using deep learning-based techniques. IET Communications, 17(13):1589–1599.
Shi, S., Guo, C., Jiang, L., Wang, Z., Shi, J., Wang, X., and Li, H. (2020). PV-RCNN: Point-voxel feature set abstraction for 3D object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10529–10538.
Simonyan, K. and Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556.
Sozzi, M., Cantalamessa, S., Cogato, A., Kayad, A., and Marinello, F. (2022). Automatic bunch detection in white grape varieties using YOLOv3, YOLOv4, and YOLOv5 deep learning algorithms. Agronomy, 12(2):319.
Steyaert, S., Pizurica, M., Nagaraj, D., Khandelwal, P., Hernandez-Boussard, T., Gentles, A. J., and Gevaert, O. (2023). Multimodal data fusion for cancer biomarker discovery with deep learning. Nature Machine Intelligence, 5(4):351–362.
Wang, C., Ma, C., Zhu, M., and Yang, X. (2021). PointAugmenting: Cross-modal augmentation for 3D object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 11794–11803.
Wang, C.-Y., Bochkovskiy, A., and Liao, H.-Y. M. (2023). YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 7464–7475.
Xiang, Y., Mottaghi, R., and Savarese, S. (2014). Beyond PASCAL: A benchmark for 3D object detection in the wild. In IEEE Winter Conference on Applications of Computer Vision, pages 75–82. IEEE.
Yan, Y., Mao, Y., and Li, B. (2018). SECOND: Sparsely embedded convolutional detection. Sensors, 18(10):3337.
Yang, C., Ablavsky, V., Wang, K., Feng, Q., and Betke, M. (2020). Learning to separate: Detecting heavily-occluded objects in urban scenes. In European Conference on Computer Vision, pages 530–546. Springer.
Ye, D., Zhou, Z., Chen, W., Xie, Y., Wang, Y., Wang, P., and Foroosh, H. (2023a). LidarMultiNet: Towards a unified multi-task network for LiDAR perception. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 37, pages 3231–3240.
Ye, H., Zhao, J., Pan, Y., Chen, W., He, L., and Zhang, H. (2023b). Robot person following under partial occlusion. In 2023 IEEE International Conference on Robotics and Automation (ICRA), pages 7591–7597. IEEE.
Yun, S., Han, D., Oh, S. J., Chun, S., Choe, J., and Yoo, Y. (2019). CutMix: Regularization strategy to train strong classifiers with localizable features. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 6023–6032.
Zadeh, A., Chen, M., Poria, S., Cambria, E., and Morency, L.-P. (2017). Tensor fusion network for multimodal sentiment analysis. arXiv preprint arXiv:1707.07250.
Zhang, J., Wang, J., Xu, D., and Li, Y. (2021). HCNet: A point cloud object detection network based on height and channel attention. Remote Sensing, 13(24):5071.
Zhang, S., Benenson, R., and Schiele, B. (2017). CityPersons: A diverse dataset for pedestrian detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3213–3221.
Zhao, H., Shi, J., Qi, X., Wang, X., and Jia, J. (2017). Pyramid scene parsing network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2881–2890.
Zhu, X., Ma, Y., Wang, T., Xu, Y., Shi, J., and Lin, D. (2020). SSN: Shape signature networks for multi-class object detection from point clouds. In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XXV 16, pages 581–597. Springer.
FuDensityNet: Fusion-Based Density-Enhanced Network for Occlusion Handling