
IEEE/CVF conference on computer vision and pattern
recognition, pages 8126–8135.
Danelljan, M., Bhat, G., Shahbaz Khan, F., and Felsberg,
M. (2017). The need for speed: A benchmark for
visual object tracking. In Proceedings of the IEEE
International Conference on Computer Vision, pages
1125–1134.
Du, D., Qi, Y., Yu, H., Yang, Y.-F., Duan, K., Li, G., Zhang, W., Huang, Q., and Tian, Q. (2018). The unmanned aerial vehicle benchmark: Object detection and tracking. In European Conference on Computer Vision (ECCV).
Fabbri, M., Brasó, G., Maugeri, G., Cetintas, O., Gasparini, R., Ošep, A., Calderara, S., Leal-Taixé, L., and Cucchiara, R. (2021). MOTSynth: How can synthetic data help pedestrian detection and tracking? In 2021 IEEE/CVF International Conference on Computer Vision (ICCV), pages 10829–10839.
Fan, H., Lin, L., Yang, F., Chu, P., Deng, G., Yu, S., Bai, H., Xu, Y., Liao, C., and Ling, H. (2019). LaSOT: A high-quality benchmark for large-scale single object tracking. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
Fan, H. and Ling, H. (2019). LaSOT: A high-quality benchmark for large-scale single object tracking. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5374–5383.
Hu, J.-S., Juan, C.-W., and Wang, J.-J. (2008). A spatial-
color mean-shift object tracking algorithm with scale
and orientation estimation. Pattern Recognition Let-
ters, 29(16):2165–2173.
Hu, Y., Fang, S., Xie, W., and Chen, S. (2023). Aerial monocular 3D object detection. IEEE Robotics and Automation Letters, 8(4):1959–1966.
Huang, L., Zhao, X., and Huang, K. (2019). GOT-10k: A large high-diversity benchmark for generic object tracking in the wild. arXiv preprint arXiv:1810.11981.
Huang, L., Zhao, X., and Huang, K. (2021). GOT-10k:
a large high-diversity benchmark for generic object
tracking in the wild. IEEE Transactions on Pattern
Analysis and Machine Intelligence, 43(5):1562–1577.
Kristan, M., Leonardis, A., Matas, J., Felsberg, M., Pflugfelder, R., Cehovin Zajc, L., Vojir, T., Hager, G., Lukezic, A., Eldesokey, A., et al. (2018). The sixth visual object tracking VOT2018 challenge results. In European Conference on Computer Vision Workshops.
Li, B., Wu, W., Wang, Q., Zhang, F., Xing, J., and Yan, J. (2019). SiamRPN++: Evolution of Siamese visual tracking with very deep networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 4282–4291.
Li, B., Yan, J., Wu, W., Zhu, Z., and Hu, X. (2018). High performance visual tracking with Siamese region proposal network. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 8971–8980.
Li, S. and Yeung, D.-Y. (2017). Visual object tracking for unmanned aerial vehicles: A benchmark and new motion models. Proceedings of the AAAI Conference on Artificial Intelligence, 31(1).
Martinez-Gonzalez, P., Oprea, S., Garcia-Garcia, A., Jover-Alvarez, A., Orts-Escolano, S., and Garcia-Rodriguez, J. (2020). UnrealROX: An extremely photorealistic virtual reality environment for robotics simulations and synthetic data generation. Virtual Reality, 24:271–288.
Nam, H. and Han, B. (2016). Learning multi-domain convolutional neural networks for visual tracking. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4293–4302.
Patel, H. A. and Thakore, D. G. (2013). Moving object tracking using Kalman filter. International Journal of Computer Science and Mobile Computing, 2(4):326–332.
Qiu, W., Zhong, F., Zhang, Y., Qiao, S., Xiao, Z., Kim, T. S., and Wang, Y. (2017). UnrealCV: Virtual worlds for computer vision. In Proceedings of the 25th ACM International Conference on Multimedia, MM '17, pages 1221–1224, New York, NY, USA. Association for Computing Machinery.
Shah, S., Dey, D., Lovett, C., and Kapoor, A. (2018). AirSim: High-fidelity visual and physical simulation for autonomous vehicles. In Hutter, M. and Siegwart, R., editors, Field and Service Robotics, pages 621–635, Cham. Springer International Publishing.
Wang, Q., Gao, J., Lin, W., and Yuan, Y. (2019a). Learning
from synthetic data for crowd counting in the wild. In
2019 IEEE/CVF Conference on Computer Vision and
Pattern Recognition (CVPR), pages 8190–8199.
Wang, Q., Gao, J., Lin, W., and Yuan, Y. (2021). Pixel-wise
crowd understanding via synthetic data. International
Journal of Computer Vision, 129(1):225–245.
Wang, Q., Zhang, L., Bertinetto, L., Hu, W., and Torr, P. H. (2019b). Fast online object tracking and segmentation: A unifying approach. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 1328–1338.
Weng, S.-K., Kuo, C.-M., and Tu, S.-K. (2006). Video object tracking using adaptive Kalman filter. Journal of Visual Communication and Image Representation, 17(6):1190–1208.
Wu, Y., Lim, J., and Yang, M.-H. (2015). Object tracking
benchmark. IEEE Transactions on Pattern Analysis
and Machine Intelligence, 37(9):1834–1848.
Yan, B., Peng, H., Fu, J., Wang, D., and Lu, H. (2021). Learning spatio-temporal Transformer for visual tracking. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 10448–10457.
Yilmaz, A., Javed, O., and Shah, M. (2006). Object tracking: A survey. ACM Computing Surveys (CSUR), 38(4):13.
Zhou, H., Yuan, Y., and Shi, C. (2009). Object tracking using SIFT features and mean shift. Computer Vision and Image Understanding, 113(3):345–352.
VISAPP 2025 - 20th International Conference on Computer Vision Theory and Applications