
for video instance segmentation. In IEEE/CVF Con-
ference on Computer Vision and Pattern Recognition,
pages 14623–14632.
Heo, M., Hwang, S., Oh, S. W., Lee, J.-Y., and Kim, S. J.
(2022). VITA: Video instance segmentation with tem-
poral attention. In Advances in Neural Information
Processing Systems, volume 35, pages 23109–23120.
Ilg, E., Mayer, N., Saikia, T., Keuper, M., Dosovitskiy, A.,
and Brox, T. (2017). FlowNet 2.0: Evolution of opti-
cal flow estimation with deep networks. In IEEE Con-
ference on Computer Vision and Pattern Recognition,
pages 2462–2470.
ITU-T (2021). Recommendation H.264 (08/21). Accessed
Jul 2024.
Kale, K., Pawar, S., and Dhulekar, P. (2015). Moving object
tracking using optical flow and motion vector estima-
tion. In 4th International Conference on Reliability,
Infocom Technologies and Optimization, page #108.
Kamble, P. R., Keskar, A. G., and Bhurchandi, K. M.
(2019). Ball tracking in sports: a survey. Artificial
Intelligence Review, 52:1655–1705.
Ke, L., Danelljan, M., Ding, H., Tai, Y.-W., Tang, C.-K., and
Yu, F. (2023). Mask-free video instance segmenta-
tion. In 2023 IEEE/CVF Conference on Computer Vi-
sion and Pattern Recognition (CVPR), pages 22857–
22866.
Khaustov, V. and Mozgovoy, M. (2020). Recognizing
events in spatiotemporal soccer data. Applied Sci-
ences, 10(22):8046.
Kirillov, A., He, K., Girshick, R., and Dollár, P. (2019).
Panoptic segmentation. In IEEE/CVF Conference
on Computer Vision and Pattern Recognition, pages
9404–9413.
Lin, T.-Y., Dollár, P., Girshick, R., He, K., Hariharan, B.,
and Belongie, S. (2017). Feature pyramid networks
for object detection. In Proceedings of the IEEE con-
ference on computer vision and pattern recognition,
pages 2117–2125.
Linke, D., Link, D., and Lames, M. (2020). Football-
specific validity of TRACAB’s optical video tracking sys-
tems. PLoS ONE, 15(3):#e0230179.
Liu, J., Huang, G., Hyyppä, J., Li, J., Gong, X., and Jiang,
X. (2023). A survey on location and motion track-
ing technologies, methodologies and applications in
precision sports. Expert Systems with Applications,
229:120492.
Majeed, F., Gilal, N. U., Al-Thelaya, K., Yang, Y., Agus,
M., and Schneider, J. (2024). MV-Soccer: Motion-
vector augmented instance segmentation for soccer
player tracking. In IEEE/CVF Conference on Com-
puter Vision and Pattern Recognition Workshops,
pages 3245–3255.
Manafifard, M., Ebadi, H., and Moghaddam, H. A. (2017).
A survey on player tracking in soccer videos. Com-
puter Vision and Image Understanding, 159:19–46.
Mazzeo, P. L., Spagnolo, P., Leo, M., and D’Orazio, T.
(2008). Visual players detection and tracking in soc-
cer matches. In IEEE 5th International Conference on
Advanced Video and Signal Based Surveillance, pages
326–333.
Murr, D., Raabe, J., and Höner, O. (2018). The prognostic
value of physiological and physical characteristics in
youth soccer: A systematic review. European journal
of sport science, 18(1):62–74.
Naik, B. T. and Hashmi, M. F. (2023). YOLOv3-SORT: detec-
tion and tracking player/ball in soccer sport. Journal
of Electronic Imaging, 32(1):011003.
Naik, B. T., Hashmi, M. F., Geem, Z. W., and Bodke, N. D.
(2022). DeepPlayer-Track: Player and referee track-
ing with jersey color recognition in soccer. IEEE Ac-
cess, 10:32494–32509.
Qin, Z., Zhou, S., Wang, L., Duan, J., Hua, G., and Tang,
W. (2023). MotionTrack: Learning robust short-term
and long-term motions for multi-object tracking. In
IEEE/CVF Conference on Computer Vision and Pat-
tern Recognition, pages 17939–17948.
Ranjan, A. and Black, M. J. (2017). Optical flow estimation
using a spatial pyramid network. In IEEE Conference
on Computer Vision and Pattern Recognition, pages
4161–4170.
Rudovic, O., Lee, J., Dai, M., Schuller, B., and Picard,
R. W. (2018). Personalized machine learning for robot
perception of affect and engagement in autism ther-
apy. Science Robotics, 3(19):#eaao6760.
Scharr, H. (2000). Optimale Operatoren in der Digitalen
Bildverarbeitung. PhD thesis, University of Heidel-
berg. (in German).
Shah, S. T. H., Xuezhi, X., and Ahmed, W. (2021). Optical
flow estimation with convolutional neural nets. Pat-
tern Recognition and Image Analysis, 31:656–670.
Stauffer, C. and Grimson, W. E. L. (1999). Adaptive back-
ground mixture models for real-time tracking. In
IEEE Conference on Computer Vision and Pattern
Recognition, pages 246–252.
Teed, Z. and Deng, J. (2020). RAFT: Recurrent all-pairs
field transforms for optical flow. In Computer Vision–
ECCV 2020: 16th European Conference, Glasgow,
UK, August 23–28, 2020, Proceedings, Part II 16,
pages 402–419. Springer.
Wehbe, G. M., Hartwig, T. B., and Duncan, C. S. (2014).
Movement analysis of Australian national league soc-
cer players using global positioning system technol-
ogy. The Journal of Strength & Conditioning Re-
search, 28(3):834–842.
Xu, R., Taubman, D., and Naman, A. T. (2016). Motion
estimation based on mutual information and adaptive
multi-scale thresholding. IEEE Transactions on Image
Processing, 25(3):1095–1108.
Yang, L., Fan, Y., and Xu, N. (2019). Video instance seg-
mentation. In 2019 IEEE/CVF International Confer-
ence on Computer Vision (ICCV), pages 5187–5196.
Yilmaz, A., Javed, O., and Shah, M. (2006). Object track-
ing: A survey. ACM Computing Surveys, 38(4):#13–es.
Zhang, Y., Sun, P., Jiang, Y., Yu, D., Weng, F., Yuan, Z.,
Luo, P., Liu, W., and Wang, X. (2022). ByteTrack:
Multi-object tracking by associating every detection
box. In European Conference on Computer Vision, pages 1–21.
Zhang, Z. (2012). Microsoft Kinect sensor and its effect.
IEEE Multimedia, 19(2):4–10.
Zivkovic, Z. (2004). Improved adaptive Gaussian mixture
model for background subtraction. In International
Conference on Pattern Recognition, volume 4, pages
28–31.
ReST: High-Precision Soccer Player Tracking via Motion Vector Segmentation