Dendorfer, P., Ošep, A., Milan, A., Schindler, K., Cremers, D., Reid, I., Roth, S., and Leal-Taixé, L. (2020a). Motchallenge: A benchmark for single-camera multiple target tracking. arXiv preprint arXiv:2010.07548.
Dendorfer, P., Rezatofighi, H., Milan, A., Shi, J., Cremers, D., Reid, I., Roth, S., Schindler, K., and Leal-Taixé, L. (2020b). Mot20: A benchmark for multi object tracking in crowded scenes. arXiv:2003.09003 [cs].
Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., and Fei-Fei, L. (2009). Imagenet: A large-scale hierarchical image database. In 2009 IEEE conference on computer vision and pattern recognition, pages 248–255. IEEE.
Fang, K., Xiang, Y., Li, X., and Savarese, S. (2018). Recurrent autoregressive networks for online multi-object tracking. In 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), pages 466–475. IEEE.
Farhadi, J. R. A. and Redmon, J. (2018). Yolov3: An incremental improvement. Retrieved September, 17:2018.
Gao, J. and Nevatia, R. (2018). Revisiting temporal modeling for video-based person reid. arXiv preprint arXiv:1805.02104.
He, K., Zhang, X., Ren, S., and Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778.
Hermans, A., Beyer, L., and Leibe, B. (2017). In defense of the triplet loss for person re-identification. arXiv preprint arXiv:1703.07737.
Kuhn, H. W. (1955). The hungarian method for the assignment problem. Naval research logistics quarterly, 2(1-2):83–97.
Leal-Taixé, L., Milan, A., Reid, I., Roth, S., and Schindler, K. (2015). MOTChallenge 2015: Towards a benchmark for multi-target tracking. arXiv:1504.01942 [cs].
Liu, Y., Yan, J., and Ouyang, W. (2017). Quality aware network for set to set recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5790–5799.
Luo, H., Gu, Y., Liao, X., Lai, S., and Jiang, W. (2019). Bag of tricks and a strong baseline for deep person re-identification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops.
Mahmoudi, N., Ahadi, S. M., and Rahmati, M. (2019). Multi-target tracking using cnn-based features: Cnn-mtt. Multimedia Tools and Applications, 78(6):7077–7096.
McLaughlin, N., Del Rincon, J. M., and Miller, P. (2016). Recurrent convolutional network for video-based person re-identification. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1325–1334.
Milan, A., Leal-Taixé, L., Reid, I., Roth, S., and Schindler, K. (2016). MOT16: A benchmark for multi-object tracking. arXiv:1603.00831 [cs].
Ren, S., He, K., Girshick, R., and Sun, J. (2015). Faster r-cnn: Towards real-time object detection with region proposal networks. In Advances in neural information processing systems, pages 91–99.
Sun, Y., Zheng, L., Yang, Y., Tian, Q., and Wang, S. (2018). Beyond part models: Person retrieval with refined part pooling (and a strong convolutional baseline). In Proceedings of the European Conference on Computer Vision (ECCV), pages 480–496.
Wang, Z., Zheng, L., Liu, Y., Li, Y., and Wang, S. (2019). Towards real-time multi-object tracking. arXiv preprint arXiv:1909.12605.
Welch, G., Bishop, G., et al. (1995). An introduction to the kalman filter.
Wojke, N. and Bewley, A. (2018). Deep cosine metric learning for person re-identification. In 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), pages 748–756. IEEE.
Wojke, N., Bewley, A., and Paulus, D. (2017). Simple online and realtime tracking with a deep association metric. In 2017 IEEE International Conference on Image Processing (ICIP), pages 3645–3649. IEEE.
Woo, S., Park, J., Lee, J.-Y., and So Kweon, I. (2018). Cbam: Convolutional block attention module. In Proceedings of the European conference on computer vision (ECCV), pages 3–19.
Yu, F., Li, W., Li, Q., Liu, Y., Shi, X., and Yan, J. (2016). Poi: Multiple object tracking with high performance detection and appearance feature. In European Conference on Computer Vision (ECCV), pages 36–42. Springer.
Zhan, Y., Wang, C., Wang, X., Zeng, W., and Liu, W. (2020). A simple baseline for multi-object tracking. arXiv preprint arXiv:2004.01888.
Zhang, Z., Lan, C., Zeng, W., Jin, X., and Chen, Z. (2020). Relation-aware global attention for person re-identification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 3186–3195.
Zheng, L., Bie, Z., Sun, Y., Wang, J., Su, C., Wang, S., and Tian, Q. (2016). Mars: A video benchmark for large-scale person re-identification. In European Conference on Computer Vision, pages 868–884. Springer.
Zheng, L., Shen, L., Tian, L., Wang, S., Wang, J., and Tian, Q. (2015). Scalable person re-identification: A benchmark. In Proceedings of the IEEE international conference on computer vision (ICCV), pages 1116–1124.
Zheng, Z., Zheng, L., and Yang, Y. (2017a). A discriminatively learned cnn embedding for person reidentification. ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), 14(1):1–20.
Zheng, Z., Zheng, L., and Yang, Y. (2017b). Unlabeled samples generated by gan improve the person re-identification baseline in vitro. In Proceedings of the IEEE International Conference on Computer Vision, pages 3754–3762.
Zhou, K., Yang, Y., Cavallaro, A., and Xiang, T. (2019). Omni-scale feature learning for person re-identification. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), pages 3702–3712.
Zhou, Z., Xing, J., Zhang, M., and Hu, W. (2018). Online multi-target tracking with tensor-based high-order graph matching. In 2018 24th International Conference on Pattern Recognition (ICPR), pages 1809–1814. IEEE.
VISAPP 2021 - 16th International Conference on Computer Vision Theory and Applications