to outperform the recent state-of-the-art tracker OC-SORT by a small margin. The most balanced metric, HOTA, which gives appropriate weight to both sub-tasks of finding the individuals and consistently identifying them, still shows room for improvement under challenging conditions. The other metrics give evidence of the sub-task in which the contribution of the UBT idea lies: the number of identity switches is much lower, indicating that the correct and consistent identification of tracked individuals benefits from the re-connection to the closest inactive track introduced with UBT in this work. Further research should address the case when more than one individual is out of view; the re-identification could take past trajectories and appearances of missing tracks into account to reconnect them once they reappear. For the MOT sub-task of following and re-identifying individuals in videos that satisfy the requirement of a known maximum number of individuals, UBT is a good choice.
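To make the re-connection step concrete, the following minimal Python sketch illustrates the idea under stated assumptions; it is not the implementation used in this work. It assumes each inactive track stores its last known (x, y) position and that an unmatched detection should resume the nearest such track rather than spawn a new identity; all names (reconnect_to_closest_inactive, inactive_tracks) are illustrative.

    import numpy as np

    def reconnect_to_closest_inactive(detection_xy, inactive_tracks):
        # detection_xy: (x, y) centre of an unmatched detection (assumed input).
        # inactive_tracks: dict mapping track ID -> last known (x, y) position.
        # Returns the ID of the closest inactive track, or None if none exists.
        if not inactive_tracks:
            return None
        det = np.asarray(detection_xy, dtype=float)
        return min(
            inactive_tracks,
            key=lambda tid: np.linalg.norm(
                np.asarray(inactive_tracks[tid], dtype=float) - det
            ),
        )

    # Example: a detection reappearing near track 3's last position resumes ID 3.
    inactive = {3: (120.0, 80.0), 7: (400.0, 310.0)}
    assert reconnect_to_closest_inactive((130.0, 90.0), inactive) == 3

Because the number of individuals is bounded, such a rule never creates a new identity for a reappearing individual, which is consistent with the reduction in identity switches reported above.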
ACKNOWLEDGEMENTS
Funded by the Deutsche Forschungsgemeinschaft
(DFG, German Research Foundation) under Ger-
many’s Excellence Strategy – EXC 2002/1 “Science
of Intelligence” – project number 390523135.
We thank Clara Bekemeier and Sophia Meier for manual data annotation and Benjamin Lang for building the transparent cage lid.
REFERENCES
Bernardin, K. and Stiefelhagen, R. (2008). Evaluating multiple object tracking performance: The CLEAR MOT metrics. EURASIP Journal on Image and Video Processing, 2008:1–10.
Bewley, A., Ge, Z., Ott, L., Ramos, F., and Upcroft, B. (2016). Simple online and realtime tracking. In 2016 IEEE International Conference on Image Processing (ICIP), pages 3464–3468. IEEE.
Bochinski, E., Senst, T., and Sikora, T. (2018). Extending IOU based multi-object tracking by visual information. In 2018 15th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), pages 1–6. IEEE.
Cao, J., Weng, X., Khirodkar, R., Pang, J., and Kitani, K. (2022). Observation-centric SORT: Rethinking SORT for robust multi-object tracking. arXiv preprint arXiv:2203.14360.
Dendorfer, P., Rezatofighi, H., Milan, A., Shi, J., Cremers, D., Reid, I., Roth, S., Schindler, K., and Leal-Taixé, L. (2020). MOT20: A benchmark for multi object tracking in crowded scenes. arXiv preprint arXiv:2003.09003.
FELASA Working Group on Revision of Guidelines for Health Monitoring of Rodents and Rabbits, Mähler, M., Berard, M., Feinstein, R., Gallagher, A., Illgen-Wilcke, B., Pritchett-Corning, K., and Raspa, M. (2014). FELASA recommendations for the health monitoring of mouse, rat, hamster, guinea pig and rabbit colonies in breeding and experimental units. Laboratory Animals, 48(3):178–192.
Ge, Z., Liu, S., Wang, F., Li, Z., and Sun, J. (2021). YOLOX: Exceeding YOLO series in 2021. arXiv preprint arXiv:2107.08430.
He, K., Zhang, X., Ren, S., and Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778.
Insafutdinov, E., Pishchulin, L., Andres, B., Andriluka, M., and Schiele, B. (2016). DeeperCut: A deeper, stronger, and faster multi-person pose estimation model. In European Conference on Computer Vision, pages 34–50. Springer.
Kalman, R. E. (1960). A new approach to linear filtering and prediction problems. Journal of Basic Engineering, 82(1):35–45.
Lauer, J., Zhou, M., Ye, S., Menegas, W., Nath, T., Rahman, M. M., Di Santo, V., Soberanes, D., Feng, G., Murthy, V. N., et al. (2021). Multi-animal pose estimation and tracking with DeepLabCut. bioRxiv.
Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.-Y., and Berg, A. C. (2016). SSD: Single shot multibox detector. In European Conference on Computer Vision, pages 21–37. Springer.
Luiten, J., Osep, A., Dendorfer, P., Torr, P., Geiger, A., Leal-Taixé, L., and Leibe, B. (2021). HOTA: A higher order metric for evaluating multi-object tracking. International Journal of Computer Vision, 129(2):548–578.
Mathis, A., Mamidanna, P., Cury, K. M., Abe, T., Murthy, V. N., Mathis, M. W., and Bethge, M. (2018). DeepLabCut: Markerless pose estimation of user-defined body parts with deep learning. Nature Neuroscience, 21(9):1281–1289.
Newell, A., Yang, K., and Deng, J. (2016). Stacked hourglass networks for human pose estimation. In European Conference on Computer Vision, pages 483–499. Springer.
Pereira, T. D., Tabris, N., Matsliah, A., Turner, D. M., Li, J., Ravindranath, S., Papadoyannis, E. S., Normand, E., Deutsch, D. S., Wang, Z. Y., McKenzie-Smith, G. C., Mitelut, C. C., Castro, M. D., D’Uva, J., Kislin, M., Sanes, D. H., Kocher, S. D., Wang, S. S.-H., Falkner, A. L., Shaevitz, J. W., and Murthy, M. (2022). SLEAP: A deep learning system for multi-animal pose tracking. Nature Methods.
Redmon, J., Divvala, S., Girshick, R., and Farhadi, A. (2016). You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 779–788.