
VISAPP 2025 - 20th International Conference on Computer Vision Theory and Applications