Authors:
Christian Hofmann; Christopher May; Patrick Ziegler; Iliya Ghotbiravandi; Jörg Franke and Sebastian Reitelshöfer
Affiliation:
Institute for Factory Automation and Production Systems, Friedrich-Alexander-Universität Erlangen-Nürnberg, Egerlandstraße 7, 91058 Erlangen, Germany
Keyword(s):
Open Vocabulary Object Detection, Pseudo-Labeling, Large Language Model, Vision Language Model.
Abstract:
Large Language Models (LLMs) and Vision Language Models (VLMs) enable robots to perform complex tasks. However, many of today’s mobile robots cannot carry the computing hardware required to run these models on board, and accessing external computers that run them over a communication link is often impractical. Lightweight object detection models are therefore often used to give mobile robots a semantic perception of their environment. In addition, mobile robots are deployed in different environments, which themselves change regularly, so automated adaptation of object detectors would simplify deployment. In this paper, we present a method for the automated, environment-specific individualization and adaptation of lightweight object detectors using LLMs and VLMs, including the automated identification of relevant object classes. We comprehensively evaluate our method and show that it can be applied successfully in principle, while also pointing out shortcomings regarding semantic ambiguities and the use of VLMs for pseudo-labeling datasets with bounding box annotations.