Authors:
Floris De Feyter and Toon Goedemé
Affiliation:
EAVISE—PSI—ESAT, KU Leuven, Sint-Katelijne-Waver, Belgium
Keyword(s):
Product Detection and Recognition, Joint Detection and Recognition, Task-Specific Training.
Abstract:
Training a single model jointly for detection and recognition is typically done with a fully annotated dataset, i.e., one whose annotations consist of boxes with class labels. In the case of retail product detection and recognition, however, developing such a dataset is very expensive due to the large variety of products. It would be much more cost-efficient and scalable if we could employ two task-specific datasets: one detection-only and one recognition-only dataset. Unfortunately, experiments indicate a significant drop in performance when a model is trained on task-specific data. Due to the potential cost savings, we are convinced that more research should be done on this matter and, therefore, we propose a set of training procedures that allows us to carefully investigate the differences between training with fully annotated vs. task-specific data. We demonstrate this on a product detection and recognition dataset and, in doing so, reveal one of the core issues inherent to task-specific training. We hope that our results will motivate and inspire researchers to look further into the problem of employing task-specific datasets to train joint detection and recognition models.