has considered stationary objects in a relatively simple and well-lit environment. An obvious extension, likely required in industrial settings, is to allow relative motion between the camera and the objects, e.g. due to conveyor belts. This calls for further research into the detection and segmentation of moving objects before they are presented to the classifier. Possible solutions, depending on scene complexity, range from traditional Mean Frame Subtraction (MFS), which detects moving objects in simple setups where the background remains static for long periods (Tamersoy, 2009), sketched below, to more elaborate trained detectors such as RetinaNet (Lin et al., 2020) or YOLOv4 (Bochkovskiy et al., 2020). The latter are more tolerant to changes in scale, lighting, multiple objects, and motion, but they typically require more training data. This, however, could be addressed with a synthetic-data approach like the one followed in this paper.
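As an illustration of the simpler end of this range, the following is a minimal MFS-style background-subtraction sketch, assuming a static camera and the OpenCV library; the threshold and minimum-area values are illustrative placeholders, not parameters used in this paper.

```python
# Minimal sketch: mean-frame background subtraction to detect moving objects
# before cropping them for the classifier. Assumes a static camera and OpenCV;
# the threshold and minimum area are illustrative values, not tuned here.
import cv2
import numpy as np

def build_background(frames):
    """Average a list of BGR frames into a mean background model."""
    return np.mean([f.astype(np.float32) for f in frames], axis=0)

def detect_moving_objects(frame, background, thresh=30, min_area=500):
    """Return bounding boxes of regions that differ from the background."""
    diff = cv2.absdiff(frame.astype(np.float32), background)
    gray = cv2.cvtColor(diff.astype(np.uint8), cv2.COLOR_BGR2GRAY)
    _, mask = cv2.threshold(gray, thresh, 255, cv2.THRESH_BINARY)
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    return [cv2.boundingRect(c) for c in contours if cv2.contourArea(c) >= min_area]

# Each (x, y, w, h) box could then be cropped and passed to the trained classifier.
```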
In a warehouse, new products arrive all the time. In our case, the classifier must be retrained to recognize each new class. An alternative for warehouses with many different products would be to expand the classifier without retraining it (Schulz et al., 2020), as sketched below. Attaching labels to the products would be another way to identify objects. For example, (Nemati et al., 2016) employs spiral codes, similar in concept to barcodes but detectable at any orientation, unlike barcodes, which must be properly oriented. However, this would require manually attaching labels to the objects.
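To illustrate one common way of adding classes without retraining the network (a generic technique, not necessarily the method of Schulz et al., 2020), a nearest-class-mean classifier over frozen embeddings can be extended with only a few labeled examples per new product. The embed() function below is a hypothetical stand-in for any pre-trained backbone such as a ResNet feature extractor.

```python
# Illustrative sketch: extending a classifier to new product classes without
# retraining, via nearest-class-mean matching on frozen embeddings.
# Generic technique, not necessarily the method of Schulz et al. (2020);
# embed() is a hypothetical stand-in for a pre-trained backbone (e.g. ResNet).
import numpy as np

class NearestMeanClassifier:
    def __init__(self):
        self.prototypes = {}  # class name -> mean embedding

    def add_class(self, name, embeddings):
        """Register a new class from a few example embeddings (no retraining)."""
        self.prototypes[name] = np.mean(embeddings, axis=0)

    def predict(self, embedding):
        """Return the class whose prototype is closest to the query embedding."""
        return min(self.prototypes,
                   key=lambda n: np.linalg.norm(embedding - self.prototypes[n]))

# Usage (hypothetical): clf.add_class("new_product", [embed(img) for img in samples])
```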
ACKNOWLEDGEMENTS
This work has been carried out by August Baaz and
Yonan Yonan in the context of their Bachelor Thesis
at Halmstad University (Computer Science and En-
gineering), with the support of HMS Networks AB
in Halmstad. Authors Hernandez-Diaz and Alonso-
Fernandez thank the Swedish Research Council (VR)
and the Swedish Innovation Agency (VINNOVA) for
funding their research.
REFERENCES
Al-Faraj, S. et al. (2021). Cnn-based alphabet identification
and sorting robotic arm. In ICCCES.
Bochkovskiy, A. et al. (2020). Yolov4: Optimal speed and
accuracy of object detection. CoRR, abs/2004.10934.
Borkman, S. et al. (2021). Unity perception: Generate synthetic data for computer vision. CoRR, abs/2107.04259.
Femling, F., Olsson, A., Alonso-Fernandez, F. (2018). Fruit
and vegetable identification using machine learning
for retail applications. In SITIS.
Gyawali, D. et al. (2020). Comparative analysis of multiple
deep CNN models for waste classification. In ICAEIC.
Hachem, C. et al. (2021). Automation of quality control in
automotive with deep learning algorithms. In ICCCR.
He, K. et al. (2016). Deep residual learning for image
recognition. In CVPR.
HMS (2022). https://www.hms-networks.com.
Jung, H. et al. (2017). Resnet-based vehicle classification and localization in traffic surveillance systems. In CVPRW.
Karras, T. et al. (2021). A style-based generator architecture
for generative adversarial networks. IEEE TPAMI.
Lin, T. et al. (2020). Focal loss for dense object detection.
IEEE TPAMI.
Liu, Y. et al. (2018). Scene classification based on multi-
scale convolutional neural network. IEEE TPAMI.
Nemati, H. M., Fan, Y., Alonso-Fernandez, F. (2016). Hand
detection and gesture recognition using symmetric
patterns. In ACIIDS.
Nilsson, F., Jakobsen, J., Alonso-Fernandez, F. (2020). De-
tection and classification of industrial signal lights for
factory floors. In ISCV.
Persson, A., Dymne, N., Alonso-Fernandez, F. (2021).
Classification of ps and abs black plastics for weee
recycling applications. In ISCMI.
Qiu, W., Yuille, A. (2016). Unrealcv: Connecting computer
vision to unreal engine. In ECCVW.
Reddy, A. S. B., Juliet, D. S. (2019). Transfer learning with resnet-50 for malaria cell-image classification. In ICCSP.
Richter, S. R. et al. (2016). Playing for data: Ground truth
from computer games. In ECCV.
Schulz, J. et al. (2020). Extending deep learning to new
classes without retraining. In SPIE DSMEOOT XXV.
Svanström, F., Englund, C., Alonso-Fernandez, F. (2021). Real-time drone detection and tracking with visible, thermal and acoustic sensors. In ICPR.
Tamersoy, B. (2009). Background subtraction. The Univer-
sity of Texas at Austin.
Tremblay, J. et al. (2018). Training deep networks with syn-
thetic data: Bridging the reality gap by domain ran-
domization. In CVPRW.
Wang, Y. et al. (2020). A cnn-based visual sorting system
with cloud-edge computing for flexible manufacturing
systems. IEEE TII.
Ward, C. M. et al. (2018). Ship classification from overhead
imagery using synthetic data and domain adaptation.
In IEEE OCEANS.
Xu, X. et al. (2021). Industry 4.0 and 5.0—inception, conception and perception. Journal of Manufacturing Systems.
Yosinski, J. et al. (2015). Understanding neural networks
through deep visualization. In ICMLW.
Zhu, J.-Y. et al. (2017). Unpaired image-to-image translation using cycle-consistent adversarial networks. In ICCV.