Figure 3: Collector Structural Design.
Figure 3 shows the proposed robot. It is not different from
others already deployed in retail. It is based on a mobile base
(2 or 4 wheels with suspension system, battery, and voltage
regulation) carrying a tower-like structure on which the
collectors are placed vertically and equidistantly. Each
collector consists of a processing unit with low computational
resources (Raspberry Pi 4, Jetson Nano, Intel NUC, among
others), an RGB-UHD camera, and a 3D camera.
This robot needs to move autonomously, so it is recommended to
use state-of-the-art robotic software for navigation, such as
ROS (Stanford Artificial Intelligence Laboratory et al., ) or
ROS2 (Macenski et al., 2022). This software is able to estimate
the position of the robot with respect to the store map at all
times.
An important part of the robot is the way it captures
information. It goes through the aisles of the store taking
images with the cameras of the collectors. This route is
performed in steps. A set of images (RGB-UHD and depth images
per collector) is taken at every step. The length of every step
is limited by the fields of view (FOVs) of the cameras. This is
done because the main purpose is to acquire information, so
redundancy is introduced during the acquisition. For horizontal
redundancy in the images, the distance of each step is shorter
than the width covered by the FOV of the cameras, and for
vertical redundancy, the collectors are equidistantly positioned
in the tower of the robot. Having redundant information helps
ensure that no information is lost.
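The relation between step length and horizontal FOV can be
sketched as follows. This is only an illustration and not the
implementation used on the robot: the pinhole footprint formula,
the camera-to-shelf distance, the FOV angle, and the overlap
ratio are all assumptions chosen for the example.

```python
import math


def footprint_width(distance_m: float, hfov_deg: float) -> float:
    """Shelf width seen by a camera facing the shelf head-on (pinhole model)."""
    return 2.0 * distance_m * math.tan(math.radians(hfov_deg) / 2.0)


def step_length(distance_m: float, hfov_deg: float, overlap: float) -> float:
    """Travel distance between captures so that consecutive images
    share the fraction `overlap` of their width."""
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0, 1)")
    return footprint_width(distance_m, hfov_deg) * (1.0 - overlap)


# Hypothetical values: camera 1 m from the shelf, 90 degree horizontal FOV,
# 30 % of each image repeated in the next one.
width = footprint_width(1.0, 90.0)
step = step_length(1.0, 90.0, 0.30)
```

With these assumed values the camera covers about 2.0 m of shelf
and the robot advances about 1.4 m per step, so the step is
shorter than the covered width and consecutive images overlap.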
4.3 Proposed Pipeline
All processes, including the inputs and outputs, are explained
below. Figure 4 shows the pipeline of the proposed solution for
the shelf auditing problem. Blue and green blocks are inputs
(data acquired as mentioned before) and outputs (reports),
respectively. Gray blocks are processes that may involve the use
of artificial intelligence algorithms such as object detection,
object recognition, clustering, and text recognition, among
others. Yellow blocks are also processes, but they are oriented
to validations or estimations; that is, they use the manipulated
and filtered data to create the report outputs.
It must be mentioned that the proposed pipeline does not try to
add new hardware or processes to the stores, as would be the
case when implementing digital/electronic price tags or
attaching RFIDs to the price tags or products, since this would
create new expenses for the retailer.
Figure 5 shows how the RGB and depth images are processed during
the first blocks of the pipeline. The yellow section refers to
product detection and recognition; the blue section refers to
price tag detection and to item detection and recognition; and
the green section refers to gap detection.
Product Detection. This process requires RGB-UHD images as input
and produces RBOXs that represent the products as output. Each
RBOX is defined by a 7-value list containing the information of
the detection (x axis, y axis, height, width, rotation angle,
confidence, and class). A pre-trained object detection algorithm
could be used to carry out this process. To continue with other
processes, crops of the products should be made; these will be
referred to as uhd-product images.
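As a minimal sketch of how the 7-value RBOX list could be held
in code and turned into crop limits, consider the following. The
text gives only the field order; the names `RBox`, `parse_rbox`,
and `crop_bounds` are hypothetical, and the sketch assumes that
(x, y) is the box centre in pixels and that the rotation angle
is in degrees. The crop is the axis-aligned rectangle enclosing
the rotated box, which is one possible choice.

```python
import math
from typing import NamedTuple, Sequence, Tuple


class RBox(NamedTuple):
    """Rotated box in the field order given in the text."""
    x: float           # assumed: centre, pixels
    y: float           # assumed: centre, pixels
    height: float
    width: float
    angle_deg: float   # assumed: degrees
    confidence: float
    cls: int


def parse_rbox(values: Sequence[float]) -> RBox:
    """Build an RBox from a raw 7-value detection list."""
    if len(values) != 7:
        raise ValueError("an RBOX needs exactly 7 values")
    x, y, h, w, angle, conf, cls = values
    return RBox(float(x), float(y), float(h), float(w),
                float(angle), float(conf), int(cls))


def crop_bounds(box: RBox, img_w: int, img_h: int) -> Tuple[int, int, int, int]:
    """(left, top, right, bottom) of the axis-aligned rectangle that
    encloses the rotated box, clipped to the image size."""
    t = math.radians(box.angle_deg)
    half_w = (abs(box.width * math.cos(t)) + abs(box.height * math.sin(t))) / 2.0
    half_h = (abs(box.width * math.sin(t)) + abs(box.height * math.cos(t))) / 2.0
    left = max(0, math.floor(box.x - half_w))
    top = max(0, math.floor(box.y - half_h))
    right = min(img_w, math.ceil(box.x + half_w))
    bottom = min(img_h, math.ceil(box.y + half_h))
    return left, top, right, bottom


# Hypothetical detection on a 3840 x 2160 image.
box = parse_rbox([100, 50, 20, 40, 0, 0.9, 3])
bounds = crop_bounds(box, 3840, 2160)
```

For this unrotated example the bounds are (80, 40, 120, 60), and
slicing the image array with them would give the crop that the
text calls a uhd-product image.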
Product Recognition. This process requires uhd-product images as
input and produces a text referring to the product class. The
output can be represented by a description or a code, but it is
recommended to use codes instead of descriptions: the dataset
will be lighter in size, as a code is normally a shorter string,
and product descriptions are more likely to change than codes. A
pre-trained multi-class classification algorithm would be needed
to carry out this process.
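The preference for codes over descriptions can be illustrated
with a small sketch. The product codes, descriptions, and
function names below are hypothetical, and the classifier is
reduced to a list of per-class scores; the point is only that
the recognition output stores a code, while descriptions live in
a separate table that can change without touching stored
results.

```python
from typing import Dict, List, Sequence

# Hypothetical label set: position in the classifier output -> product code.
CLASS_CODES: List[str] = ["SKU-0001", "SKU-0002", "SKU-0003"]

# Hypothetical catalogue kept apart from the recognition output.
CATALOGUE: Dict[str, str] = {
    "SKU-0001": "Orange juice 1 L",
    "SKU-0002": "Whole milk 1 L",
    "SKU-0003": "Sparkling water 600 ml",
}


def recognize(scores: Sequence[float]) -> str:
    """Return the product code of the highest-scoring class."""
    if len(scores) != len(CLASS_CODES):
        raise ValueError("one score per class is expected")
    best = max(range(len(scores)), key=lambda i: scores[i])
    return CLASS_CODES[best]


def describe(code: str) -> str:
    """Look up the current description only when a report needs it."""
    return CATALOGUE.get(code, "unknown product")


code = recognize([0.1, 0.7, 0.2])
```

Here `code` is "SKU-0002"; if the description of that product is
later edited in the catalogue, the stored code remains valid.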
Price Tag Detection. This process requires RGB-UHD images as
input and produces RBOXs that represent the price tags as
output. A pre-trained object detection algorithm for the task of
detecting price tags could be used to carry out this process. A
custom dataset must be created for this process since, to the
best of our knowledge, there is no public dataset created for
detecting price tags. To continue with other processes, crops of
the price tags should be made; these will be referred to as
uhd-price-tag images.
Note that there is the option of creating a single detector for
price tags and products, but this pipeline shows them separately
just to make each process of the pipeline clear.
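If the single-detector option were chosen, its output could be
split back into the two streams used by the rest of the
pipeline. The sketch below assumes the 7-value RBOX layout
described for product detection (confidence at index 5, class at
index 6); the class ids, the confidence threshold, and the
function name are hypothetical.

```python
from typing import Dict, List, Sequence

# Hypothetical class ids of a combined detector.
CLASS_NAMES: Dict[int, str] = {0: "product", 1: "price_tag"}


def split_by_class(
    detections: Sequence[Sequence[float]],
    min_conf: float = 0.5,  # assumed threshold, not taken from the text
) -> Dict[str, List[Sequence[float]]]:
    """Group 7-value RBOX lists by class name, dropping weak detections."""
    groups: Dict[str, List[Sequence[float]]] = {
        name: [] for name in CLASS_NAMES.values()
    }
    for det in detections:
        confidence, cls = det[5], int(det[6])
        if confidence < min_conf or cls not in CLASS_NAMES:
            continue
        groups[CLASS_NAMES[cls]].append(det)
    return groups


# Hypothetical detector output: x, y, height, width, angle, confidence, class.
detections = [
    [10, 10, 5, 5, 0, 0.9, 0],
    [20, 20, 5, 5, 0, 0.8, 1],
    [30, 30, 5, 5, 0, 0.2, 1],  # below the threshold, discarded
]
groups = split_by_class(detections)
```

Each group can then be cropped and passed to the corresponding
downstream process, exactly as with two separate detectors.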
Items Detection. This process requires uhd-price-tag images as
input and produces RBOXs that represent the items of the price
tags. In this article, items are the important texts in the
price tags, such as, but not limited to, codes, prices, and
descriptions. A pre-trained object detection algorithm for the