ripe, not yet ripe, and overripe, and (3) the localization of the ripe and the foul strawberries (both have to be harvested and will be put into different boxes). Optionally, the robot could also detect leaves in case they have to be removed or pushed aside. Finally, one additional demand might be that the system has to run either at night (compare (Hayashi et al., 2010)) with controlled but fully artificial lighting, or during the day, with brighter ambient lighting that varies with dawn, dusk, and weather conditions.
For several classification tasks in agriculture and other applications, a combination of visual images (sometimes with a separate “red-edge” channel) and near-infrared (NIR) images is used, sometimes extended by UV and/or short-wave infrared (SWIR) images (Tiedemann et al., 2021). In a first phase, sample data of ripe, not ripe, damaged, and overripe strawberries is collected in 17 spectral bands from 250 nm to 1,550 nm. Based on this first data collection, a subset of the 17 bands is selected for the final prototype and evaluated in a second phase. Up to 17 2D images are used together for the detection and classification task. If a strawberry is detected and needs to be harvested, its position relative to the robot is determined using an additional sensor: a time-of-flight (TOF) camera, a stereo camera, or a lidar. This sensor is selected based on tests in the first project phase.
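To illustrate how a detection in the multi-spectral image could be turned into a position, the following sketch back-projects the pixels of one detected strawberry with a standard pinhole camera model. It assumes (an assumption made here, not part of the concept above) that the depth image of the TOF or stereo sensor is already registered pixel-to-pixel with the MSI image and that the camera intrinsics `fx`, `fy`, `cx`, `cy` are known; the function names are hypothetical. The result is expressed in camera coordinates; a fixed camera-to-robot transformation (not shown) would still be needed to obtain the position relative to the robot.

```python
import numpy as np


def backproject(u, v, depth_m, fx, fy, cx, cy):
    """Pinhole back-projection of pixel (u, v) with metric depth into camera coordinates."""
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return np.array([x, y, depth_m])


def strawberry_position(mask, depth, fx, fy, cx, cy):
    """Estimate one 3D point (camera frame, metres) for a detected strawberry.

    mask  -- boolean H x W array marking the strawberry's pixels (hypothetical input
             from the MSI classification)
    depth -- H x W depth image in metres, assumed registered pixel-to-pixel with mask
    Returns None if no masked pixel has a valid (positive) depth reading.
    """
    vs, us = np.nonzero(mask & (depth > 0))  # rows (v) and columns (u) of usable pixels
    if vs.size == 0:
        return None
    # The median depth is robust against a few background pixels at the fruit's border.
    z = float(np.median(depth[vs, us]))
    # Back-project the pixel centroid of the region at that depth.
    return backproject(us.mean(), vs.mean(), z, fx, fy, cx, cy)
```

Taking the centroid at the median depth is one simple choice; a gripper planner might instead want the full point cloud of the fruit.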
The multi-spectral imaging (MSI) data sets are collected with three different cameras (UV camera, visual camera, SWIR camera) combined with different spectral filters. Figure 4 shows an example overview of one MSI data set.
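Because the bands come from different cameras and filters, they have to be combined into one data structure before they can be analyzed jointly. A minimal sketch, assuming the single-band images have already been co-registered and resampled to a common size (a step not covered here), stacks them into an H x W x B cube so that each pixel becomes a B-dimensional spectral vector (B = 17 for a full data set). The helper name `build_msi_cube` and the use of wavelengths as band labels are illustrative.

```python
import numpy as np


def build_msi_cube(band_images):
    """Stack single-band images into an H x W x B cube of float32 values.

    band_images -- dict mapping a band label (e.g. wavelength in nm) to a 2D array;
                   the dict's insertion order defines the band order in the cube.
    All images are assumed to be co-registered and of identical size.
    """
    labels = list(band_images)
    shapes = {band_images[k].shape for k in labels}
    if len(shapes) != 1:
        raise ValueError(f"band images must share one shape, got {shapes}")
    # Convert to float so that bands from cameras with different bit depths can be mixed.
    cube = np.stack([np.asarray(band_images[k], dtype=np.float32) for k in labels], axis=-1)
    return cube, labels
```

`cube.reshape(-1, cube.shape[-1])` then yields one row per pixel, which is the form a per-pixel classifier works on.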
To analyze and visualize the relation between single spectral components and the ripeness classification, false color images can be used. Figure 5 shows an example in which the spectral images at 845 nm and 1,450 nm and an image taken by the SWIR camera without a filter are used as the red, green, and blue components, respectively. On the strawberry at the bottom right, a defective/foul area can be recognized quite easily.
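A false color image of this kind can be produced by stretching each of the three chosen band images to the range [0, 1] and using them as the red, green, and blue channels. The percentile-based contrast stretch below is one common choice assumed for this sketch, not necessarily the one used for Figure 5; the function names are hypothetical.

```python
import numpy as np


def normalize(band, low_pct=1.0, high_pct=99.0):
    """Linearly stretch one band to [0, 1] between two percentiles, clipping outliers."""
    lo, hi = np.percentile(band, [low_pct, high_pct])
    if hi <= lo:  # constant band: nothing to stretch
        return np.zeros_like(band, dtype=np.float64)
    return np.clip((band - lo) / (hi - lo), 0.0, 1.0)


def false_color(red_band, green_band, blue_band):
    """Combine three single-band images into an H x W x 3 RGB image with values in [0, 1]."""
    return np.dstack([normalize(red_band), normalize(green_band), normalize(blue_band)])
```

For the composite described above, the call would be `false_color(band_845, band_1450, swir_nofilter)`, where the three arguments are hypothetical names for the corresponding single-band images. Stretching each channel separately makes differences within a band visible, at the cost of losing the absolute intensity relations between bands.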
The advantages of MSI can only be exploited fully when all available spectral images (dimensions) are used. However, such data is hard for humans to study and understand; the false color image, for example, uses only 3 of the 17 dimensions. Instead, machine learning (ML) based methods will be applied and studied to distinguish ripe, non-ripe, and overripe strawberries from each other and from other parts of the plants. These methods use all 17 dimensions and are expected to discriminate correctly between the classes mentioned above. First tests will be carried out with support vector machines (SVM), which dominated classification tasks until deep methods such as convolutional neural networks (CNN) showed better performance in several applications. However, those were high-dimensional classification tasks, such as image classification, with hundreds or thousands of dimensions and with local relations between input dimensions. Such properties are not expected to be needed in this project, since no object classification, i.e., no classification of a whole strawberry, is intended. Rather, a classification on a single-pixel basis is planned as a first step, using only the 17 gray values of the different spectral bands. In a second step, the pixel classifications can be grouped by size and class, leading to an ordered list of strawberries of different classes and sizes.
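The two-step scheme can be sketched as follows, assuming the band images are available as an H x W x 17 array. Step one reshapes the array so that any per-pixel classifier (for example the `predict` function of a trained SVM) sees one 17-dimensional vector per pixel; step two groups 4-connected pixels of the same class into regions and orders them by class and size. Treating each connected region as one strawberry, the choice of 4-connectivity, and the `min_pixels` noise threshold are simplifications assumed here; touching fruits of the same class would be merged into one region. All names are hypothetical.

```python
from collections import deque

import numpy as np


def classify_pixels(cube, predict):
    """Step 1: apply a per-pixel classifier to an H x W x B cube; returns an H x W label map.

    predict -- any function mapping an (N, B) array of spectral vectors to N class labels.
    """
    h, w, b = cube.shape
    return np.asarray(predict(cube.reshape(-1, b))).reshape(h, w)


def group_regions(label_map, classes, min_pixels=1):
    """Step 2: group 4-connected pixels of the same class into regions.

    Returns a list of (class, size_in_pixels, (row_centroid, col_centroid)),
    ordered by class (in the order given) and then by decreasing size.
    """
    h, w = label_map.shape
    seen = np.zeros((h, w), dtype=bool)
    regions = []
    for r0 in range(h):
        for c0 in range(w):
            cls = label_map[r0, c0]
            if seen[r0, c0] or cls not in classes:
                continue
            # Breadth-first flood fill of one connected region of class `cls`.
            queue, pixels = deque([(r0, c0)]), []
            seen[r0, c0] = True
            while queue:
                r, c = queue.popleft()
                pixels.append((r, c))
                for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    if 0 <= rr < h and 0 <= cc < w and not seen[rr, cc] and label_map[rr, cc] == cls:
                        seen[rr, cc] = True
                        queue.append((rr, cc))
            if len(pixels) >= min_pixels:  # drop tiny regions, likely misclassified pixels
                rows, cols = zip(*pixels)
                regions.append((cls, len(pixels), (float(np.mean(rows)), float(np.mean(cols)))))
    order = {c: i for i, c in enumerate(classes)}
    return sorted(regions, key=lambda reg: (order[reg[0]], -reg[1]))
```

For example, `group_regions(classify_pixels(cube, svm_predict), classes=["ripe", "foul"], min_pixels=50)` would list ripe regions first, largest first, then foul ones; `svm_predict` and the threshold of 50 pixels are placeholders.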
To get a first impression of the task the classifier has to solve, a visualization with at most four dimensions is again helpful. Besides more elaborate and complex methods such as principal component analysis (PCA) or t-SNE visualization, a simple projection from the 17-dimensional to a three-dimensional space can give interesting insights. Figure 6 shows a projection onto the dimensions 324 nm, 740 nm, and 1,550 nm, with color as the fourth dimension encoding the labeled ground-truth class of each pixel. There, the background (blue) can be separated clearly from the rest, while stem and leaves (yellow and green) are harder to separate (although separation could be possible). The brown (non-ripe) area and the black (defective) area are close to the red (healthy and ripe) area but seem to be separable.
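The simple projection used for Figure 6 amounts to selecting three columns of the per-pixel spectral vectors. The sketch below shows this next to a PCA projection (computed via singular value decomposition) for comparison; it assumes the labeled pixels are stored as an (N, 17) array together with a list giving the wavelength of each column. The function names are illustrative.

```python
import numpy as np


def project_to_bands(pixels, wavelengths, selected):
    """Axis-aligned projection: keep only the columns of the selected wavelengths.

    pixels      -- (N, B) array of spectral vectors, one row per labeled pixel
    wavelengths -- list of the B band wavelengths in nm, in column order
    selected    -- wavelengths to keep, e.g. [324, 740, 1550]
    """
    idx = [wavelengths.index(w) for w in selected]
    return pixels[:, idx]


def pca_project(pixels, k=3):
    """For comparison: project onto the first k principal components (via SVD)."""
    centered = pixels - pixels.mean(axis=0)
    # Rows of vt are the principal directions, sorted by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:k].T
```

The resulting (N, 3) array can be shown as a 3D scatter plot, e.g. with matplotlib (`ax = fig.add_subplot(projection="3d")` followed by `ax.scatter(x, y, z, c=colors)`), using the ground-truth label of each pixel to choose its color. The band selection keeps physically interpretable axes, whereas PCA maximizes the retained variance but mixes all bands.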
These first results of the data analysis give good reason to start with simple classification methods such as SVM or a simple multi-layer perceptron (MLP). Furthermore, clustering, followed by a feature selection and dimensionality reduction study, will be the next steps.
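To indicate what a first SVM baseline on the per-pixel spectra might look like with no dependency beyond NumPy, the sketch below trains linear SVMs with the Pegasos stochastic sub-gradient method and combines them one-vs-rest. This is a swapped-in, simplified technique (linear kernel only, bias handled through a constant feature); a library implementation such as scikit-learn's `LinearSVC` or `SVC` would be the usual choice in practice. The hyperparameters `lam` and `epochs` are illustrative, and the band values are assumed to be on comparable scales (e.g., standardized per band).

```python
import numpy as np


def train_linear_svm(X, y, lam=1e-3, epochs=20, seed=0):
    """Binary linear SVM (hinge loss + L2 penalty) trained with Pegasos.

    X -- (N, D) features; y -- (N,) labels in {-1.0, +1.0}. Returns the weight vector w.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w, t = np.zeros(d), 0
    for _ in range(epochs):
        for i in rng.permutation(n):
            t += 1
            eta = 1.0 / (lam * t)                # Pegasos step size 1/(lambda * t)
            violated = y[i] * (X[i] @ w) < 1.0   # margin test uses w before the update
            w *= 1.0 - eta * lam                 # gradient step of the L2 regularizer
            if violated:                         # hinge loss active for this sample
                w += eta * y[i] * X[i]
    return w


def add_bias(X):
    """Append a constant feature so that the last weight acts as the bias term."""
    return np.hstack([X, np.ones((X.shape[0], 1))])


def fit_ovr(X, labels, **kwargs):
    """One-vs-rest: one binary SVM per class, trained on per-pixel spectral vectors."""
    classes = np.unique(labels)
    Xb = add_bias(X)
    W = np.array([train_linear_svm(Xb, np.where(labels == c, 1.0, -1.0), **kwargs)
                  for c in classes])
    return classes, W


def predict_ovr(model, X):
    """Assign each pixel to the class whose SVM gives the largest decision value."""
    classes, W = model
    return classes[np.argmax(add_bias(X) @ W.T, axis=1)]
```

A linear baseline like this is cheap enough to retrain for every candidate band subset, which fits the planned feature selection study; non-linear class boundaries would call for a kernel SVM or the MLP mentioned above.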
In preparation for the following step, a first data collection in the field has been carried out. Figure 7 shows the camera setup placed between the ridges (dams). Four cameras were used there to collect multi-spectral data: (1) a UV camera, (2) a SWIR camera, (3) a visual camera without an internal IR filter, and (4) a standard visual camera with an IR filter. The first three cameras were consecutively equipped with different filter configurations to take separate images per spectral band. The fourth camera was used to take a visual reference image.
The next steps in this early project stage are the preprocessing of the collected data, a manual analysis of the data, and first classification tests.
Challenges of Autonomous In-field Fruit Harvesting and Concept of a Robotic Solution