Figure 2: Floating island of debris.
Figure 3: Floating island of debris, mask image.
waste items and lowering the threshold would have
generated even more false alarms. Plastic waste is
often contaminated, e.g. by dirt, and comes in different
colors and shapes, so we needed an image
recognition algorithm able to operate in such a noisy
environment. Deep neural networks (DNN) were expected
to satisfy these requirements.
Applying DNNs to recognize floating plastic waste
is not a new idea; van Lieshout et al. (2020) also took
this approach. Their camera setup and classification
requirements are different, however. Their system
uses a downward-looking camera, which decreases the
distance to the target objects. This setup also results
in better resolution, which lets them perform more detailed
classification ("plastic"/"not plastic").
Our second iteration is still expected to cover as
large a water surface as possible, which means that target
objects measuring 20-30 cm can be as far as 20-30
meters from the camera. Even if the distance can be
partially offset by optical zoom, targets will still look
small in the input image. For this reason we did not
require any classification of the floating waste.
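To make the size constraint concrete, the following minimal sketch estimates how many pixels such a target spans under a simple pinhole camera model. The 60 degree horizontal field of view, the 640 pixel image width and the function name are illustrative assumptions, not measured parameters of our camera.

```python
import math

def apparent_width_px(object_m, distance_m, image_width_px=640, hfov_deg=60.0):
    """Approximate on-image width of an object under a pinhole camera model.

    hfov_deg is a hypothetical horizontal field of view (an assumption),
    and no optical zoom is applied.
    """
    # Width of the scene covered by the image at the given distance.
    scene_width_m = 2.0 * distance_m * math.tan(math.radians(hfov_deg) / 2.0)
    return image_width_px * object_m / scene_width_m

# Object size (m) and distance (m) pairs taken from the ranges in the text.
for size_m, dist_m in [(0.2, 20.0), (0.3, 30.0), (0.25, 5.0)]:
    px = apparent_width_px(size_m, dist_m)
    print(f"{size_m:.2f} m object at {dist_m:.0f} m: about {px:.1f} px wide")
```

Under these assumptions a 20 cm object at 20 m spans only about 5 to 6 pixels, which illustrates why fine-grained classification is unrealistic without substantial zoom.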
We experimented with the YOLOv3 (Redmon and
Farhadi, 2018) and Faster R-CNN (Ren et al., 2015)
deep neural networks. We could not achieve reliable
object detection with YOLOv3 because its localisation
was imprecise. This is in line with the YOLOv3
authors' own paper, which notes that YOLOv3 struggles
to get the boxes perfectly aligned with the object.
In the case of Faster R-CNN, the challenge was that
our training machine had only 6 GB of GPU memory,
which is not sufficient to train with the ResNet-101
backbone commonly used with Faster R-CNN, so we fell
back to the ResNet-50-FPN model. The implementation
came from Torchvision, and the model was pre-trained
on the COCO train2017 dataset. The last 3 layers of the
backbone were allowed to train, and we trained for 11
epochs. The confidence threshold was set to a
relatively low value, 0.25. We expect that this low
level will generate some false positives, but the low
image quality of the target objects (due to the rather
long distance) requires relatively lax recognition. The
initial training dataset had 195 images, annotated with
the VGG Image Annotator tool¹ to determine the
bounding area of the relevant objects, and came from
the following sources.
• Live images of plastic waste floating on the river.
Some of the images were collected by camera
crews while others came from our own camera
from the first iteration. We did not have enough
high-quality images of this kind at our disposal,
so we needed to add images
from diverse sources.
• Plastic waste collected from the river and photographed
in front of neutral backgrounds. Figure 4 shows
such a training image with its bounding area
annotation.
• Plastic waste images in natural settings (e.g.
seashores) collected from the internet.
The system's architecture is designed so that
its user is expected to continuously collect
and annotate images, hence we expect the training
image set to grow.
The training image set was augmented by rotating
the training images by 90, 180 and 270 degrees.
All the images were scaled so that their longer side
was 640 pixels. We applied no scaling augmentation, as
the model's Feature Pyramid Network already extracts
features at multiple scales.
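The rotation and scaling steps also have to be applied to the bounding area annotations. The following is a minimal sketch of that bookkeeping, assuming axis-aligned boxes in (x1, y1, x2, y2) form and clockwise rotation; the function names, the rotation direction and the example annotation are illustrative assumptions, not our actual preprocessing code.

```python
def rotate_box_90cw(box, width, height):
    """Rotate an axis-aligned box (x1, y1, x2, y2) by 90 degrees clockwise.

    Returns the new box and the new (width, height) of the image.
    Coordinates use the usual image convention: origin top-left, y down.
    """
    x1, y1, x2, y2 = box
    return (height - y2, x1, height - y1, x2), (height, width)

def rotate_box(box, width, height, quarter_turns):
    """Apply quarter_turns successive clockwise 90-degree rotations."""
    for _ in range(quarter_turns % 4):
        box, (width, height) = rotate_box_90cw(box, width, height)
    return box, (width, height)

def scale_to_longer_side(box, width, height, target=640):
    """Scale box and image size so that the longer image side equals target."""
    factor = target / max(width, height)
    scaled_box = tuple(c * factor for c in box)
    return scaled_box, (round(width * factor), round(height * factor))

# Hypothetical annotation: a 1280x720 image with a single box.
box, size = (100, 200, 300, 260), (1280, 720)
for turns in (1, 2, 3):  # 90, 180 and 270 degrees
    print(turns * 90, rotate_box(box, *size, turns))
print(scale_to_longer_side(box, *size))
```

Each original image thus yields four training samples (the original plus three rotations), all resized so that the longer side is 640 pixels.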
The result is demonstrated in Figure 5. The algorithm
cannot recognize every waste item, but it recognizes
enough of them for the warning to be
triggered.
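How detections could translate into a warning can be sketched as follows. Only the 0.25 confidence threshold comes from our setup; the minimum item count, the function name and the example detections are hypothetical and serve purely as an illustration of the idea.

```python
CONFIDENCE_THRESHOLD = 0.25  # threshold value stated in the text

def should_warn(detections, min_items=3, threshold=CONFIDENCE_THRESHOLD):
    """Return True when enough detections pass the confidence threshold.

    detections: iterable of (box, score) pairs produced by the detector.
    min_items: hypothetical minimum number of accepted detections that
               triggers a warning (an assumption, not a tuned value).
    """
    accepted = [(box, score) for box, score in detections if score >= threshold]
    return len(accepted) >= min_items

# Hypothetical detector output for a single frame.
frame = [
    ((10, 10, 30, 25), 0.81),
    ((50, 40, 70, 60), 0.27),
    ((90, 15, 99, 22), 0.12),  # below the threshold, ignored
    ((5, 70, 20, 85), 0.40),
]
print(should_warn(frame))
```

Because the decision depends on the number of accepted detections rather than on any single one, missing some waste items does not prevent the warning from being raised.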
Further analysis was done on 5 video recordings
(Table 1) of real large-scale plastic waste pollution,
taken in different circumstances. Each recording was
filmed in a river landscape and shows
floating debris, both plastic and non-plastic, from a
larger distance (5-50 meters). By viewing
the selected section of each recording, we counted
the recognizable debris and compared the count to the
output of the algorithm. The following categories were
considered.
1 https://www.robots.ox.ac.uk/~vgg/software/via/
Towards a Floating Plastic Waste Early Warning System