countered and conventionally used screw types found
in this domain, we have consulted experts from the
disassembly plant in cooperation with our university
and agreed on 12 types of screw heads, such as different sizes of Torx, Phillips, Slotted and Allen heads.
We assessed various electronic devices that occur in large numbers in today's E-Waste, such as computer hard drives, DVD players, gaming consoles and many more. As anticipated, we concluded that
almost all screws in this domain are circular; this circularity is the natural geometry of such fasteners and is the central feature we exploit to detect a screw object. Fig. 3 illustrates samples of the screw types/sizes classified in our dataset. It must be underlined that non-circular screws are also manufactured; however, they are rare, and we found no such screws in the devices of interest at the disassembly plant we cooperate with. We therefore based our method on first
finding circular structures in the images. Clearly, not every circular structure is a screw: stickers, holes, transistors and similar parts are also circular. Nevertheless, circular structures provide strong priors for screws, and the first step of our method is to collect these screw candidates.
Figure 3: Screw types encountered during the disassembly of various electronic devices found in E-Waste. The last row depicts artefacts, which are treated as an additional class in classification.
We use the base candidate generator from our previous work (Yildiz and Wörgötter, 2019) to collect candidates. We run our program in offline mode and rely on the Hough Transform for candidate detection. This is a standard computer vision method for circle detection (Duda and Hart, 1971) and shall not be explained here. Differing from the standard Hough Transform, we use a variant based on the so-called Hough Gradient, as implemented in the OpenCV library (Bradski and Kaehler, 2008). This variant uses the gradient information of the edges that form the circle. We refer the reader to the handbook published by the creators of the aforementioned library for further implementation details of the Hough Gradient algorithm. It should be noted that after collecting our candidates, we switch back from grayscale to RGB, since our classifiers perform far better on color images than on grayscale ones.
3.3 Training the Classifiers
As mentioned before, the user manually separates screw types and artifacts, so that a classifier can be trained on these positive and negative examples. Fig. 3 shows types of screw heads and artifacts taken from various devices found in E-Waste. In general, the same screws occur in other device-classes and, thus, the resulting training set can also be transferred to other devices. In that case, however, one has to increase the number of samples to account for additional types of screws.
We investigated state-of-the-art classifiers from the literature and picked the three top-performing ones for the final comparison. In our experience, these networks performed tolerably well given a relatively small dataset for a specific device-class (hard drives of any size). We decided to evaluate EfficientNets (Tan and Le, 2019), ResNets (He et al., 2016) and DenseNets (Huang et al., 2017), which score top accuracies on ImageNet (Deng et al., 2009). Additionally, EfficientNets have been used in recent work (Xie et al., 2019) to improve ImageNet classification by means of a new self-training method called Noisy Student Training. Inspired by this effort, we chose EfficientNetB2.
Our strategy to evaluate the networks is as follows. We follow the standard procedure for transfer learning: cutting the pre-trained model at the last convolutional layer and adding a new sequence of linear layers called the head. We use this head architecture for all models we explore. In the first 10 epochs, we train only the added final layers of the model by freezing all convolutional layers, not allowing any updates to their weights. Afterward, we unfreeze all layers and train the entire network. We find it useful to use differential learning rates at this stage: since the early layers of the models should change less than the later ones, lower learning rates are used in the first layers and higher ones at the end. We use the Adam optimizer (Kingma and Ba, 2014) with a learning rate of 1 × 10^-5. Figure 4 illustrates the model architecture we use. Here, the term "Block" is a higher abstraction used for a group of layers.
To further reduce overfitting and obtain a model that generalizes well, we applied an additional data augmentation step. We used several augmentation operations to introduce more variety into the data, such as rotation and brightness and contrast changes.
ROBOVIS 2020 - International Conference on Robotics, Computer Vision and Intelligent Systems