learning in a novel way suitable for this challenging task (on specific examples), and Mask-RCNN, despite employing mainly simplistic synthetic images (alongside a few real ones that share background similarities with the test set) as the training set.
Since our results point toward hand segmentation on a limited number of test sets, we end the introduction by elaborating on the general advantages of repetitive training in resolving premature saturation, based on a short comparative study (Fig. 2). To conduct a systematic comparison, we perform both training schemes with the same training set (e.g., synthetic data without any real data), the same number of epochs (e.g., one), and equal learning rates (e.g., 0.001). The overall number of steps is also equal (e.g., 1000). In the case of conventional training (e.g., the orange bar at the far left of Fig. 2, upper middle), we carry out these 1000 steps in one session. However, in the case of repetitive training, each repetition has 100 steps (i.e., 10 × 100 = 1000). The major difference between the two,
here, is the number of layers under training. For conventional training, we let all layers learn, whereas, for repetitive training, we let alternating layer selections of all, all, all, 3+, 3+, all, all, 4+, 5+, all learn from our training set (see the sketch below). Finding the optimal frozen layers for each repetition can be an iterative process.
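To make the schedule concrete, the following minimal sketch expresses the repetitive scheme with the Matterport Mask-RCNN API (the mrcnn package), whose train() call accepts the layer-selection strings "all", "3+", "4+", and "5+"; the HandConfig class, the weight file, and the dataset objects are hypothetical placeholders, not our released code.

# Minimal sketch of the repetitive scheme, assuming the Matterport
# Mask-RCNN implementation; names and files are illustrative only.
from mrcnn.config import Config
import mrcnn.model as modellib

class HandConfig(Config):        # hypothetical configuration
    NAME = "hand"
    NUM_CLASSES = 1 + 1          # background + hand
    STEPS_PER_EPOCH = 100        # 100 steps per repetition
    LEARNING_RATE = 0.001

# dataset_train / dataset_val: mrcnn.utils.Dataset instances (not shown).
model = modellib.MaskRCNN(mode="training", config=HandConfig(),
                          model_dir="logs")
model.load_weights("initial_weights.h5", by_name=True)

# "all" trains every layer; "3+", "4+", "5+" freeze the backbone below
# the given ResNet stage. Conventional training would instead be a
# single call with STEPS_PER_EPOCH = 1000 and layers="all".
LAYER_SCHEDULE = ["all", "all", "all", "3+", "3+",
                  "all", "all", "4+", "5+", "all"]
for rep, layers in enumerate(LAYER_SCHEDULE, start=1):
    # train() runs up to a cumulative epoch count, so each call adds
    # one epoch (100 steps) on top of the previous repetition.
    model.train(dataset_train, dataset_val,
                learning_rate=0.001, epochs=rep, layers=layers)

The per-repetition average loss can then be read from the training log to select the best net (the 8th, in the study below).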
As seen in Fig. 2, upper right, repetitive training improves the average loss continuously (except on the 4th repetition), with the best loss of 0.4113 on the 8th repetition (better than the conventional loss of 0.4409). Repetitions 9 and 10 did not improve the average loss; therefore, we consider the 8th net as the final network. Besides, Fig. 2, bottom, shows the comparison between the batch losses of all repetitions (concatenated) and that of the conventional training. The green curve demonstrates our repetitive training’s advantages at the end of each epoch. We illustrate the loss of a selected repetition in Fig. 2, upper right. One can argue that saturation begins after step 80. Compared to the 100 total steps, the saturation is no longer premature, indicating that each net learns from much of the data and providing a potential explanation for the advantages of our repetitive training.
Though the prime inspiration of the method is to repeatedly train the resulting networks on the next subset, in Section 4 we demonstrate that utilizing an identical subset for retraining the next net would, in some cases, also lead to satisfactory results. Testing the broader applicability of repetitive training, as a more general training strategy, requires an extensive study of different CNN models on various application domains and objects. Therefore, here, we provide evidence for the feasibility of the approach for a specific problem, a particular object, and a certain network.
2 LITERATURE REVIEW
Currently, there exist many successful object segmentation frameworks, and they fall into two main categories: semantic and instance segmentation. The former performs class-level segmentation (e.g., DeepLab (Chen et al., 2018)), whereas the latter identifies each object at the instance level for all trained classes.
Belonging to the family of region-based convolutional neural networks, or RCNN (Girshick et al., 2012), Mask-RCNN has demonstrated one of the most successful performances on instance segmentation. The current default framework segments 99 objects spanning a wide range of categories (e.g., from living creatures to electronic devices). It supplies a simple framework that is straightforward to extend to segment a new set of objects. Besides, it possesses other significant properties that make it an appropriate choice of network for this study.
First, it provides us with a wide range of possibilities in setting parameters, from as simple as the number of epochs to as sophisticated as freezing some layers during training. That permits a more thorough investigation of the proposed approaches. Second, besides employing standard convolutional neural networks (e.g., VGG, ResNet50, or ResNet101 (He et al., 2016)), MRCNN allows us to employ a self-trained net as the backbone for transfer learning, which is essential for studying the properties of repetitive training.
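As a hedged illustration of this flexibility, the sketch below (again assuming the Matterport implementation and reusing the hypothetical HandConfig above) swaps the backbone with a one-line override and seeds the model from a self-trained weight file instead of COCO or ImageNet weights; the file name is illustrative.

class HandBackboneConfig(HandConfig):
    BACKBONE = "resnet101"       # or "resnet50"

model = modellib.MaskRCNN(mode="training", config=HandBackboneConfig(),
                          model_dir="logs")
# by_name=True matches layers by name; the head layers whose shapes
# depend on the class count are excluded, so a net trained on a
# different class set can still seed this one.
model.load_weights("self_trained.h5", by_name=True,
                   exclude=["mrcnn_class_logits", "mrcnn_bbox_fc",
                            "mrcnn_bbox", "mrcnn_mask"])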
Transfer learning (Pan and Yang, 2009) enables machine learning frameworks to transfer knowledge by storing the information (e.g., the weights, in the realm of convolutional neural networks, CNNs) gained from one problem domain (e.g., person segmentation) and applying it to a related but different domain (e.g., hand segmentation). Our strategy of repetitive learning also follows the approach of transfer learning, but in a different fashion: the knowledge is not transferred to a different domain, but to the same domain (with a different training set and different parameters) over and over again. Besides investigating this training strategy, an informative study of how a segmentation framework in general, and Mask-RCNN in particular, would perform when trained on synthetic data is missing from the literature.
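A sketch of this same-domain transfer follows, under the same Matterport assumptions and reusing the model, LAYER_SCHEDULE, and dataset_val from the earlier sketches; the subsets list and file names are hypothetical.

# Same-domain transfer: the weights of repetition i seed repetition
# i + 1, while the training subset and layer selection may change.
# subsets: list of mrcnn.utils.Dataset objects, one per repetition.
weights = "initial_weights.h5"
for rep, (subset, layers) in enumerate(zip(subsets, LAYER_SCHEDULE), 1):
    model.load_weights(weights, by_name=True)
    model.train(subset, dataset_val,
                learning_rate=0.001, epochs=rep, layers=layers)
    weights = "repetition_%02d.h5" % rep
    model.keras_model.save_weights(weights)

With identical subsets in place of distinct ones, this is the variant examined in Section 4.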