Table 5: Parameter ranges fixed in order to compare the two models, MNOD and MNOS, when trained with the VOC2011 dataset. For each layer L the maximum number of nodes was fixed to N. For each node, one of the fixed W_s and I_s sizes and, where applicable, a set of leaf nodes were used.
L  N  W_s       I_s         Leaves
1  4  3x3       50, 90      brightness, rgb
2  4  3x3, 5x5  10, 50, 90  brightness, rgb
3  4  3x3, 7x7  10, 50, 90  brightness, rgb
4  1  5x5, 7x7  10, 50, 90  brightness, rgb
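The parameter ranges of Table 5 can be written down as a small configuration structure. The sketch below is only an illustration: the dictionary layout and the names `TABLE5`, `LEAVES` and `candidate_pairs` are assumptions made here, not part of the MNOD or MNOS implementation. It simply enumerates the (W_s, I_s) pairs that a node in each layer may pick from.

```python
from itertools import product

# Parameter ranges copied from Table 5. The dictionary layout is an
# assumption made for this sketch, not the authors' configuration format.
TABLE5 = {
    1: {"max_nodes": 4, "W_s": ["3x3"], "I_s": [50, 90]},
    2: {"max_nodes": 4, "W_s": ["3x3", "5x5"], "I_s": [10, 50, 90]},
    3: {"max_nodes": 4, "W_s": ["3x3", "7x7"], "I_s": [10, 50, 90]},
    4: {"max_nodes": 1, "W_s": ["5x5", "7x7"], "I_s": [10, 50, 90]},
}
LEAVES = ["brightness", "rgb"]  # leaf node types listed for every layer


def candidate_pairs(layer):
    """Return every (W_s, I_s) pair a node in the given layer may use."""
    cfg = TABLE5[layer]
    return list(product(cfg["W_s"], cfg["I_s"]))


if __name__ == "__main__":
    for layer, cfg in TABLE5.items():
        pairs = candidate_pairs(layer)
        print(f"layer {layer}: up to {cfg['max_nodes']} node(s), "
              f"{len(pairs)} (W_s, I_s) pairs")
```

Keeping the ranges in one place like this makes it easy to check that both models are configured from the same search space.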
Table 6 summarizes the segmentation accuracies obtained when the new MNOS model is compared with the existing MNOD. The first columns list the accuracy obtained on eight classes with the MNOD and MNOS models. Without GrabCut post-processing, MNOS improves on MNOD in every class. The GrabCut algorithm, however, does not always improve the MNOS results: the last column of Table 6 shows that post-processing increases accuracy in only three classes (motorbike, pottedplant and sheep). The likely reason is that, when the MNOS mask is inaccurate, the region mask used to initialize GrabCut is inaccurate as well, so the post-processing amplifies the errors and lowers the accuracy of the MNOS mask.
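The comparison described above can be reproduced directly from the values in Table 6. The following sketch is only a worked check of the arithmetic; the names `TABLE6`, `mnos_gain` and `grabcut_helps` are hypothetical and introduced here for illustration.

```python
# Accuracy values (%) copied from Table 6, per class:
# (MNOD, MNOS, MNOS followed by GrabCut post-processing).
TABLE6 = {
    "boat": (23.51, 28.87, 26.43),
    "dog": (23.68, 29.23, 24.20),
    "horse": (29.39, 35.14, 32.41),
    "motorbike": (45.37, 47.13, 50.02),
    "pottedplant": (12.45, 14.73, 21.03),
    "sheep": (28.26, 30.22, 31.75),
    "train": (38.73, 46.80, 41.37),
    "tvmonitor": (16.62, 19.52, 15.86),
}


def mnos_gain(cls):
    """Diff column: accuracy gain of MNOS over MNOD, in percentage points."""
    mnod, mnos, _ = TABLE6[cls]
    return round(mnos - mnod, 2)


def grabcut_helps():
    """Classes where GrabCut post-processing beats the plain MNOS result."""
    return [cls for cls, (_, mnos, gc) in TABLE6.items() if gc > mnos]


if __name__ == "__main__":
    for cls in TABLE6:
        print(f"{cls:12s} Diff = {mnos_gain(cls):+.2f}")
    print("GrabCut improves:", grabcut_helps())
```

Running it lists a positive Diff for all eight classes and reports motorbike, pottedplant and sheep as the classes where GrabCut improves on MNOS.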
On the other hand, when the MNOS mask is good, GrabCut does improve the final segmentation. Consider the image in Figure 7(a), taken from the VOC2011 dataset for the class “train”. The MNOS mask computed for it produces the segmentation in Figure 7(b). Because the image is simple, the result is fairly good: the object is segmented almost completely, except for some details. This yields a good initialization map for the post-processing, and GrabCut is able to refine the result, as shown in Figure 7(c). This is, of course, the best-case situation.
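One way to initialize GrabCut from a segmentation map, rather than from user interaction, is sketched below with OpenCV's mask-based initialization. This is not the authors' implementation: it assumes the MNOS output is available as a map of per-pixel foreground scores in [0, 1], and the thresholds `lo` and `hi`, as well as the function names, are hypothetical choices made for this illustration.

```python
def build_grabcut_labels(prob_map, lo=0.2, hi=0.8):
    """Map per-pixel foreground scores in [0, 1] to GrabCut labels.

    Labels follow OpenCV's convention: 0 = sure background,
    1 = sure foreground, 2 = probable background, 3 = probable foreground
    (cv2.GC_BGD, cv2.GC_FGD, cv2.GC_PR_BGD, cv2.GC_PR_FGD).
    The thresholds lo and hi are assumptions of this sketch.
    """
    labels = []
    for row in prob_map:
        out = []
        for p in row:
            if p >= hi:
                out.append(1)  # confident object pixel
            elif p >= 0.5:
                out.append(3)  # leaning towards object
            elif p > lo:
                out.append(2)  # leaning towards background
            else:
                out.append(0)  # confident background pixel
        labels.append(out)
    return labels


def refine_with_grabcut(image_bgr, prob_map, iterations=5):
    """Run GrabCut on an 8-bit, 3-channel image, seeded by prob_map.

    Requires NumPy and OpenCV; they are imported here so that the
    label-building helper above stays dependency-free.
    """
    import numpy as np
    import cv2

    mask = np.array(build_grabcut_labels(prob_map), dtype=np.uint8)
    bgd_model = np.zeros((1, 65), np.float64)  # internal GMM state
    fgd_model = np.zeros((1, 65), np.float64)
    cv2.grabCut(image_bgr, mask, None, bgd_model, fgd_model,
                iterations, cv2.GC_INIT_WITH_MASK)
    # Pixels labelled sure or probable foreground form the refined mask.
    return np.isin(mask, (cv2.GC_FGD, cv2.GC_PR_FGD))
```

With this kind of seeding, errors in the initial map are passed to GrabCut as confident labels, which is consistent with the observation that an inaccurate MNOS mask can make the post-processed result worse.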
Table 6: Comparison between the segmentation accuracy (%) of the existing MNOD model and the new MNOS algorithm on the VOC2011 dataset. The Diff column reports the performance gain of MNOS over MNOD. The last column, GC, shows the accuracy obtained when the GrabCut post-processing is applied to the MNOS output maps.
Class        MNOD   MNOS   Diff   GC
boat         23.51  28.87  +5.36  26.43
dog          23.68  29.23  +5.55  24.20
horse        29.39  35.14  +5.75  32.41
motorbike    45.37  47.13  +1.76  50.02
pottedplant  12.45  14.73  +2.28  21.03
sheep        28.26  30.22  +1.96  31.75
train        38.73  46.80  +8.07  41.37
tvmonitor    16.62  19.52  +2.90  15.86
Figure 7: (a) Typical image of the VOC2011 dataset with an object that belongs to the “train” class; (b) Automatic segmentation of the train using our MNOS model; (c) Refinement of the segmented object with GrabCut.
5 CONCLUSIONS
In this paper we described an object segmentation algorithm based on a multi-network system and inspired by a previously presented object detection algorithm, MNOD. It is composed of a set of neural networks combined to provide a single output. The results highlight the benefits of our solution. The proposed algorithm can be configured for different classes of objects, and its nodes may be of different types, using either sliding windows or segments to read their input. We presented a model that uses sliding windows in the first layers of the tree and segments in the subsequent layers. We then studied a post-processing phase based on the GrabCut algorithm, replacing its interactive initialization with one derived from the MNOS output segmentation map.
We tested the proposed model on different datasets composed of images of commercial products taken from the web. The MNOS algorithm was tuned to achieve better accuracy than the MNOD model. We also obtained good results on some classes of the VOC2011 dataset. Moreover, the results show that our algorithm is robust to changes of perspective for the same object and, at the same time, to objects of the same type that have different shapes, appear in different poses, or are articulated or slightly occluded.
The GrabCut post-processing phase led to very good results when the segmentation map was accurate and clean. However, with very difficult images, such as those in the VOC2011 dataset, the MNOS algorithm often produces segmentation masks that are not accurate enough to provide a good initialization for GrabCut, so the post-processing often worsens the MNOS result.
The most important extension we plan is to make our model work with multiple classes rather than as a single-class segmentation algorithm. More-
VISAPP 2012 - International Conference on Computer Vision Theory and Applications