IoU improvement of about 3%. Thus, the best results were achieved when the softmax function was used to estimate the weight values of each scale.
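To make the softmax-based weighting concrete, the sketch below shows one plausible way to aggregate multi-scale feature maps with weights produced by a fully connected layer followed by a softmax, as described above. The class name ScaleAggregator and all layer sizes are illustrative assumptions, not the authors' implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ScaleAggregator(nn.Module):
    """Hypothetical sketch of AggrFCNSoftmax-style aggregation:
    a fully connected layer predicts one score per scale, a softmax
    normalizes the scores into weights, and the feature maps are
    combined as a weighted sum."""

    def __init__(self, num_scales: int, channels: int):
        super().__init__()
        # FC layer mapping a pooled multi-scale descriptor
        # to one score per scale.
        self.fc = nn.Linear(num_scales * channels, num_scales)

    def forward(self, feature_maps):
        # feature_maps: list of num_scales tensors, each (B, C, H, W),
        # assumed already resized to a common spatial resolution.
        stacked = torch.stack(feature_maps, dim=1)        # (B, S, C, H, W)
        descriptor = stacked.mean(dim=(3, 4)).flatten(1)  # (B, S*C)
        weights = F.softmax(self.fc(descriptor), dim=1)   # (B, S), sums to 1
        # Emphasize the important scales and suppress the others.
        weighted = stacked * weights.view(-1, weights.size(1), 1, 1, 1)
        return weighted.sum(dim=1)                        # (B, C, H, W)
```

In this sketch, replacing F.softmax with F.relu would correspond to the AggrFCNRelu variant, while fixing all weights to 1/S would reproduce plain averaging as in AvrageAggr.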
Qualitative results of some of these experiments are shown in Figure 5. Supporting our quantitative results, the proposed model with AggrFCNSoftmax (using an aggregation of FC and softmax layers) shows visual improvements in finger parts segmentation on our dataset compared to the FCN model and the two other variants of the proposed model (AggrFCNRelu and AvrageAggr).
Table 2: Performance of the three variants of the proposed model (AggrFCNSoftmax, AggrFCNRelu, and AvrageAggr) and of the FCN model.

Method                    IoU      Accuracy (%)
FCN (Chen et al., 2014)   0.5833   87.32
AvrageAggr                0.6231   87.64
AggrFCNSoftmax            0.6307   88.13
AggrFCNRelu               0.6151   87.91
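For reference, the IoU values reported above are the standard intersection over union between predicted and ground-truth label maps, averaged over classes. The following minimal sketch (the function name and signature are our own illustration, not the authors' evaluation code) shows the computation:

```python
import numpy as np

def mean_iou(pred: np.ndarray, target: np.ndarray, num_classes: int) -> float:
    """Mean intersection over union across classes.
    pred and target are integer label maps of identical shape."""
    ious = []
    for c in range(num_classes):
        pred_c, target_c = pred == c, target == c
        union = np.logical_or(pred_c, target_c).sum()
        if union == 0:  # class absent in both masks: skip it
            continue
        intersection = np.logical_and(pred_c, target_c).sum()
        ious.append(intersection / union)
    return float(np.mean(ious))
```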
A Case Study
To assess the performance of the proposed model on a concrete case, we randomly selected an image from the dataset (see Figure 6). We then analyzed the performance of the proposed model under different conditions: illumination changes, background changes, and image flipping. With no modification of the input image, our model achieved an IoU of 0.5515. Applying illumination effects based on non-linear gamma correction with different values (γ ∈ {0.5, 1.0, 1.5, 2.5}) leaves the IoU at roughly 0.5515; the small degradations that do appear can be explained by the disappearance of small finger parts in Figure 6 (cols. 1-2). We also investigated the effects of changing the background and of flipping the image. Our experiments show that changing the background reduces the IoU to 0.5501 (see Figure 6, cols. 3-4), while flipping the image reduces it to 0.5493 (see Figure 6, cols. 5-6). In short, the IoU stays around 0.55 under all of these conditions. Consequently, changes in the global context of the input images have an insignificant impact on the final decision of the proposed model. It is important to note that the different finger parts are discriminated by their location relative to the palm more than by their appearance. Thus, we conclude that the model learns to extract global shape information from the input images.
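The perturbations used in this case study are straightforward to reproduce. The sketch below shows how the gamma-corrected and flipped inputs could be generated and scored; model, probe, and num_classes are placeholders rather than the authors' code, and mean_iou reuses the earlier sketch:

```python
import numpy as np

def gamma_correct(image: np.ndarray, gamma: float) -> np.ndarray:
    """Non-linear gamma correction for an 8-bit image."""
    normalized = image.astype(np.float32) / 255.0
    return np.clip(normalized ** gamma * 255.0, 0, 255).astype(np.uint8)

def probe(model, image, mask, num_classes):
    """Score one image under the case-study perturbations.
    `model` is assumed to map an image to an integer label map."""
    variants = {"original": (image, mask),
                "flipped": (np.fliplr(image), np.fliplr(mask))}
    for g in (0.5, 1.0, 1.5, 2.5):
        variants[f"gamma={g}"] = (gamma_correct(image, g), mask)
    for name, (img, gt) in variants.items():
        print(name, mean_iou(model(img), gt, num_classes))
```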
5 CONCLUSIONS
In this paper, we have proposed a novel deep learning based model for finger parts semantic segmentation. The proposed model generates feature maps at different resolutions from an input image. These feature maps are then aggregated using weights automatically estimated by a fully connected layer. The estimated weights assign high importance to the most relevant scaled feature maps and suppress the others. The aggregated feature maps are fed into an encoder-decoder network with skip connections to predict the final segmentation mask. In addition, we have introduced a new dataset for the finger parts semantic segmentation problem. To the best of our knowledge, FingerParts is the first dataset for finger parts semantic segmentation with real high-resolution images. The proposed model outperformed the standard FCN network with an improvement of 5% in terms of the IoU metric. Future work will include using the segmented finger parts to improve the accuracy of gesture recognition methods.
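For completeness, the decoding stage summarized above can be pictured with a compact encoder-decoder carrying one skip connection; all layer widths here are illustrative assumptions and do not reflect the proposed network:

```python
import torch
import torch.nn as nn

class MiniEncoderDecoder(nn.Module):
    """Toy encoder-decoder with a single skip connection, illustrating
    how aggregated features can be decoded into a segmentation mask."""

    def __init__(self, in_ch: int, num_classes: int):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv2d(in_ch, 64, 3, padding=1), nn.ReLU())
        self.down = nn.MaxPool2d(2)
        self.enc2 = nn.Sequential(nn.Conv2d(64, 128, 3, padding=1), nn.ReLU())
        self.up = nn.ConvTranspose2d(128, 64, 2, stride=2)
        # Skip connection: the decoder sees upsampled deep features
        # concatenated with the matching encoder features.
        self.dec = nn.Sequential(nn.Conv2d(128, 64, 3, padding=1), nn.ReLU(),
                                 nn.Conv2d(64, num_classes, 1))

    def forward(self, x):
        e1 = self.enc1(x)                       # (B, 64, H, W)
        e2 = self.enc2(self.down(e1))           # (B, 128, H/2, W/2)
        d = self.up(e2)                         # (B, 64, H, W)
        return self.dec(torch.cat([d, e1], 1))  # per-pixel class logits
```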
REFERENCES
Abdel-Nasser, M. and Mahmoud, K. (2017). Accurate photovoltaic power forecasting models using deep LSTM-RNN. Neural Computing and Applications, pages 1–14.
Badrinarayanan, V., Kendall, A., and Cipolla, R. (2015). SegNet: A deep convolutional encoder-decoder architecture for image segmentation. arXiv preprint arXiv:1511.00561.
Barczak, A., Reyes, N., Abastillas, M., Piccio, A., and Susnjak, T. (2011). A new 2D static hand gesture colour image dataset for ASL gestures.
Chen, L.-C., Papandreou, G., Kokkinos, I., Murphy, K., and Yuille, A. L. (2014). Semantic image segmentation with deep convolutional nets and fully connected CRFs. arXiv preprint arXiv:1412.7062.
Chen, L.-C., Papandreou, G., Kokkinos, I., Murphy, K., and Yuille, A. L. (2018). DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(4):834–848.
Dong, C., Loy, C. C., He, K., and Tang, X. (2014). Learning a deep convolutional network for image super-resolution. In European Conference on Computer Vision, pages 184–199. Springer.
Eigen, D. and Fergus, R. (2015). Predicting depth, surface normals and semantic labels with a common multi-scale convolutional architecture. In Proceedings of the IEEE International Conference on Computer Vision, pages 2650–2658.
Eigen, D., Krishnan, D., and Fergus, R. (2013). Restoring an image taken through a window covered with dirt or rain. In Proceedings of the IEEE International Conference on Computer Vision, pages 633–640.