HAND IMAGE SEGMENTATION BY MEANS OF GAUSSIAN

MULTISCALE AGGREGATION FOR BIOMETRIC APPLICATIONS

Alberto de Santos Sierra, Carmen Sánchez Ávila

Group of Biometrics, Biosignals and Security, GB2S

Centro de Domótica Integral, Universidad Politécnica de Madrid, Madrid, Spain

Javier Guerra Casanova, Gonzalo Bailador del Pozo

Group of Biometrics, Biosignals and Security, GB2S

Centro de Domótica Integral, Universidad Politécnica de Madrid, Madrid, Spain

Keywords:

Hand segmentation, Fuzzy multiscale aggregation, Biometrics, Hand geometry, Synthetic database.

Abstract:

Applying biometrics to daily scenarios involves demanding requirements in terms of software and hardware. At the same time, current biometric techniques are being adapted to present-day devices such as mobile phones and laptops, which are far from meeting those requirements. In fact, reconciling both needs is one of the most difficult problems in biometrics at present. This paper therefore presents a segmentation algorithm able to provide suitable precision for hand biometric recognition over a wide range of backgrounds such as carpets, glass, grass, mud, pavement, plastic, tiles or wood. Results show that segmentation is carried out with high accuracy (F-measure ≥ 88%), with competitive processing times when compared to state-of-the-art segmentation algorithms.

1 INTRODUCTION

Biometrics based on hand recognition has received increasing attention in recent years due to its wide applicability in daily scenarios and the relation between user acceptance and identification/verification rates (Kukula and Elliott, 2005; Kukula and Elliott, 2006).

In addition, the requirements of hand biometric systems are easily met with a standard camera and processor, so that these systems can be readily adapted to devices such as PCs and mobile phones.

This paper presents a segmentation method for isolating the hand from the background in real environments, oriented to mobile devices. However, biometric systems in general rely strongly on environmental conditions such as illumination, background and proximity to the sensor. Applying biometrics to mobile devices therefore requires solving more demanding problems in terms of segmentation and invariance to changes in feature extraction, with fewer resources.

The proposed method is based on Gaussian Multiscale Aggregation (GMA) (García-Casarrubios Muñoz et al., 2010), which gathers pixels with similar characteristics into the same cluster and provides a hierarchical structure along scales, in order to find a segmentation whose segments correspond to objects within the image.

Evaluating a segmentation method requires a database to assess to what extent the method can isolate objects within an image. In order to assess the proposed method in realistic scenarios, a synthetic database has been created with a total of 408000 images, containing samples with different backgrounds such as soil, skin/fur, carpets, walls and grass, corresponding to possible scenarios where hand images can be taken.

The remainder of this paper is organized as follows. A literature review is presented in Section 2. The proposed approach is described in Section 3, followed by the database used to test and evaluate the algorithm (Section 4) and the evaluation criteria (Section 5). Finally, results are given in Section 6 and conclusions in Section 7.

de Santos Sierra A., Sánchez Ávila C., Guerra Casanova J. and Bailador del Pozo G..

HAND IMAGE SEGMENTATION BY MEANS OF GAUSSIAN MULTISCALE AGGREGATION FOR BIOMETRIC APPLICATIONS.

DOI: 10.5220/0003462500400046

In Proceedings of the International Conference on Signal Processing and Multimedia Applications (SIGMAP-2011), pages 40-46

ISBN: 978-989-8425-72-0

© 2011 SCITEPRESS (Science and Technology Publications, Lda.)

2 LITERATURE REVIEW

The segmentation problem has been addressed with different mathematical approaches and in a large number of applications (Kang et al., 2009; Shirakawa and Nagao, 2009).

In particular, one approach that has developed considerably in recent years is based on multiscale aggregation (Sharon et al., 2006). This procedure processes the image according to a set of mathematical operations, so that pixels with similar properties are gathered into the same segment. The main characteristic of this approach is that the procedure is repeated through subsequent scales, with the number of pixels reduced at each scale as a result of the previous aggregation (Sharon et al., 2006). Moreover, recent results obtained by these algorithms have shown improvements when compared to other methods (Alpert et al., 2007), such as Normalized Cuts (Shi and Malik, 2000) and Mean-Shift (Comaniciu and Meer, 2002).

Multiscale aggregation methods comprise a wide range of algorithms involving different mathematical operations applied to the pixels of an image.

In fact, several approaches have been proposed based on Segmentation by Weighted Aggregation (SWA) (Sharon et al., 2006), providing accurate results by means of similarities between intensities of neighbouring pixels (Sharon et al., 2000), measurements of texture differences and boundary integrity (Sharon et al., 2001), more complicated operations such as Gradient Orientation Histograms (GOH) (Rory Tait Neilson and McDonald, 2007), or more straightforward grouping methods based on the intensity contrast between the boundary of two segments and the interior of each segment (Felzenszwalb and Huttenlocher, 2004).

3 GAUSSIAN MULTISCALE AGGREGATION

Let an image I be defined as a graph G = (V, E, W), where V represents the nodes of the graph, corresponding to pixels in the image, E stands for the edges connecting pairs of nodes in V, and W contains the weights of those edges, measuring the similarity between two nodes (pixels) in V.

The idea is to divide graph G into two subgraphs, $G = G_h \cup G_b$, so that subgraph $G_h$ contains the pixels corresponding to the hand and subgraph $G_b$ gathers the pixels corresponding to the background. In addition, each node in V carries two parameters: intensity (represented by $\mu$) and deviation (represented by $\sigma$). The intensity corresponds in the first scale to the grayscale intensity of the pixel (Gonzalez and Woods, 1992), and to average intensities in subsequent scales. The deviation is well defined in subsequent scales, where each node gathers several pixels, but it has no direct meaning in the first scale, where a node is a single pixel. In the first scale, the deviation of each pixel is therefore set from its neighbourhood, which is a 4-neighbourhood structure. These two parameters are gathered into a single function $\phi_i^{[s]}(\mu_i^{[s]}, \sigma_i^{[s]})$ representing the degree of being similar to node $v_i$, where s represents the scale. For the sake of simplicity, $\phi_i^{[s]}(\mu_i^{[s]}, \sigma_i^{[s]}) = \phi_i^{[s]}$, to avoid an excessively complicated notation.
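The first-scale graph described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `first_scale_nodes` and `grid_edges` are hypothetical, and the deviation is assumed to be the population standard deviation of a pixel together with its 4-neighbours, since the text only says that it is "set based on" the neighbourhood.

```python
from statistics import pstdev


def first_scale_nodes(image):
    """Return {(row, col): (mu, sigma)} for a 2-D list of grayscale values."""
    rows, cols = len(image), len(image[0])
    nodes = {}
    for r in range(rows):
        for c in range(cols):
            # 4-neighbourhood: up, down, left, right (clipped at the border).
            neigh = [image[rr][cc]
                     for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                     if 0 <= rr < rows and 0 <= cc < cols]
            mu = float(image[r][c])
            # Assumption: sigma is the population standard deviation of the
            # pixel and its 4-neighbours; the paper does not give the formula.
            sigma = pstdev([mu] + [float(v) for v in neigh])
            nodes[(r, c)] = (mu, sigma)
    return nodes


def grid_edges(rows, cols):
    """Edges E of the 4-neighbourhood grid, each pair listed once."""
    edges = []
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:
                edges.append(((r, c), (r, c + 1)))
            if r + 1 < rows:
                edges.append(((r, c), (r + 1, c)))
    return edges
```

Note that a perfectly flat neighbourhood yields a deviation of zero under this assumption, so a practical implementation would probably need a small lower bound on the deviation before building the membership functions.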

Thus, the weight between two neighbouring nodes $v_i$ and $v_j$ at scale s is defined as in Equation 1:

$$w_{i,j}^{[s]} = \int_{\zeta} [\ldots]^{[s]} \, d\zeta \qquad (1)$$

where $\zeta$ refers to the complete colour space. The more similar the membership functions, the higher the weight $w_{i,j}^{[s]}$. Moreover, the functions $\phi_i^{[s]}$ are normalized by definition, so that $\int_{\zeta} \phi_i^{[s]} \, d\zeta = 1$ for every scale s. Notice that $w_{i,j}^{[s]}$ is only calculated for neighbouring nodes, according to the neighbourhood provided by each scale s.
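As an illustration of how such a weight could be evaluated, the sketch below makes two assumptions that are not confirmed by the text: each membership function is taken to be a normalized one-dimensional Gaussian with mean µ and deviation σ > 0 over the intensity axis, and the weight is taken to be the integral of the product of the two functions. Under those assumptions the integral has a closed form, which the sketch checks against a numerical midpoint rule. The function names are hypothetical.

```python
import math


def gaussian(x, mu, sigma):
    """Normalized Gaussian density with mean mu and deviation sigma > 0."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def overlap_weight(mu_i, sigma_i, mu_j, sigma_j):
    """Closed form of the integral of the product of two normalized Gaussians.

    Assumption of this sketch: the weight is this overlap integral.
    """
    var = sigma_i ** 2 + sigma_j ** 2
    return math.exp(-0.5 * (mu_i - mu_j) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def overlap_weight_numeric(mu_i, sigma_i, mu_j, sigma_j,
                           lo=-50.0, hi=350.0, steps=40000):
    """Midpoint-rule approximation of the same integral over [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * h
        total += gaussian(x, mu_i, sigma_i) * gaussian(x, mu_j, sigma_j)
    return h * total
```

With these definitions, two nodes with close intensities and small deviations receive a larger weight than two nodes whose intensities are far apart, which matches the qualitative behaviour described above.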

The algorithm first sorts the pairs of nodes in V according to their weights W. Each pair is grouped under the same subgraph provided that at least one of its nodes has not already been assigned to a subgraph ($\nexists\, i, j,\; v_k \in G_i^{[s]} \lor v_m \in G_j^{[s]},\; \forall i, j, k, m, s$) and that the dispersion of the resulting subgraph is within a certain bound, given by the relation in Equation 2:

$$\sigma_{i,j}^{[s+1]} \;\le\; [\ldots]^{[s]} \qquad (2)$$

where i and j represent either subgraphs or nodes. In other words, the relation in Equation 2 states that a subgraph can gather new elements provided that the uniformity within the subgraph remains bounded, so that the subgraph does not become dispersed. The same applies when there are only two nodes to be gathered.
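The aggregation pass described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' method: nodes carry a single scalar intensity, the dispersion of a candidate subgraph is taken to be the pooled standard deviation of its intensities, and the hypothetical parameter `limit` stands in for the right-hand side of Equation 2, whose exact form is not assumed here.

```python
from math import sqrt


def aggregate_scale(values, weighted_edges, limit):
    """One aggregation pass over a graph.

    values:         {node: intensity}
    weighted_edges: [(weight, node_a, node_b), ...]
    limit:          hypothetical upper bound on the dispersion of a subgraph
    Returns {node: subgraph id}.
    """
    label = {}   # node -> subgraph id
    stats = {}   # subgraph id -> [count, sum, sum of squares]

    def spread(n, s, q):
        # Pooled standard deviation (assumed dispersion measure).
        return sqrt(max(q / n - (s / n) ** 2, 0.0))

    # Visit pairs from the most similar to the least similar.
    for _, a, b in sorted(weighted_edges, key=lambda e: e[0], reverse=True):
        if a in label and b in label:
            continue  # both nodes already belong to a subgraph
        gid = label.get(a, label.get(b))  # existing subgraph, if any
        n, s, q = stats.get(gid, [0, 0.0, 0.0])
        for node in (a, b):
            if node not in label:
                n, s, q = n + 1, s + values[node], q + values[node] ** 2
        if spread(n, s, q) <= limit:
            if gid is None:
                gid = len(stats)  # open a new subgraph
            for node in (a, b):
                label.setdefault(node, gid)
            stats[gid] = [n, s, q]

    # Nodes that could not be merged become singleton subgraphs,
    # so that every node ends up associated with a subgraph.
    for node in values:
        if node not in label:
            label[node] = len(stats)
            stats[label[node]] = [1, values[node], values[node] ** 2]
    return label
```

For example, with intensities `{0: 10, 1: 11, 2: 50, 3: 51}` and a generous `limit`, the two bright nodes and the two dark nodes end up in separate subgraphs, whereas a very small `limit` leaves every node on its own.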

The method iterates over all the sorted weights in W, so that every node is associated with a subgraph. The next step consists of extracting the new membership function for each subgraph, based on the functions associated with the nodes within that subgraph.

For a given subgraph at the subsequent scale, $G_i^{[s+1]}$, the membership function is defined as in Equation 3:

$$\phi_i^{[s+1]} = [\ldots] \qquad (3)$$


where N represents the number of nodes gathered by subgraph $G_i^{[s+1]}$. Notice that $\phi_i^{[s+1]}$ is normalized by definition, so that $\int_{\zeta} \phi_i^{[s+1]} \, d\zeta = 1$.

However, the initial 4-neighbourhood grid structure is lost with this aggregation, and a new structure must therefore be provided efficiently for these scattered nodes. In addition to the function $\phi_i^{[s]}$, every node carries its location within image I in terms of vertical and horizontal Cartesian coordinates. When $G_i^{[s+1]}$ is obtained, the centroid of the gathered nodes is calculated, so that each subgraph at subsequent scales has a position within the image. This centroid, $\xi$, makes it possible to provide a structure at successive scales by means of Delaunay triangulation (de Berg et al., 2008).
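The centroid step can be sketched in a few lines. The function name `centroids` and the dictionary layout are hypothetical choices for this illustration; the only thing taken from the text is that each subgraph is positioned at the mean of the coordinates of the nodes it gathers.

```python
def centroids(label, position):
    """Centroid of every subgraph.

    label:    {node: subgraph id}
    position: {node: (row, col)} coordinates of each node in the image
    Returns {subgraph id: (mean row, mean col)}.
    """
    acc = {}  # subgraph id -> (count, sum of rows, sum of cols)
    for node, gid in label.items():
        r, c = position[node]
        n, sr, sc = acc.get(gid, (0, 0.0, 0.0))
        acc[gid] = (n + 1, sr + r, sc + c)
    return {gid: (sr / n, sc / n) for gid, (n, sr, sc) in acc.items()}
```

The resulting points could then be handed to any Delaunay routine to obtain the edges of the next scale, for instance `scipy.spatial.Delaunay` in Python. That library is only mentioned as one option and is not part of the MATLAB implementation reported in this paper.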

This operation represents the final step in the loop, since at this point there exists a new graph $G^{[s+1]}$ at scale s + 1 in which each subgraph $G_i^{[s+1]}$ represents a node, the edges $E^{[s+1]}$ are provided by the Delaunay triangulation, and the weights $W^{[s+1]}$ are obtained from Equations 1 and 3.

The whole loop is repeated until only two subgraphs remain, as stated at the beginning of this section ($G = G_h \cup G_b$). However, because of the constraints imposed on aggregation, the method may reach a point where no more segments can be aggregated without having achieved the goal of dividing the image into two subgraphs. Therefore, Equation 2 is relaxed in practice and stated as in Equation 4:

$$\sigma_{i,j}^{[s+1]} \;\le\; [\ldots]^{[s]} + k^{[s]} \qquad (4)$$

where $k^{[s]}$ is a factor that prevents the aggregation method from becoming stuck in the loop. This factor is increased dynamically as the method requires. Its initial value is set to $k^{[s]} = 0.1$ for each scale s.

The computational cost of this algorithm is quasi-linear in the number of pixels, since the number of nodes is reduced at each subsequent scale by a factor of (in practice) three. Therefore, the time needed to process the first scale (which contains the highest number of nodes) is greater than the time needed for any subsequent scale, and the total time is comparable to twice the processing time of the first scale.
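The cost argument above can be made explicit with a short worked bound. It assumes, as a simplification, a constant per-node cost c at every scale and an exact reduction factor of three between consecutive scales; N is the number of pixels and S the number of scales (c, N and S are symbols introduced only for this illustration).

```latex
T \;\approx\; c\,N \sum_{s=0}^{S} \frac{1}{3^{s}}
  \;<\; c\,N \sum_{s=0}^{\infty} \frac{1}{3^{s}}
  \;=\; \frac{c\,N}{1 - 1/3}
  \;=\; \frac{3}{2}\,c\,N
```

Under these idealized assumptions the whole hierarchy costs at most 1.5 times the first scale. The figure of roughly twice the first scale quoted above is consistent with this bound once per-scale overheads, such as sorting the weights and rebuilding the triangulation, are taken into account.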

Finally, image I is obtained by transforming the input from RGB space to CIELAB (CIE 1976 L*, a*, b*), chosen for its ability to describe all the colours visible to the human eye (Gonzalez and Woods, 1992; Tan et al., 2009; Mojsilovic et al., 2002). More specifically, layer a* is taken as image I.
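A minimal sketch of extracting the a* layer is given below. The paper does not state which RGB primaries or white point were used, so this example assumes 8-bit sRGB input and the D65 reference white; the function names `srgb_to_a_star` and `a_layer` are hypothetical.

```python
def srgb_to_a_star(r, g, b):
    """a* (CIE 1976 L*a*b*) of an 8-bit colour.

    Assumptions of this sketch: sRGB primaries and D65 white point.
    """
    def linear(c):
        # Undo the sRGB transfer curve.
        c = c / 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    def f(t):
        # CIELAB companding function.
        delta = 6.0 / 29.0
        return t ** (1.0 / 3.0) if t > delta ** 3 else t / (3.0 * delta ** 2) + 4.0 / 29.0

    rl, gl, bl = linear(r), linear(g), linear(b)
    # Linear sRGB -> CIE XYZ (only X and Y are needed for a*).
    x = 0.4124564 * rl + 0.3575761 * gl + 0.1804375 * bl
    y = 0.2126729 * rl + 0.7151522 * gl + 0.0721750 * bl
    # D65 reference white: Xn = 0.95047, Yn = 1.0.
    return 500.0 * (f(x / 0.95047) - f(y / 1.0))


def a_layer(image):
    """Apply the conversion to a 2-D list of (r, g, b) tuples."""
    return [[srgb_to_a_star(*px) for px in row] for row in image]
```

The a* axis runs from green (negative values) to red (positive values), so reddish skin tones tend to sit on the positive side while neutral greys stay close to zero.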

4 DATABASE

With the aim of evaluating the segmentation method, a synthetic database has been created, gathering different hand positions, rotation degrees and environments, so that it is possible to assess to what extent the segmentation algorithm can satisfactorily isolate the hand from the background in real scenarios.

Many different backgrounds are considered in order to cover a wide variety of scenarios, containing textures from carpets, fabric, glass, grass, mud, different objects, paper, parquet, pavement, plastic, skin and fur, sky, soil, stones, tiles, tree, wall and wood. In addition, five different samples of every texture were collected to provide a more realistic evaluation scenario.

Initially, hands were captured against a blue-coloured background, so that the hand can be easily extracted; this prior segmentation result is considered the ground truth for the subsequent segmentation evaluation. This database contains a total of 120 individuals, with both hands and 20 acquisitions per hand. Some acquisition examples from this database can be seen in Figure 1.

Figure 1: Samples from the first database, with blue-coloured background. The synthetic database is built from this database by considering different backgrounds. In addition, the segmentation result of this database is considered as ground truth in the subsequent evaluation.

The hand is then isolated and superimposed on the former backgrounds, carrying out a morphological opening (with a disk structuring element of radius 5) for colour images (Gonzalez and Woods, 1992) to avoid visible edges separating the hand from the underlying texture, ensuring a more realistic image.

For each image, a total of 5 × 17 images (five samples of each of 17 textures) are created. Therefore, the second database collects a total of 120 × 2 × 20 × 5 × 17 = 408000 images (120 individuals, two hands, 20 acquisitions per hand, five samples and 17 textures) to properly evaluate segmentation on real scenarios. A visual example of this database is provided in Figure 2.

SIGMAP 2011 - International Conference on Signal Processing and Multimedia Applications

Figure 2: Samples from the synthetic database with different backgrounds for a given acquisition taken from the first database.

5 EVALUATION CRITERIA

The proposed segmentation algorithm must be evaluated according to criteria able to assess to what extent the algorithm isolates the hand from the background. Several methods exist in the literature to evaluate segmentation (Chen et al., 2010; Unnikrishnan et al., 2007; Meilă, 2005), but most of them rely on several manual segmentations carried out by different individuals.

The evaluation presented here is based on a ground-truth segmentation that is obtained automatically but is nevertheless very reliable, since the background is very easily distinguishable from the hand (Figure 1). The proposed evaluation is based on the F-Factor (Alpert et al., 2007), defined as follows:

$$F = \frac{2RP}{R + P} \qquad (5)$$

where P (Precision, Confidence) stands for the number of true positives (true segmentation, i.e. a hand pixel classified as hand) in relation to the number of true positives and false positives (false hand segmentation, i.e. background considered as hand), and R (Recall, Sensitivity) represents the number of true positives in relation to the number of true positives and false negatives (false background segmentation, i.e. hand considered as background). The F-Factor lies within the interval [0, 1], where 0 indicates a bad segmentation and 1 represents the best possible segmentation result.
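The F-Factor of Equation 5 can be computed from a predicted mask and a ground-truth mask as sketched below. The sketch uses the standard definitions, precision = TP / (TP + FP) and recall = TP / (TP + FN), with "hand" as the positive class; the function name `f_factor` and the mask layout (2-D lists of 0/1 values) are assumptions of this example.

```python
def f_factor(predicted, truth):
    """Return (precision, recall, F) for two binary masks of equal size.

    A truthy value means "hand", a falsy value means "background".
    """
    tp = fp = fn = 0
    for p_row, t_row in zip(predicted, truth):
        for p, t in zip(p_row, t_row):
            if p and t:
                tp += 1      # hand pixel labelled as hand
            elif p and not t:
                fp += 1      # background pixel labelled as hand
            elif t and not p:
                fn += 1      # hand pixel labelled as background
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    return precision, recall, 2.0 * recall * precision / (recall + precision)
```

Because F is symmetric in P and R, swapping the roles of the two masks changes the individual precision and recall values but not the F-Factor itself.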

In addition, a very important aspect of a segmentation algorithm is the time required to isolate objects in an image. This time depends strongly on the image size, the computer on which the experiments take place and the implementation, among other characteristics.

These criteria make it possible to assess to what extent the proposed algorithm meets its goals in an adequate time.

6 RESULTS

In this section, results are presented according to the evaluation criteria described in Section 5.

First of all, segmentation is evaluated in terms of performance, considering the F-Factor (Equation 5) as the main criterion. The results obtained are summarized in Table 1.

The reader can notice that those environments where the hand could be camouflaged (such as mud, soil, parquet or wood) slightly decrease the performance of the algorithm. In addition, visual examples of segmentation results with different backgrounds and hands are provided in Figure 3.

The processing time for images of 640 × 340 pixels is 18 seconds, with a MATLAB implementation running on a PC with a 2.4 GHz Intel Core 2 Duo and 4 GB of 1067 MHz DDR3 memory. A more refined implementation remains as future work. Nonetheless, this processing time is very competitive when compared to approaches in the literature (Chen et al., 2010; Alpert et al., 2007).

Table 1: Segmentation evaluation by means of the F-Factor on a synthetic database with 17 different background textures.

Texture    F (%)       Texture        F (%)       Texture   F (%)
Carpets    92.3±0.2    Paper          91.3±0.2    Stones    91.4±0.1
Fabric     89.1±0.1    Parquet        88.4±0.3    Tiles     90.3±0.2
Glass      94.3±0.1    Pavement       89.1±0.2    Tree      96.3±0.2
Grass      93.7±0.1    Skin and Fur   95.7±0.2    Wall      94.2±0.1
Mud        89.8±0.1    Sky            96.4±0.2    Wood      93.8±0.1
Objects    92.1±0.2    Soil           89.4±0.1

Figure 3: A comparison of the results provided by the segmentation algorithm with the ground truth. Columns, from left to right: Original Image, Ground-truth, Synthetic Image, Segmentation Result. The first column gathers examples from the first database, with their segmentation, considered as ground truth, in the second column. The third column presents synthetic images based on the first-column images, and the fourth column shows the final segmentation result.

7 CONCLUSIONS

This paper has presented an approach for hand biometric segmentation based on Gaussian multiscale aggregation. This method is able to isolate the hand from the background in different situations, simulated by the authors' own public synthetic database, with a total of 408000 images.

The results show that the hand is isolated with competitive accuracy, providing a good basis for subsequent feature extraction, independently of the background of the hand image.

This method is well suited to mobile applications, since mobile hand biometrics must be able to identify individuals anywhere, without constraints on the background. However, more effort is needed to adapt this approach to mobile biometrics, since its processing time is at present far from adequate for real-time applications. Nevertheless, the processing time (18 seconds) remains low when compared to similar approaches in the literature, considering the challenging backgrounds to segment.

Future work involves improving and refining the implementation, together with a mobile orientation, so that mobile hand biometrics can benefit from a reliable segmentation algorithm and thereby increase their identification accuracy.

ACKNOWLEDGEMENTS

This research has been supported by the Ministry

of Industry, Tourism and Trade of Spain, in the

framework of the project CENIT-Segur@, reference

CENIT-2007 2004.

REFERENCES

Alpert, S., Galun, M., Basri, R., and Brandt, A. (2007). Image segmentation by probabilistic bottom-up aggregation and cue integration. In IEEE Conference on Computer Vision and Pattern Recognition, 2007. CVPR '07., pages 1–8.

Chen, S., Cao, L., Wang, Y., Liu, J., and Tang, X. (2010). Image segmentation by MAP-ML estimations. Image Processing, IEEE Transactions on, 19(9):2254–2264.

Comaniciu, D. and Meer, P. (2002). Mean shift: A robust approach toward feature space analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24:603–619.

de Berg, M., van Kreveld, M., Overmars, M., and Schwarzkopf, O. (2008). Computational Geometry: Algorithms and Applications. Springer, 3rd edition.

Felzenszwalb, P. F. and Huttenlocher, D. P. (2004). Efficient graph-based image segmentation. Int. J. Comput. Vision, 59:167–181.

García-Casarrubios Muñoz, A., de Santos-Sierra, A., Sánchez-Ávila, C., Guerra-Casanova, J., Bailador-del Pozo, G., and Jara-Vera, V. (2010). Hand biometric segmentation by means of fuzzy multiscale aggregation for mobile devices. In Emerging Techniques and Challenges for Hand-Based Biometrics (ETCHB), 2010 International Workshop on, pages 1–6.

Gonzalez, R. C. and Woods, R. E. (1992). Digital Image Processing. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA.

Kang, W.-X., Yang, Q.-Q., and Liang, R.-P. (2009). The comparative research on image segmentation algorithms. In ETCS '09: Proceedings of the 2009 First International Workshop on Education Technology and Computer Science, pages 703–707, Washington, DC, USA. IEEE Computer Society.

Kukula, E. and Elliott, S. (2005). Implementation of hand geometry at Purdue University's Recreational Center: an analysis of user perspectives and system performance. In Security Technology, 2005. CCST '05. 39th Annual 2005 International Carnahan Conference on, pages 83–88.

Kukula, E. and Elliott, S. (2006). Implementation of hand geometry: an analysis of user perspectives and system performance. Aerospace and Electronic Systems Magazine, IEEE, 21(3):3–9.

Meilă, M. (2005). Comparing clusterings: an axiomatic view. In Proceedings of the 22nd International Conference on Machine Learning, ICML '05, pages 577–584, New York, NY, USA. ACM.

Mojsilovic, A., Hu, H., and Soljanin, E. (2002). Extraction of perceptually important colors and similarity measurement for image matching, retrieval and analysis. Image Processing, IEEE Transactions on, 11(11):1238–1248.

Rory Tait Neilson, B. N. and McDonald, S. (2007). Image segmentation by weighted aggregation with gradient orientation histograms. Southern African Telecommunication Networks and Applications Conference (SATNAC).

Sharon, E., Brandt, A., and Basri, R. (2000). Fast multiscale image segmentation. In IEEE Conference on Computer Vision and Pattern Recognition, 2000. Proceedings., volume 1, pages 70–77.

Sharon, E., Brandt, A., and Basri, R. (2001). Segmentation and boundary detection using multiscale intensity measurements. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2001. CVPR 2001. Proceedings of the 2001., volume 1, pages I-469–I-476.

Sharon, E., Galun, M., Sharon, D., Basri, R., and Brandt, A.

(2006). Hierarchy and adaptivity in segmenting visual

scenes. Macmillan Publishing Ltd.

Shi, J. and Malik, J. (2000). Normalized cuts and image

segmentation. IEEE Transactions on Pattern Analysis

and Machine Intelligence, 22:888–905.

Shirakawa, S. and Nagao, T. (2009). Evolutionary image segmentation based on multiobjective clustering. In CEC'09: Proceedings of the Eleventh conference on Congress on Evolutionary Computation, pages 2466–2473, Piscataway, NJ, USA. IEEE Press.

Tan, W., Wu, C., Zhao, S., and Chen, S. (2009). Hand extraction using geometric moments based on active skin color model. In Intelligent Computing and Intelligent Systems, 2009. ICIS 2009. IEEE International Conference on, volume 4, pages 468–471.

Unnikrishnan, R., Pantofaru, C., and Hebert, M. (2007). Toward objective evaluation of image segmentation algorithms. IEEE Trans. Pattern Anal. Mach. Intell., 29:929–944.
