3D Gesture Recognition by Superquadrics
Ilya Afanasyev and Mariolino De Cecco
Mechatronics Lab., University of Trento, via Mesiano, 77, Trento, Italy
Keywords: Superquadrics, Gesture Recognition, Microsoft Kinect, RANSAC Fitting, 3D Object Localization.
Abstract: This paper presents a 3D gesture recognition and localization method based on processing 3D data of hands in color gloves acquired by a 3D depth sensor such as the Microsoft Kinect. The RGB information of every 3D datapoint is used to segment the 3D point cloud into 12 parts (a forearm, a palm and 10 finger segments). The object (a hand with fingers) must be known a priori and anthropometrically modeled by SuperQuadrics (SQ) with certain scaling and shape parameters. The gesture (pose) is estimated hierarchically by a RANSAC object search with least-square fitting of the 3D point cloud segments to the corresponding SQ models: first the pose of the hand (forearm & palm), and then the positions of the fingers. The solution is verified by evaluating the matching score, i.e. the number of inliers whose distances between the SQ surfaces and the 3D datapoints satisfy an assigned distance threshold.
1 INTRODUCTION
Gesture recognition, whose goal is interpreting human gestures via mathematical algorithms, is an important topic in computer vision with many potential applications such as human-computer interaction, sign language recognition, games, sport, medicine, video surveillance, etc. Model-based methods of hand gesture tracking have been studied by many researchers (Rehg and Kanade, 1995); (Starner and Pentland, 1995); (Heap and Hogg, 1996); (Zhou and Huang, 2003); (La Gorce et al., 2008). Some publications track hands with color gloves using data acquired by fixed-position webcams (Geebelen et al., 2010) or a single camera (Wang and Popović, 2009). Hand tracking with quadrics was used by Stenger et al. (2001), but their model consisted of 39 quadrics, representing only the palm and fingers.
The proposed method of 3D gesture recognition by SQ is close to the corresponding hierarchical method (Afanasyev et al., 2012) for 3D human body pose estimation by SQ, applied there to 3D data captured by a multi-camera system and segmented by a special clothing-based preprocessing algorithm. In this paper, the object of recognition is a hand gesture; the sensor is the MS Kinect; 3D point cloud segmentation is provided by analyzing Kinect RGB-depth data of color gloves. Since a hand and fingers can be modeled a priori with anthropometric parameters in a metric coordinate system, we propose using a hierarchical RANSAC-based model-fitting technique with composite SQ models. As is known, SQs can describe complex-geometry objects with few parameters and yield a simple minimization function for an object pose (Jaklic et al., 2000); (Leonardis et al., 1997). The logic of the 3D gesture recognition algorithm is clarified by the block diagram (Fig. 1).
Gesture recognition starts with pre-processing the 3D datapoints (captured by MS Kinect), segmenting them into 12 parts (forearm, palm and 10 finger segments) according to the colors of the glove. The algorithm then recovers the 3D pose of the hand as the largest object (“Hand Pose Search”) and after that restores the finger poses (“Fingers Pose Search”). To cope with measurement noise and outliers, each pose search uses the RANSAC-SQ-fitting technique. The fitting quality is controlled by inlier thresholds (for hand & fingers), defined as the ratio of the optimal number of inliers to the whole set of data points. Tests showed that the Hand Pose Search can return a wrong palm position that satisfies the palm threshold but hampers the Fingers Pose Search. For this reason, when a finger's inlier ratio is less than the finger threshold, the algorithm restarts the Hand Pose Search until suitable results are found for every finger, as sketched below.
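The restart logic can be summarized in code. The following is a minimal Python sketch, not the paper's MATLAB implementation; fit_hand_ransac and fit_finger_ransac are hypothetical stand-ins for the RANSAC-SQ-fitting of Section 2.4, each returning a pose and its inlier ratio.

```python
FINGER_SEGMENTS = ['finger_%d' % i for i in range(10)]  # hypothetical segment keys

def recognize_gesture(segments, hand_thresh=0.5, finger_thresh=0.5,
                      max_restarts=20):
    """Hierarchical pose search: hand first, then fingers, with restart."""
    for _ in range(max_restarts):
        hand_pose, hand_ratio = fit_hand_ransac(segments['hand'])
        if hand_ratio < hand_thresh:
            continue                 # poor hand fit: draw a new RANSAC sample
        finger_poses = []
        for name in FINGER_SEGMENTS:
            pose, ratio = fit_finger_ransac(segments[name], hand_pose)
            if ratio < finger_thresh:
                break                # a bad finger fit invalidates the hand pose
            finger_poses.append(pose)
        else:                        # all 10 fingers passed their threshold
            return hand_pose, finger_poses
    return None                      # no acceptable gesture found
```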
Figure 1: Block diagram of the 3D gesture recognition algorithm.
2 3D GESTURE RECOGNITION
ALGORITHM
2.1 About 3D Sensor and Data
The proposed method of 3D gesture recognition works with 3D sensors that capture 3D coordinate and color information, such as RGB-D (Red Green Blue - Depth) cameras, multi-camera systems, etc. We used the MS Kinect with the 3D scanning software “Skanect” developed by Nicolas Burrus (Burrus, 2011). The software corrects the image distortions and captures 3D object raw data in a metric coordinate system with the origin at the Kinect 3D depth sensor (Fig. 2, 3).
Figure 2: Images acquired by Kinect RGB camera (left,
bottom) and Kinect 3D depth sensor (left, top); combined
image from Skanect software (right).
Figure 3: 3D point cloud of a hand in a color glove captured by Kinect, with color segmentation.
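Although the paper does not detail the segmentation step, a nearest-color assignment over the glove palette is one plausible reading; the sketch below (Python, numpy) uses a hypothetical palette and tolerance, both assumptions.

```python
import numpy as np

# Hypothetical reference colors of the glove parts (RGB, 0-255);
# the real palette depends on the glove actually used.
GLOVE_COLORS = {'forearm': (255, 0, 0), 'palm': (0, 255, 0),
                'finger_0': (0, 0, 255)}  # ... plus the other 9 finger colors

def segment_by_color(points, colors, palette=GLOVE_COLORS, tol=60.0):
    """Assign each 3D datapoint to the nearest reference color (RGB distance)."""
    names = list(palette)
    ref = np.asarray([palette[n] for n in names], dtype=float)        # (K, 3)
    d = np.linalg.norm(colors[:, None, :] - ref[None, :, :], axis=2)  # (N, K)
    label, dmin = d.argmin(axis=1), d.min(axis=1)
    ok = dmin < tol                        # reject points far from every color
    return {n: points[ok & (label == i)] for i, n in enumerate(names)}
```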
2.2 Superquadric Parameters
The implicit SQ equation is well suited to
mathematical modeling for fitting 3D data (Jaklic et
al., 2000) and (Leonardis et al., 1997):
F(x, y, z) = \left[ \left( \frac{x}{a_1} \right)^{2/\varepsilon_2} + \left( \frac{y}{a_2} \right)^{2/\varepsilon_2} \right]^{\varepsilon_2/\varepsilon_1} + \left( \frac{z}{a_3} \right)^{2/\varepsilon_1}    (1)

where x, y, z are coordinates in the superquadric coordinate system; a_1, a_2, a_3 are the scale parameters of the object; ε_1, ε_2 are the object shape parameters.
The explicit SQ equation is used for SQ visualization (where η, ω are spherical coordinates):

\begin{bmatrix} x \\ y \\ z \end{bmatrix} =
\begin{bmatrix}
a_1\, \mathrm{signum}(\cos\eta)\, |\cos\eta|^{\varepsilon_1}\, \mathrm{signum}(\cos\omega)\, |\cos\omega|^{\varepsilon_2} \\
a_2\, \mathrm{signum}(\cos\eta)\, |\cos\eta|^{\varepsilon_1}\, \mathrm{signum}(\sin\omega)\, |\sin\omega|^{\varepsilon_2} \\
a_3\, \mathrm{signum}(\sin\eta)\, |\sin\eta|^{\varepsilon_1}
\end{bmatrix}    (2)
Figure 3 illustrates the 3D point cloud of a hand and fingers modeled by 12 superquadrics (superellipsoids) with the shape parameters ε_1 = ε_2 = 0.6 and the following scaling parameters:
- Forearm: a_1 = a_3 = 0.025, a_2 = 0.115 (m).
- Palm: a_1 = a_3 = 0.04, a_2 = 0.018 (m).
- Phalange: a_1 = a_3 = 0.008, a_2 = 0.027 (m).
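Equations (1) and (2) translate directly into code. The following Python sketch (numpy assumed; function names are ours) evaluates the inside-outside function and samples an SQ surface; absolute values keep the fractional powers well defined, as in Jaklic et al. (2000).

```python
import numpy as np

def sq_inside_outside(p, a, eps):
    """Implicit SQ function F of Eq. (1): F < 1 inside, F = 1 on the surface."""
    (a1, a2, a3), (e1, e2) = a, eps
    x, y, z = p[..., 0], p[..., 1], p[..., 2]
    xy = (np.abs(x / a1) ** (2 / e2) + np.abs(y / a2) ** (2 / e2)) ** (e2 / e1)
    return xy + np.abs(z / a3) ** (2 / e1)

def sq_surface(a, eps, n=40):
    """Explicit SQ surface of Eq. (2), sampled on an (eta, omega) grid."""
    (a1, a2, a3), (e1, e2) = a, eps
    eta, om = np.meshgrid(np.linspace(-np.pi / 2, np.pi / 2, n),
                          np.linspace(-np.pi, np.pi, n))
    f = lambda t, e: np.sign(t) * np.abs(t) ** e      # signed power of Eq. (2)
    x = a1 * f(np.cos(eta), e1) * f(np.cos(om), e2)
    y = a2 * f(np.cos(eta), e1) * f(np.sin(om), e2)
    z = a3 * f(np.sin(eta), e1)
    return x, y, z

# Example: the palm superellipsoid of Section 2.2 evaluated at the origin.
F = sq_inside_outside(np.zeros((1, 3)), (0.04, 0.018, 0.04), (0.6, 0.6))
```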
2.3 Hand & Fingers in Superquadrics
2.3.1 Transformation for a Hand
The SQ position of a hand is defined by rotations α, β, γ about x, y, z (clockwise), respectively, and the translation of the SQ center (x_c, y_c, z_c) along x, y, z. The transformation matrix T_H for the HAND is:

T_H = \begin{bmatrix} 1 & 0 & 0 & x_c \\ 0 & 1 & 0 & y_c \\ 0 & 0 & 1 & z_c \\ 0 & 0 & 0 & 1 \end{bmatrix} R_H,

where

R_H = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & \cos\alpha & \sin\alpha & 0 \\ 0 & -\sin\alpha & \cos\alpha & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} \cos\beta & 0 & -\sin\beta & 0 \\ 0 & 1 & 0 & 0 \\ \sin\beta & 0 & \cos\beta & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} \cos\gamma & \sin\gamma & 0 & 0 \\ -\sin\gamma & \cos\gamma & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}    (3)
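As a sketch, equation (3) can be coded as follows (Python, numpy; clockwise rotation matrices as reconstructed above, helper names are ours):

```python
import numpy as np

def rot_x(t):
    """Clockwise rotation about x (homogeneous 4x4), as in Eq. (3)."""
    c, s = np.cos(t), np.sin(t)
    return np.array([[1, 0, 0, 0], [0, c, s, 0], [0, -s, c, 0], [0, 0, 0, 1.0]])

def rot_y(t):
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, 0, -s, 0], [0, 1, 0, 0], [s, 0, c, 0], [0, 0, 0, 1.0]])

def rot_z(t):
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, s, 0, 0], [-s, c, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1.0]])

def trans(x, y, z):
    """Pure translation of the SQ centre."""
    T = np.eye(4)
    T[:3, 3] = (x, y, z)
    return T

def hand_transform(alpha, beta, gamma, xc, yc, zc):
    """T_H of Eq. (3): translation of the SQ centre composed with R_H."""
    return trans(xc, yc, zc) @ rot_x(alpha) @ rot_y(beta) @ rot_z(gamma)
```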
2.3.2 Transformations Hand – Wrist, and
Bottom – Upper Phalange Joint
The transformations Hand - Wrist (H-W) and Bottom - Upper Phalange Joint (BP-UPJ) are similar and correspond to the matrix:

T_W^H = T_{UPJ}^{BP} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & a_2 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix},    (4)

where P_W^H = P_{UPJ}^{BP} = (0, a_2, 0)^T, i.e. a pure translation along the local y axis by the half-length a_2 of the corresponding SQ (the forearm for H-W, the bottom phalange for BP-UPJ).
2.3.3 Transformation Wrist - Palm
The transformation Wrist - Palm (W-P) is calculated by rotations ξ, ρ, σ about x, z, y (clockwise), respectively, and the translation of the SQ center by a_2 along y:

T_P^W = R_P^W(\xi, \rho, \sigma) \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & a_2 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix},    (5)

where R_P^W is the rotation matrix of the palm:

R_P^W = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & \cos\xi & \sin\xi & 0 \\ 0 & -\sin\xi & \cos\xi & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} \cos\rho & \sin\rho & 0 & 0 \\ -\sin\rho & \cos\rho & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} \cos\sigma & 0 & -\sin\sigma & 0 \\ 0 & 1 & 0 & 0 \\ \sin\sigma & 0 & \cos\sigma & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}.    (6)
Figure 4: Representation of the hand in 12 parts: H – hand, P – palm, W – wrist, BPJ/UPJ – bottom/upper phalange joints, BP/UP – bottom/upper phalanges, etc.
2.3.4 Transformation: Palm – Bottom
Phalange Joint
The transformation Palm – Bottom Phalange Joint (P-BPJ) corresponds to the matrix:

T_{BPJ}^P = \begin{bmatrix} 1 & 0 & 0 & a_1 \cos\eta \\ 0 & 1 & 0 & a_2 \sin\eta \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix},    (7)

where P_{BPJ}^P = (a_1 \cos\eta,\; a_2 \sin\eta,\; 0)^T, i.e. each bottom phalange joint is placed on the boundary of the palm SQ at the angle η, with a_1, a_2 the palm semi-axes.
2.3.5 Transformation: Bottom Phalange
Joint - Bottom Phalange
The transformation Bottom Phalange Joint – Bottom Phalange (BPJ-BP) is created by rotations δ and ε about x and z (clockwise), respectively, and the translation of the SQ center by −a_2 along y:

T_{BP}^{BPJ} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & \cos\delta & \sin\delta & 0 \\ 0 & -\sin\delta & \cos\delta & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} \cos\varepsilon & \sin\varepsilon & 0 & 0 \\ -\sin\varepsilon & \cos\varepsilon & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & -a_2 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}.    (8)
2.3.6 Transformation: Upper Phalange Joint
- Upper Phalange
The transformation Upper Phalange Joint - Upper Phalange (UPJ-UP) is created by the rotation θ about x (clockwise) and the translation of the SQ center by −a_2 along y:

T_{UP}^{UPJ}(\theta) = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & \cos\theta & \sin\theta & 0 \\ 0 & -\sin\theta & \cos\theta & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & -a_2 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}.    (9)
2.3.7 Full Transformation: Hand – Upper
Phalange
Finally, taking into account equations (4)-(9), the full transformation for every point of the system “Hand – Upper Phalange” (H-UP) is:

P_{UP} = \left( T_W^H\, T_P^W\, T_{BPJ}^P\, T_{BP}^{BPJ}\, T_{UPJ}^{BP}\, T_{UP}^{UPJ} \right)^{-1} P_H,    (10)

where P_H, P_{UP} are the homogeneous coordinates of hand and upper phalange points, respectively (Figure 4).
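Reusing the rotation and translation helpers sketched after equation (3), the chain (4)-(10) can be composed as below; the parameter grouping (per-segment half-lengths a_2, palm semi-axes a_1, a_2) follows our reading of Sections 2.3.2-2.3.6 and is an assumption.

```python
def hand_to_upper_phalange(P_h, a2_forearm, a1_palm, a2_palm, a2_bp, a2_up,
                           eta, xi, rho, sigma, delta, eps_r, theta):
    """Map homogeneous hand points P_h, shape (4,) or (4, N), to the UP frame."""
    T_hw = trans(0, a2_forearm, 0)                                  # Eq. (4), H-W
    T_wp = rot_x(xi) @ rot_z(rho) @ rot_y(sigma) @ trans(0, a2_palm, 0)  # Eq. (5)-(6)
    T_pbpj = trans(a1_palm * np.cos(eta), a2_palm * np.sin(eta), 0)      # Eq. (7)
    T_bpjbp = rot_x(delta) @ rot_z(eps_r) @ trans(0, -a2_bp, 0)          # Eq. (8)
    T_bpupj = trans(0, a2_bp, 0)                                    # Eq. (4), BP-UPJ
    T_upjup = rot_x(theta) @ trans(0, -a2_up, 0)                    # Eq. (9)
    chain = T_hw @ T_wp @ T_pbpj @ T_bpjbp @ T_bpupj @ T_upjup
    return np.linalg.inv(chain) @ P_h                               # Eq. (10)
```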
2.4 RANSAC-SQ-Fitting Algorithm
The Hand and Fingers Pose Searches are very similar and share the common logic of Figure 1. The RANSAC hand-fitting algorithm is used to find the hand pose hypothesis, i.e. 6 variables: 3 rotations (α, β, γ) and 3 translation coordinates (x_C, y_C, z_C). These variables are needed to calculate the transformation matrix T_H of equation (3). The model described by the superquadric implicit equation (1) is fitted to the 3D datapoints sorted by segmentation. Each RANSAC sample calculation starts by picking a set of random points (s = 6) in the world coordinate system (x_{Wi}, y_{Wi}, z_{Wi}). The following equation transforms these points to the SQ-centered coordinate system (x_{Si}, y_{Si}, z_{Si}):

F(x_{S_i}, y_{S_i}, z_{S_i}): \quad \begin{bmatrix} x_{S_i} \\ y_{S_i} \\ z_{S_i} \\ 1 \end{bmatrix} = T_{HAND}^{-1} \begin{bmatrix} x_{W_i} \\ y_{W_i} \\ z_{W_i} \\ 1 \end{bmatrix},    (11)

where T_{HAND}^{-1} is the inverse of the homogeneous transformation (3) for the hand.
Then the inside-outside function is calculated according to the superquadric implicit equation (1) in the world coordinate system:

F_{w_i} = \left[ \left( \frac{x_{S_i}}{a_1} \right)^{2/\varepsilon_2} + \left( \frac{y_{S_i}}{a_2} \right)^{2/\varepsilon_2} \right]^{\varepsilon_2/\varepsilon_1} + \left( \frac{z_{S_i}}{a_3} \right)^{2/\varepsilon_1}.    (12)
The inside-outside function for superquadrics has 11 parameters (Jaklic et al., 2000); (Solina and Bajcsy, 1990):

F_{w_i} = F(x_{W_i}, y_{W_i}, z_{W_i};\; a_1, a_2, a_3, \varepsilon_1, \varepsilon_2, \alpha, \beta, \gamma, x_c, y_c, z_c),    (13)

where 5 parameters of the SQ size and shape are known (a_1, a_2, a_3, ε_1, ε_2) and the other 6 parameters (α, β, γ, x_C, y_C, z_C) should be found by minimizing the cost function:
\min \sum_{i=1}^{s} \left( F_{w_i}^{\varepsilon_1}(x_{W_i}) - 1 \right)^2,    (14)

where the additional exponent ε_1 ensures that points at the same distance from the SQ surface have the same values of F_W (Solina and Bajcsy, 1990).
SQ fitting to the random dataset, minimizing the inside-outside measure of distance to the SQ surface, is realized by nonlinear least-squares minimization with the Trust-Region algorithm (or the Levenberg-Marquardt algorithm). The number of inliers is estimated by comparing the distance between every point of the 3D point cloud and the SQ model with the distance threshold t (t = 1 cm):

d_i = \sqrt{a_1 a_2 a_3}\, \left| F_{w_i}^{\varepsilon_1} - 1 \right|.    (15)
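A compact sketch of the whole RANSAC-SQ-fitting loop is given below (Python with numpy/scipy rather than the paper's MATLAB, so names and details are assumptions; it reuses sq_inside_outside and hand_transform from the earlier sketches, and scipy's trust-region 'trf' solver stands in for the Trust-Region algorithm).

```python
import numpy as np
from scipy.optimize import least_squares

def sq_residuals(params, pts_w, a, eps):
    """Residuals F_w^eps1 - 1 of Eq. (14) for one pose hypothesis."""
    alpha, beta, gamma, xc, yc, zc = params
    T = hand_transform(alpha, beta, gamma, xc, yc, zc)
    ph = np.c_[pts_w, np.ones(len(pts_w))]         # homogeneous world points
    ps = (np.linalg.inv(T) @ ph.T).T[:, :3]        # Eq. (11): into the SQ frame
    return sq_inside_outside(ps, a, eps) ** eps[0] - 1.0   # Eq. (12), (14)

def ransac_sq_fit(pts_w, a, eps, n_iter=200, s=6, t=0.01):
    """Fit the SQ to s random points per iteration; count inliers via Eq. (15)."""
    rng = np.random.default_rng()
    scale = np.sqrt(np.prod(a))                    # sqrt(a1*a2*a3) of Eq. (15)
    best, best_inl = None, -1
    for _ in range(n_iter):
        sample = pts_w[rng.choice(len(pts_w), s, replace=False)]
        x0 = np.r_[np.zeros(3), sample.mean(axis=0)]      # crude initial pose
        sol = least_squares(sq_residuals, x0, args=(sample, a, eps),
                            method='trf')
        d = scale * np.abs(sq_residuals(sol.x, pts_w, a, eps))  # Eq. (15)
        n_inl = int(np.count_nonzero(d < t))
        if n_inl > best_inl:
            best, best_inl = sol.x, n_inl
    return best, best_inl / len(pts_w)             # best pose and inlier ratio
```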
3 RESULTS
Figure 5 shows the workability of RANSAC-based fitting of the composite SQ model to a 3D point cloud for gesture recognition. The presented example contains 4848 points of 3D data, achieving about 65% inliers for a distance threshold of 1 cm. The algorithm has been developed in MATLAB. The RGB-D information was obtained with the Microsoft Kinect and then processed offline (taking several minutes per gesture). The quality of gesture recognition depends on the quality of segmentation, which requires good illumination conditions and gloves with bright colors. For some gestures (when fingers are hidden) the method cannot correctly recognize the finger poses.
4 CONCLUSIONS
The paper describes a method of 3D gesture recognition by SuperQuadrics (SQ) from 3D point cloud data captured by the Microsoft Kinect and clustered according to the colors of the color gloves. The hand was modeled by a composite SQ model consisting of forearm, palm and fingers with a priori known anthropometric dimensions. The proposed method is based on a hierarchical RANSAC pose search with robust least-square fitting of SQs to 3D data: first for the hand, then for the fingers. The solution is verified by evaluating the matching score and comparing this score with admissible inlier thresholds for the hand and fingers. The gesture estimation technique described has been tested by processing 3D data offline, giving encouraging results.
ACKNOWLEDGEMENTS
The work of Ilya Afanasyev on creating the algorithms of 3D gesture recognition was supported by a grant of the EU FP7-Marie Curie-COFUND-Trentino postdoc program, 2010-2013. The authors are very grateful to colleagues from the Mechatronics Lab., UniTN, for their help and support.
REFERENCES
Afanasyev I., Lunardelli M., De Cecco M., et al. 2012. 3D
Human Body Pose Estimation by Superquadrics. In
Conf. Proc. VISAPP’2012 (Rome, Italy), V.2, 294-302.
Burrus N., 2011. Kinect software “Skanect-0.1”. http://manctl.com/products.html.
Heap A.J. and Hogg D.C., 1996. Towards 3-D hand
tracking using a deformable model. In Conf. Proc. on
Face and Gesture Recognition, P.140–145.
Geebelen G., Cuypers T., Maesen S., and Bekaert P.,
2010. Real-time hand tracking with a colored glove.
In Conf. Proc. 3D Stereo Media.
Jaklic A., Leonardis A., Solina F., 2000. Segmentation and
Recovery of Superquadrics. Computational imaging
and vision 20, Kluwer, Dordrecht.
La Gorce M., Paragios N., Fleet D., 2008. Model-Based
Hand Tracking with Texture, Shading and Self-
occlusions. In IEEE Conf. Proc. CVPR. P.1-8.
Leonardis A., Jaklic A., Solina F., 1997. Superquadrics for Segmenting and Modeling Range Data. IEEE Transactions PAMI-19(11), P. 1289-1295.
Rehg J.M. and Kanade T., 1995. Model-based tracking of
self-occluding articulated objects. In IEEE Conf. Proc.
on Computer Vision, P. 612–617.
Solina F. and Bajcsy R., 1990. Recovery of parametric
models from range images: The case for superquadrics
with global deformations. IEEE Transactions PAMI-
12(2):131-147.
Stenger B., Mendonca P.R.S., and Cipolla R., 2001.
Model-based 3D tracking of an articulated hand. In
IEEE Conf. Proc. CVPR 2001 (2): 310-315.
Starner T. and Pentland A., 1995. Real-time American sign language recognition from video using hidden Markov models. In IEEE Proc. Computer Vision, P. 265-270.
Wang R.Y. and Popović J., 2009. Real-time hand-tracking
with a color glove. ACM Transactions on Graphics
(TOG), 28 (3), 63.
Zhou H. and Huang T. S., 2003. Tracking articulated hand
motion with eigen dynamics analysis. In IEEE Conf.
Proc. on Computer Vision, V. 2, P. 1102–1109.
APPENDIX
Figure 5: 3D gesture recognition by Superquadrics.
3DGestureRecognitionbySuperquadrics
433