est technique was a straightforward summation of the
number of pixels comprising the change mask, with
a threshold value for what constitutes a significant
change (Rosin, 2002). Various techniques have been employed to better define how the threshold is chosen, but this simple differencing approach is unlikely to yield results as reliable as those of later developments; in particular, it is sensitive to noise and lighting variations (Lillestrand, 1972). Further development has
centred on significance and hypothesis testing, and on
predictive models, both spatial and temporal.
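That baseline can be sketched as a direct sum over the change mask. The function below is an illustrative sketch only; the name, array shapes and threshold values are assumptions, not details taken from the cited works.

```python
import numpy as np

def significant_change(prev, curr, pixel_thresh=30, count_thresh=50):
    """Flag a significant change by summing the pixels in the change mask.

    prev, curr: greyscale frames as 2-D uint8 arrays.
    pixel_thresh: per-pixel intensity difference that marks a pixel as changed.
    count_thresh: number of changed pixels that constitutes a significant change.
    """
    # Widen to int16 so the subtraction cannot wrap around in uint8.
    diff = np.abs(curr.astype(np.int16) - prev.astype(np.int16))
    change_mask = diff > pixel_thresh
    # The decision is simply a count of pixels in the change mask.
    return int(change_mask.sum()) > count_thresh
```

The sensitivity to noise and lighting noted above is visible here: a global illumination shift raises `diff` everywhere and can trip `count_thresh` with no real motion.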
2.1 Gesture Recognition
Gestures are ambiguous and incompletely specified,
as they vary from one person to the next, and each
time a particular person gesticulates. Consequently,
the two main issues to resolve when recognising ges-
tures are to identify specific elements of the gesture,
and to have some prior knowledge of which gestures
to search for. Gesture recognition is achieved either by attaching a sensor of some type to various parts of the body, or by interpreting the image from a camera. There is an inherent loss of information in interpreting the 2D image of a 3D space, and algorithms which address this can be computationally expensive.
Identifying a hand gesture involves determining
the point in time when a gesture has started and
ended, within a continuous movement stream from
the hands, and then segmenting that time into recognisable movements or positions. This is not a trivial problem, due both to the spatio-temporal variability involved and to the segmentation ambiguity of identifying specific elements of the gesture (Mitra and Acharya, 2007).
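One simple way to frame the temporal-segmentation step is hysteresis thresholding on a per-frame motion-energy signal: a gesture starts when the energy rises above one threshold and ends when it falls below a lower one. The sketch below is illustrative only (function name, thresholds and the minimum-length filter are assumptions, not the method of the cited work).

```python
def segment_gestures(motion_energy, start_thresh, end_thresh, min_len):
    """Split a per-frame motion-energy stream into (start, end) frame intervals.

    Hysteresis: a gesture opens when energy exceeds start_thresh and closes
    when it drops below end_thresh; intervals shorter than min_len frames
    are discarded as noise.
    """
    intervals = []
    start = None
    for i, energy in enumerate(motion_energy):
        if start is None:
            if energy > start_thresh:
                start = i  # gesture begins
        elif energy < end_thresh:
            if i - start >= min_len:
                intervals.append((start, i - 1))  # gesture ends
            start = None
    # Close a gesture still open at the end of the stream.
    if start is not None and len(motion_energy) - start >= min_len:
        intervals.append((start, len(motion_energy) - 1))
    return intervals
```

The two thresholds give some robustness to the spatio-temporal variability described above, since brief dips in energy mid-gesture do not split the interval.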
The use of Hidden Markov Models (HMMs) has
yielded good results in gesture recognition, as ges-
tures consist of a set of discrete segments of move-
ment or position (Yamato et al., 1992). Sign language
recognition processes have been designed and implemented using HMMs (Starner and Pentland, 1996).
In the cited implementation, the user wore coloured
gloves, and the approach required extensive training
sets; it successfully recognised around fifty words
within a heavily constrained grammar set. The HMM
approach has been further developed, splitting each gesture into a series of constituent “visemes” (Bowden et al., 2004).
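The recognition step common to these HMM systems, scoring an observation sequence against each gesture's model and picking the most likely, can be sketched with the standard scaled forward algorithm. The model parameters below are toy values for illustration, not those of the cited systems.

```python
import numpy as np

def forward_log_likelihood(obs, pi, A, B):
    """Scaled forward algorithm for a discrete-output HMM.

    obs: sequence of observation-symbol indices.
    pi:  initial state probabilities, shape (N,).
    A:   state-transition matrix, shape (N, N).
    B:   emission probabilities, shape (N, M) over M symbols.
    Returns the log-likelihood of obs under the model.
    """
    pi, A, B = np.asarray(pi), np.asarray(A), np.asarray(B)
    alpha = pi * B[:, obs[0]]
    scale = alpha.sum()
    log_lik = np.log(scale)
    alpha = alpha / scale
    for o in obs[1:]:
        # Propagate through transitions, then weight by emission probability.
        alpha = (alpha @ A) * B[:, o]
        scale = alpha.sum()
        log_lik += np.log(scale)  # accumulate in log space to avoid underflow
        alpha = alpha / scale
    return log_lik

def classify_gesture(obs, models):
    """Pick the gesture whose HMM assigns obs the highest likelihood.

    models: dict mapping gesture name to a (pi, A, B) tuple.
    """
    return max(models, key=lambda name: forward_log_likelihood(obs, *models[name]))
```

Each gesture being "a set of discrete segments of movement or position" maps naturally onto the hidden states here; the observation symbols would come from quantised fingertip movements.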
Gestures can also be modelled as ordered se-
quences of spatio-temporal states, leading to the use
of a Finite State Machine (FSM) to detect them (Hong
et al., 2000). In this approach each gesture is described as an ordered sequence of states, defined by the spatial clustering and temporal alignment of the points of the hands or fingers. The states typically
consist of a static start position, smooth motion to the
end position, a static end position, and smooth motion
back to the rest position. This approach is less suited to detecting motion in small children (who tend not to be static), and especially in those with impaired movement ability.
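The state-sequence idea can be sketched as a small FSM in which a tracked fingertip coordinate advances through an ordered list of circular spatial regions. This is a deliberate simplification for illustration: the dwell timing that makes the start and end positions "static" is omitted, and all names are assumptions rather than details of the cited work.

```python
import math

class GestureFSM:
    """Detects a gesture modelled as an ordered sequence of spatial states.

    Each state is a circular region (cx, cy, radius) in screen space; the
    machine advances when the tracked point enters the next state's region.
    """

    def __init__(self, states):
        self.states = states  # list of (cx, cy, radius) tuples, in order
        self.index = 0        # index of the next state to satisfy

    def update(self, x, y):
        """Feed one tracked point; returns True when the full sequence
        has been traversed (and resets for the next gesture)."""
        cx, cy, radius = self.states[self.index]
        if math.hypot(x - cx, y - cy) <= radius:
            self.index += 1
            if self.index == len(self.states):
                self.index = 0
                return True
        return False
```

A real recogniser would also reset on timeouts or large excursions; as the text notes, requiring static start and end positions is the weak point for subjects who are rarely still.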
2.2 Low-Power Device Considerations
The key to gesture recognition in a two-dimensional
image is in identifying the parts of the image which
are relevant to the gesture, and monitoring their contribution to the change mask. The sooner the elements of the change mask that are unrelated to the gesture are discarded, the more time the algorithms have to process the gesture data.
Further to this, the smaller the amount of data that
is used to represent the change set, the faster the algo-
rithms for analysing that change are likely to be. Most
image capture methods used in gesture recognition retain some sense of the overall image, or sections of it, during analysis of the change set; for example, tracing the movement of an edge between a section of the image which is skin-tone coloured and a section which is not.
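As one illustration of shrinking the change-set representation, a change mask can be intersected with a relevance mask (such as a skin-tone mask) and the survivors compressed to a single bounding box. The sketch below assumes boolean masks; the function name and representation choice are assumptions, not the paper's method.

```python
import numpy as np

def change_bounding_box(change_mask, relevance_mask):
    """Discard change pixels outside the relevance mask, then compress the
    remaining change set to a bounding box (x_min, y_min, x_max, y_max).

    Both arguments are 2-D boolean arrays of the same shape.
    Returns None if no relevant change remains.
    """
    relevant = change_mask & relevance_mask  # drop irrelevant change pixels
    ys, xs = np.nonzero(relevant)
    if ys.size == 0:
        return None
    # Four integers now stand in for the whole change set.
    return (int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max()))
```

Four coordinates are far cheaper to track frame-to-frame than a full mask, which is exactly the kind of reduction a low-power device needs.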
A significant saving in computing power is also
made if the application has prior knowledge of which
gesture(s) it is searching for. If the algorithms need to check for any gesture at any time, the task is drastically more computationally expensive than attempting to detect a specific gesture at a particular instant in time.
3 IMPLEMENTATION
The implementation described here addresses the potentially large amount of processing power required to recognise hand gestures in three ways.
Brightly coloured markers are attached to the subject’s fingertips. This means that the image processing software only needs to identify areas of specific, pre-determined colours in the real-time moving image. Further to this, the areas of specific colour are
reduced to a single coordinate per frame within the
two dimensional screen-space, which greatly speeds
up the gesture recognition process.
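That reduction can be sketched as follows: pixels within a tolerance of a pre-determined marker colour are found, then collapsed to one centroid coordinate per frame. The tolerance value and function name are illustrative assumptions, not details of the implementation described.

```python
import numpy as np

def marker_centroid(frame, target_rgb, tol=30):
    """Reduce one coloured marker to a single screen-space coordinate.

    frame: (H, W, 3) uint8 RGB image.
    target_rgb: the marker's pre-determined colour.
    Returns the (x, y) centroid of matching pixels, or None if absent.
    """
    # Per-channel distance from the target colour, widened to avoid wraparound.
    diff = np.abs(frame.astype(np.int16) - np.asarray(target_rgb, dtype=np.int16))
    mask = (diff < tol).all(axis=2)  # pixel matches only if every channel is close
    ys, xs = np.nonzero(mask)
    if xs.size == 0:
        return None
    # One coordinate per frame is all the gesture recogniser has to track.
    return (float(xs.mean()), float(ys.mean()))
```

With one such call per marker colour, each frame yields at most three coordinates for the three tracked fingertips, keeping the per-frame data tiny.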
Each gesture which must be identified has been designed to require tracking of no more than three fingertips. This reduces the amount of data tracked from frame to frame, which again allows the algorithms to perform on the lower-power target device.
HEALTHINF 2013 - International Conference on Health Informatics