stable, but a low level of noise is still present. To
improve the robustness of the data acquisition process
and reduce the classification burden, we retain only
one sample from the sensors' output (for further
classification) each time the data remains stable for a
pre-defined period of time after a significant change
has been detected.
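The retention rule described above could be sketched as follows. This is only a minimal illustration, not the implementation used in this work: the function names, the thresholds (`change_threshold`, `noise_tolerance`) and the length of the stable period (`stable_count`) are all assumptions introduced for the example.

```python
from typing import Iterable, Iterator, List, Optional, Sequence


def max_abs_diff(a: Sequence[float], b: Sequence[float]) -> float:
    """Largest per-sensor difference between two readings."""
    return max(abs(x - y) for x, y in zip(a, b))


def stable_samples(
    stream: Iterable[Sequence[float]],
    change_threshold: float = 10.0,  # assumed: size of a "significant change"
    noise_tolerance: float = 2.0,    # assumed: residual noise allowed while stable
    stable_count: int = 15,          # assumed: samples that make up the stable period
) -> Iterator[List[float]]:
    """Yield one sample each time the readings settle after a significant change."""
    reference: Optional[List[float]] = None     # first sample of the current run
    last_emitted: Optional[List[float]] = None  # last sample sent to the classifier
    run_length = 0
    for raw in stream:
        sample = list(raw)
        if reference is None or max_abs_diff(sample, reference) > noise_tolerance:
            # The readings moved: start counting a new candidate stable run.
            reference, run_length = sample, 1
        else:
            run_length += 1
        if run_length == stable_count:
            # Stable for long enough; emit only if it differs from the last emission.
            if last_emitted is None or max_abs_diff(sample, last_emitted) > change_threshold:
                last_emitted = sample
                yield sample
```

With `stable_count=3`, a stream that rests near one posture, moves, and then rests near another produces exactly two retained samples, one per posture; the samples seen during the movement are never emitted because they do not stay within the noise tolerance for long enough.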
5.1.2 Classification
Once the stability of the data has been ensured, we proceed
with the classification of the configuration. During a
preparatory stage we have compared the performance
of six classification algorithms, namely Random
Trees (RT) (Le Gall, 2005), Boost Cascade (BC)
(Viola and Jones, 2001), Neural Networks (NN)
(Haykin, 2004), K-Nearest Neighbours (KNN)
(Cover and Hart, 1967), Naive Bayes (NB) (Rish,
2001) and Support Vector Machine (SVM). For all
these algorithms we have used the default
configuration of the respective implementation
available in the Open Source Computer Vision
Library (OpenCV) (Bradski and Kaehler, 2008). To
evaluate their performance we have used a dataset
composed of 40 samples for each hand configuration
(1680 samples in total). To reduce the variance of our
estimates we have used 10-fold cross validation. In
Table 1 and Table 2 we present the results of the
evaluation for the left and right glove, respectively.
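For illustration, a 10-fold evaluation of this kind could be organised as in the sketch below. It is not the evaluation code used in this work: the stratified assignment of samples to folds is an assumption (the text does not say how the folds were built), and `fit` and `predict` are hypothetical callables standing in for whichever classifier is being tested.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterator, List, Sequence, Tuple


def stratified_kfold(labels: Sequence[int], k: int = 10) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) with every class spread evenly over k folds."""
    by_class: Dict[int, List[int]] = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds: List[List[int]] = [[] for _ in range(k)]
    for indices in by_class.values():
        # Deal the samples of each class round-robin into the folds.
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    for i in range(k):
        test = sorted(folds[i])
        train = sorted(idx for j, fold in enumerate(folds) if j != i for idx in fold)
        yield train, test


def cross_validated_accuracy(
    samples: Sequence, labels: Sequence[int],
    fit: Callable, predict: Callable, k: int = 10,
) -> float:
    """Mean fraction of correctly classified test samples over the k folds."""
    scores = []
    for train, test in stratified_kfold(labels, k):
        model = fit([samples[i] for i in train], [labels[i] for i in train])
        hits = sum(predict(model, samples[i]) == labels[i] for i in test)
        scores.append(hits / len(test))
    return sum(scores) / len(scores)
```

With 40 samples per configuration and 1680 samples in total (which implies 42 configurations), each of the ten folds holds 4 samples per configuration, so every model is trained on 1512 samples and tested on the remaining 168.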
Table 1: Classification results of the 1680 samples,
obtained with the use of the left glove.
(%)        RT    BC    NN    KNN   NB    SVM
Precision  98.6  82.0  98.1  98.8  97.5  98.6
Accuracy   85.5  95.4  78.1  97.3  97.1  100.0
Table 2: Classification results of the 1680 samples,
obtained with the use of the right glove.
(%)        RT    BC    NN    KNN   NB    SVM
Precision  98.8  86.1  97.2  98.0  98.0  98.1
Accuracy   87.3  96.6  80.4  98.2  96.8  100.0
From these results, we may discard the Boost Cascade
algorithm, by far the worst of all. We have also
discarded Neural Networks because of their high
computational cost compared with the rest.
This is a serious drawback since we need a classifier
that can be used in real time. The remaining four
algorithms present high precision and accuracy. Based on
these results we have opted to use SVM classifiers. For
each configuration we have kept the top three
instances and their associated probabilities, meaning
that the application takes into consideration the
three configurations with the highest probability and
uses their probabilities in the classification to
increase the accuracy. These instances were used later
to build the classification model for word recognition.
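Keeping the three most probable configurations could be sketched as below. The dictionary of per-configuration probabilities is a hypothetical input: the text does not specify how the probabilities are obtained from the SVM classifiers, so the example only shows the selection step.

```python
from typing import Dict, List, Tuple


def top_candidates(probabilities: Dict[str, float], n: int = 3) -> List[Tuple[str, float]]:
    """Return the n configurations with the highest probability, best first."""
    ranked = sorted(probabilities.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]


# Hypothetical classifier output for one stable sample.
example = {"S": 0.06, "B": 0.70, "5": 0.20, "A": 0.04}
candidates = top_candidates(example)
```

The later word recognition stage can then weigh each candidate by its probability instead of committing to a single label.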
A point to take into consideration is the fact that
intermediate (fake) configurations that constitute
only noise may occur during the transition between
two distinct configurations. As an example, Figure 5
shows the transition from the configuration
corresponding to the letter "S" to the configuration
corresponding to the letter "B", where we obtain as
noise an intermediate configuration that
matches the hand configuration for the number "5" in
PSL.
Figure 5: Transition from configuration S to configuration
B, through the intermediate configuration (noise) 5.
Intermediate configurations differ from the others in
the time component, i.e., intermediate configurations
have a shorter steady time, which is a constant feature
that may be used to distinguish between a valid
configuration and a noisy, intermediate
one. Thus, we use information about the
dwell time of each configuration as an element of
discrimination by setting a minimum execution
(steady) time below which configurations are
considered invalid.
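The dwell-time criterion could be sketched as follows. The minimum steady time `min_dwell_s` and the sampling rate are assumptions chosen for the example (the threshold actually used is not given in the text), and the input is taken to be one classified label per sample.

```python
from itertools import groupby
from typing import Iterable, List, Tuple


def valid_configurations(
    labels: Iterable[str],
    sample_rate_hz: float = 30.0,  # assumed sampling rate of the label stream
    min_dwell_s: float = 0.2,      # assumed minimum steady time for a valid configuration
) -> List[Tuple[str, float]]:
    """Keep runs of identical labels that last at least min_dwell_s seconds."""
    kept = []
    for label, run in groupby(labels):
        dwell = sum(1 for _ in run) / sample_rate_hz  # run length in seconds
        if dwell >= min_dwell_s:
            kept.append((label, dwell))
    return kept


# "S" held for 15 samples, a brief "5" during the transition, then "B" for 12 samples.
stream = ["S"] * 15 + ["5"] * 3 + ["B"] * 12
kept_labels = [label for label, _ in valid_configurations(stream)]
```

In this example the short-lived "5" is rejected as a transition artefact, while "S" and "B" are kept.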
5.2 Hand Motion and Orientation
To obtain information that allows characterizing the
movement and orientation of the hands we use the
Microsoft Kinect. To equalize the sampling
frequency between the Kinect and the data gloves, we
reduced the sampling frequency of the gloves to 30
samples per second. For each skeletal point, the
Kinect provides information about the position in
space (x, y, z) over time. Of the 20 points available
we only use 6, in particular the points corresponding
to the hands, elbows, hip and head. We consider that
a gesture is valid only if the dominant hand is
positioned above the hip, and a gesture is performed
with both hands only if both hands are above the hip.
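The validity rule can be expressed as a small check on the skeletal points. The sketch below is illustrative only: the `Joint` type and the function name are hypothetical, and it assumes that the y coordinate increases upwards, so that "above the hip" means a larger y value.

```python
from typing import NamedTuple


class Joint(NamedTuple):
    """Position of one skeletal point in space (hypothetical container)."""
    x: float
    y: float  # assumed to increase upwards
    z: float


def hands_in_use(dominant_hand: Joint, other_hand: Joint, hip: Joint) -> int:
    """Return 0 for no valid gesture, 1 for a one-handed and 2 for a two-handed gesture."""
    if dominant_hand.y <= hip.y:
        return 0  # dominant hand is not above the hip: gesture is not valid
    return 2 if other_hand.y > hip.y else 1


hip = Joint(0.0, 0.0, 2.0)
one_handed = hands_in_use(Joint(0.2, 0.3, 1.8), Joint(-0.2, -0.2, 1.9), hip)
two_handed = hands_in_use(Joint(0.2, 0.3, 1.8), Joint(-0.2, 0.4, 1.8), hip)
```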
Differences are noticeable when there is a significant
dissimilarity in the level of proficiency in sign
language. The time taken to perform a gesture
is one of the most prevalent differences. So, in order