
Table 1 shows the resources required for the FPGA
implementation of the vision system. This information
was obtained from the compilation reports produced by
the EDA tool. Note that the neural network, despite
being mainly composed of memory cells, uses a
significant number of logic elements compared to the
other blocks of the system. This is due to the
implementation of an internal bus capable of supplying
all the image input bits in parallel, a feature that is
crucial to the system's performance. All blocks of the
system can process at least 30 frames per second, which
is the maximum rate supported by the camera. As only
27.94% of the FPGA capacity is used, it should be
possible to add new image processing functions to the
vision system if required by other applications.
Table 1: Resource usage: RAM memory (Mem), logic
elements (LE), and fraction of logic elements used
(LE (%)); Base: EP1S10F780C6 FPGA
Blocks                   Mem (Kbits)    LE    LE (%)
Camera Control                0           71    0.67
Pixel Read                    0           84    0.79
RGB2HSI                       0          684    6.47
Segmentation                  0           33    0.31
Filter and Compression        5          252    2.38
Image Centralization          1          180    1.70
Neural Network              229         1650   15.61
Total                       235         2954   27.94
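As a sanity check, the LE column indeed sums to 71 + 84 + 684 + 33 + 252 + 180 + 1650 = 2954; assuming the EP1S10's nominal capacity of 10,570 logic elements (a data-sheet figure, not stated in the paper), 2954 / 10570 ≈ 27.9%, consistent with the reported total.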
Table 2 presents the performance results achieved by
each block of the system. The RGB2HSI block is the
slowest (31.88 frames per second), because it must
implement operations such as division and
multiplication, and complex hardware is necessary to
compute the conversion equations.
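To make this cost concrete, the sketch below gives a common RGB-to-HSI formulation in Python. It is illustrative only: the paper does not list the exact equations implemented by the RGB2HSI block, but any variant requires the divisions, multiplications, square root, and arccosine seen here, which is what makes the hardware complex.

    import math

    def rgb_to_hsi(r, g, b):
        # One RGB pixel, components in [0, 1]; returns (H, S, I).
        i = (r + g + b) / 3.0                          # intensity: one division
        s = 0.0 if i == 0 else 1.0 - min(r, g, b) / i  # saturation: another division
        num = 0.5 * ((r - g) + (r - b))
        den = math.sqrt((r - g) ** 2 + (r - b) * (g - b))
        h = 0.0 if den == 0 else math.acos(max(-1.0, min(1.0, num / den)))
        if b > g:                                      # hue in [0, 2*pi)
            h = 2.0 * math.pi - h
        return h, s, i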
Table 2: Performance of the system blocks: Operating
Frequency (OF), Image Size (IS) or Number of Neurons
(*), and Frames per Second (FS); Base: EP1S10F780C6
FPGA
Blocks                   OF (MHz)    IS        FS
Camera Control            269.11    -         -
Pixel Read                422.12    320x240   1374
RGB2HSI                    16.88    320x240   31
Segmentation              109.79    320x240   1429
Filter and Compression     48.89    320x240   212
Image Centralization       18.50    32x24     3437
Neural Network             73.48    96*       382708
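The FS column can be cross-checked against the operating frequency. The Segmentation figure, for instance, is consistent with the block consuming one pixel per clock cycle (an inference from the numbers, not a statement from the paper):

    # Frame rate of a one-pixel-per-clock block: OF / pixels per frame.
    # Values taken from the Segmentation row of Table 2.
    OF_HZ = 109.79e6
    PIXELS_PER_FRAME = 320 * 240
    print(OF_HZ / PIXELS_PER_FRAME)   # ~1429 frames per second, as reported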
The efficiency of the recognition process was evaluated
in order to determine whether the hit rate is
satisfactory. During this evaluation, we found that the
number of samples used during the training phase of the
neural network plays a significant role in the final
results.
To determine that 50 samples of each gesture are
enough, the values of the neural network counters for
each recognition operation were analyzed. The neural
network model used in this system allows each counter
to vary between 0 and 95, with the greatest value
determining the winning pattern of the recognition
process. By analyzing the differences between counters,
it was concluded that using fewer than 50 samples in
the training process results in small differences
between the winning pattern and the others, which
clearly indicates a low degree of confidence. On the
other hand, using more than 50 samples for training
tends to saturate the neural network, as a large number
of counters become close to the maximum value; this
also reduces the degree of confidence.
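A short software sketch of this winner-takes-all decision may help. It is a minimal illustration, not the authors' implementation: the counters run from 0 to 95 as in the paper's network, but the gesture names and the margin threshold are hypothetical.

    def classify(counters, min_margin=10):
        # counters: gesture -> RAM-network counter value (0..95).
        # min_margin is a hypothetical confidence threshold.
        ranked = sorted(counters.items(), key=lambda kv: kv[1], reverse=True)
        (winner, best), (_, runner_up) = ranked[0], ranked[1]
        if best - runner_up < min_margin:
            return None   # counters too close: low confidence, no recognition
        return winner

    # Example: a clear winner with a comfortable margin.
    print(classify({"open_hand": 88, "fist": 41, "point": 37}))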
Once the number of training samples was defined, the
efficiency test was performed. The system was trained
to recognize 7 gestures, all of them represented by the
inner part of the hand, as seen in Section 3. A total
of 700 gestures were presented to the system (100 of
each type), with variations in the distance and
inclination of the hand in front of the camera to make
the test more reliable. For each gesture it was
verified whether the recognition was correct, wrong, or
not possible. After several test runs it was concluded
that 50 samples of each gesture are enough to obtain a
hit rate of 99.57%, i.e. 697 of the 700 presented
gestures correctly interpreted. One of the main reasons
for achieving this efficiency level is the translation
of the gesture to the centre of the image sent to the
neural network. The results of this analysis are shown
in Table 3. The True Recognition column shows the
percentage of gestures correctly interpreted, while the
False Recognition column shows the percentage of
gestures wrongly interpreted. Finally, the column
labeled No Recognition shows the rate of unrecognized
gestures.
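Since the centralization step is credited with much of this accuracy, a minimal software sketch of the idea follows. The centroid-based method shown here is an assumption for illustration; the hardware block's actual algorithm is not detailed in this section, and the wrap-around introduced by np.roll is ignored for simplicity.

    import numpy as np

    def centralize(binary_img):
        # Shift the segmented hand so its centroid lands at the image centre.
        ys, xs = np.nonzero(binary_img)
        dy = binary_img.shape[0] // 2 - int(round(ys.mean()))
        dx = binary_img.shape[1] // 2 - int(round(xs.mean()))
        return np.roll(binary_img, (dy, dx), axis=(0, 1))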
5 CONCLUSION
In this paper we have presented a real-time gesture
recognition system for mobile robots, implemented as a
SoC using FPGA technology. The system's main task,
gesture recognition, is performed by a RAM-based neural
network that recognizes seven gestures. The primary
motivation for the development of this vision system is
its integration into a mobile robot intended to help
people with disabilities who require alternative
communication interfaces. The resulting system proved
to be robust, delivering the high performance required
for real-time processing (30 fps). This performance could