Let us highlight the main ideas which have led to
the development of the proposed method.
Processing a frame involves two consecutive subroutines: hand
detection in the frame (client-side) and, in the
case of a positive outcome, detailed analysis of the
frame containing the hand (server-side). Due to network
restrictions, only a few frames can be sent to the
server during an identification session.
That is why the hand detection procedure should ap-
prove for further analysis as few images guaran-
teed to be unfit as possible (i.e., we need a low false
positive rate). At the same time, it should process
a video stream from the camera in real time on a wide
range of mobile devices. Thus, the required method
has to meet very strict requirements for both recogni-
tion quality and performance.
There has been some research on hand detection us-
ing AdaBoost-based methods, with promising results
in accuracy and speed (Kölsch and Turk, 2004), (Fang
et al., 2007), (Xiao et al., 2010), suitable for use in un-
constrained environments (various kinds of lighting,
diverse backgrounds, etc.). However, these methods gen-
erally require exhaustive classifier training and a large
dataset including samples with different rotations and
scales. Moreover, such methods do not explicitly utilize
any hand geometry, such as the mutual disposition of the fin-
gers or their proportions, which makes it difficult to separate
“bad” hands (e.g., with partially “glued” fingers, see
Fig. 3; such a sample is ineligible for the shape analy-
sis procedure) from “good” ones. Another approach
is skin-color-based detection (Elgammal et al.,
2009), (Vezhnevets et al., 2003), but it is unreliable be-
cause of its sensitivity to lighting conditions and confusion
with skin-colored objects. Optical flow methods (So-
bral, 2013) demonstrate good results for stationary
cameras and permanently moving objects, so they cannot
be used directly in our case without improvements.
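For context, skin-color detection of the kind cited above is typically a per-pixel color-range test. The HSV thresholds below are a commonly used rough heuristic, not values taken from the cited works; their fragility is exactly the drawback noted, since a change in illumination moves real skin out of the range and lets skin-colored background objects in.

```python
import numpy as np


def skin_mask_hsv(hsv):
    """Naive per-pixel skin mask on an HSV image.

    Assumes OpenCV-style channel ranges: H in [0, 180), S and V
    in [0, 255]. The threshold values are illustrative only.
    """
    h, s, v = hsv[..., 0], hsv[..., 1], hsv[..., 2]
    # "skin-like": low (reddish) hue, moderately saturated, not too dark
    return (h < 25) & (s > 40) & (v > 60)
```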
The rest of the paper is organized as follows.
The equipment and the collected dataset are described in Sec-
tion 2. The hand detection procedure is pre-
sented in Section 3. Further server-side image pro-
cessing is described in Section 4. Next, the experimental
results are presented and discussed in Section 5. Fi-
nally, Section 6 briefly concludes the paper.
2 DATA AND EQUIPMENT
During the research we collected 80 short videos of
the hands of 50 different people (1-3 videos per per-
son; the back side of the right hand was captured). All
videos were taken with the cameras of mobile devices.
They were then decomposed into frames (every
5th frame was used), which were saved as graphic
files (*.jpg or *.bmp). As a result, we obtained 2322 im-
ages.
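The decomposition step above can be sketched as follows. The function names and the use of OpenCV are illustrative assumptions, not the authors' implementation; the paper only states that every 5th frame was kept.

```python
def keep_indices(n_frames, step=5):
    """Indices of the frames retained when every `step`-th frame is kept."""
    return list(range(0, n_frames, step))


def extract_frames(video_path, step=5):
    """Decompose a video file into frames, keeping every `step`-th one.

    Uses OpenCV for decoding (an assumption; any video reader works).
    """
    import cv2  # imported here so the pure-Python helper above needs no OpenCV

    cap = cv2.VideoCapture(video_path)
    frames, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:  # end of stream (or unreadable file)
            break
        if idx % step == 0:
            frames.append(frame)
        idx += 1
    cap.release()
    return frames
```

With this sampling, the reported 2322 images correspond to roughly 11610 raw frames across the 80 videos.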
An important note: in our work we assume the as-
sistance of the participants, i.e., a reasonable person's in-
tention is to be correctly and quickly recognized by
the identification system. Thus, all cases of cheating
(and the corresponding videos) are excluded from
consideration.
To form a high-quality dataset (and thus obtain adequate
results from MoPIS), one should follow the rec-
ommendations given below while capturing videos:
1. The videos should be recorded with a mobile
device whose camera matrix resolution is at least
1.3 megapixels (Mp), producing video files
with a resolution of 640×480 or more and a frame
rate of 15 frames/second or higher. Choos-
ing a low- or middle-resolution camera leaves lit-
tle chance of extracting any promising texture fea-
tures, though shape analysis remains rather effective.
Preference should be given to high-end devices
with hardware autofocus support. The optimal
duration is 3-6 seconds. For video recording one
can use the Android application that is part
of MoPIS, or similar applications. During
the experiments we mainly used an LG G2 smartphone
with a 13 Mp camera (supporting auto-
focus and optical stabilization) and recorded HD
video (1920×1080 or 1280×720 resolution at 30
frames/sec) with the help of the MoPIS An-
droid application. Some data was also captured
using a Samsung Galaxy Note 10.1 tablet (Fig. 1).
2. The background should be black or dark (homo-
geneity is not necessary); otherwise there may be
problems with Otsu binarization (Otsu, 1979)
(and therefore with hand detection in the frame),
which in turn will affect the quality of the further
analysis of the hand. We used a black homoge-
neous cloth as the background in our experiments.
3. The camera should be kept stable; only the
tested hand should move. Modern mobile devices
have good optical stabilization systems, so there
is no need for a tripod or a holder to fix the po-
sition. Nonetheless, some of the videos in our
dataset were made with the help of a tripod (Fig. 1).
4. To increase the variability of the frames obtained
from the video, the participant should slowly
move his or her fingers (bringing them together and apart)
in the horizontal plane. Sudden movements
should be avoided. To improve the represen-
tativeness of the dataset, it is highly advisable to
record a few videos of each hand under different
lighting conditions.
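Recommendation 2 rests on Otsu binarization behaving well when the gray-level histogram is bimodal (dark background versus a brighter hand). A minimal NumPy sketch of Otsu's threshold selection, for illustration only (a real system would call a library routine):

```python
import numpy as np


def otsu_threshold(gray):
    """Return the Otsu threshold for an 8-bit grayscale image.

    Picks the threshold t that maximizes the between-class variance
    sigma_b^2(t) = (mu_T * omega(t) - mu(t))^2 / (omega(t) * (1 - omega(t))),
    where class 0 is all pixels with value <= t.
    """
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                 # class-0 probability P(value <= t)
    mu = np.cumsum(prob * np.arange(256))   # cumulative mean up to t
    mu_t = mu[-1]                           # global mean
    denom = omega * (1.0 - omega)
    denom[denom == 0] = np.nan              # exclude degenerate thresholds
    sigma_b = (mu_t * omega - mu) ** 2 / denom
    return int(np.nanargmax(sigma_b))
```

On a frame with a dark background, the histogram has two well-separated modes and the threshold falls between them; a bright or cluttered background merges the modes and the split becomes arbitrary, which is the failure mode the recommendation avoids.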
VISAPP 2015 - International Conference on Computer Vision Theory and Applications