The first device that comes to mind when thinking
about tracking human poses is the Microsoft Kinect
sensor (Microsoft Corporation, 2010) (see Figure 1).
Being one of the first affordable consumer devices in this area, it is widespread. The Kinect is suitable for tracking the entire human body; however, it does not offer sufficient resolution for tracking individual fingers. Additionally, the maximum frame rate at the highest resolution is only 30 frames per second.
A similar device is the SoftKinetic (Intel Corporation, 2013) (see Figure 1). The SoftKinetic works in a range from 0.1 m to 1.1 m and is geared towards tracking faces and hands, with frame rates around 30 frames per second.
The most recent device that is completely geared towards tracking hands is the Leap Motion controller (Leap Motion, Inc., 2012) (see Figure 1). While the Leap launched on July 22, 2013, we worked with a pre-production unit. The Leap sensor is able to determine the location and orientation of the user's hand with sub-millimeter precision. Compared to the other devices it provides the highest accuracy in tracking hands and fingers, and it operates at up to 295 frames per second. These specifications look promising enough to envision the use of this device in intuitive and interactive 3D modeling.
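To illustrate the kind of data the controller delivers, the following minimal sketch polls palm and fingertip positions through the Python bindings of the publicly released Leap SDK. The names used here (Leap.Controller, palm_position, tip_position) follow the public v1 SDK; the pre-production API we worked with may differ in detail.

    import Leap  # Python bindings shipped with the Leap SDK

    controller = Leap.Controller()

    # Poll the most recent tracking frame; all positions are
    # reported in millimeters relative to the device center.
    frame = controller.frame()
    for hand in frame.hands:
        palm = hand.palm_position  # Leap.Vector with x, y, z
        print("palm at (%.1f, %.1f, %.1f) mm" % (palm.x, palm.y, palm.z))
        for finger in hand.fingers:
            tip = finger.tip_position
            print("  tip at (%.1f, %.1f, %.1f) mm" % (tip.x, tip.y, tip.z))

In practice one would typically register a listener to receive every frame instead of polling, but polling already suffices to show the per-hand and per-finger positions the device reports.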
1.2 Previous Work
3D user interfaces have been an active topic in the computer vision and graphics communities, and many interesting papers have been published over the last decades. For a broad overview of 3D spatial interaction, we refer our readers to the SIGGRAPH 2011 course notes on “3D spatial interaction: application for art, design, and science” (LaViola and Keefe, 2011) and the references therein.
While a lot of research over the last years has targeted large-scale 3D body gesture recognition using the Kinect sensor (see (Ren et al., 2011; Gallo et al., 2011) and the references therein), only little effort has been put into assessing the abilities of 3D interfaces for 3D object modeling and manipulation.
Using custom hardware, Araujo et al. (Araujo et al., 2013) presented one very advanced approach towards modeling through 3D interaction with their tool “Mockup Builder”. They use a multitouch screen with 3D projection, a Kinect, and mechanical tracking of positions in space. This elaborate setup remedies many of the problems we encountered, but relying on such an amount of hardware does not seem practical to us.
Hilliges et al. (Hilliges et al., 2009) have implemented a system (hardware and software) that combines a multitouch device with depth information. In their system, users can pinch an object and move it in 3D. They encountered depth-perception problems similar to those we will describe later on (see Section 4.2). In their application the scene is presented to the user from a bird's-eye view, so shadows are sufficient to correctly gauge relative depth.
Ren and O’Neill (Ren and O’Neill, 2013) address the problem of 3D selection using freehand gestures. They state that a single action for this task is not feasible because of accuracy issues, and they propose that a series of low-level movements improves the precision of the selection. For a better understanding of this topic we recommend the survey paper of Argelaguet and Andujar (Argelaguet and Andujar, 2013). Selection is a special case of handling clicking gestures, which we will describe in a more general manner later on.
Another approach taken in the past towards an accurate retrieval of hand gestures is HMI (human machine interface) gloves; see for example the work of Saggio et al. (Saggio et al., 2010). These gloves are equipped with sensors at every joint, recovering every movement, and thus allow for a very accurate retrieval of relative finger positions. However, the absolute position of the hand in 3D still cannot be determined. Usability also suffers, and the price of such systems is quite high.
In this context, one last bit of related work we want to mention is haptic feedback devices such as the PHANTOM (Massie and Salisbury, 1994). Using such a device, the user holds a pen that is connected to a “robot arm”. This enables very accurate tracking and allows the user to perform a clicking gesture through a button on the pen. It also provides haptic or tactile feedback. Given such a device, many of the problems we will encounter later on are remedied. However, we want to concentrate on working with affordable, off-the-shelf, consumer-grade hardware.
2 DEVICE LIMITATIONS
Given that we want to work with affordable
consumer-grade devices, challenges due to device
limitations have to be addressed. As our primary
device we use the pre-production Leap Motion con-
troller. With this setup we can identify two major
problems: a limited field of view and occlusion.
The field of view (in the case of the Leap, a cone with an opening angle of approximately 90°) is especially problematic, as the user can neither see it nor receive feedback upon leaving it. Given the restricted inter-
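Since the device itself gives no signal when a hand leaves this cone, an application can approximate the check on its own. The following minimal sketch tests whether a tracked palm position lies inside a cone around the vertical axis; the device is assumed to sit at the origin with +y pointing up, and the 45° half-angle is an illustrative constant derived from the approximately 90° opening angle, not a calibrated value.

    import math

    # Half of the ~90 degree opening angle (illustrative; the
    # real tracking volume varies from device to device).
    FOV_HALF_ANGLE_DEG = 45.0

    def inside_fov(x, y, z):
        # Point in mm, device at the origin, +y pointing up.
        r = math.sqrt(x * x + y * y + z * z)
        if r == 0.0 or y <= 0.0:
            return False
        angle = math.degrees(math.acos(y / r))  # angle to the cone axis
        return angle <= FOV_HALF_ANGLE_DEG

    # Example: warn the user when the hand drifts out of the volume.
    if not inside_fov(200.0, 100.0, 0.0):
        print("warning: hand left the tracking cone")

A test of this kind can drive the on-screen feedback whose absence we criticize above, for example by highlighting the scene border as the hand approaches the edge of the cone.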