MOTION TRACKING WITH REFLECTIONS

3D pointing device with self-calibrating mirror system

Shinichi Fukushige, Hiromasa Suzuki

Department of Precision Engineering, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, Japan

Keywords: Interactive pointing device, 3D input, reflections, motion tracking.

Abstract: We propose a system that uses a camera and a mirror to input the behaviour of a pointer in 3D space. Using the direct and reflected images of the pointer obtained from a single directional camera, the system computes the 3D positions of the pointer and the normal vector of the mirror simultaneously. Although the system can only input the ''relative positions'' of the pointer, i.e., 3D locations without a scale factor, no calibration of the mirror orientation is needed. The system therefore offers a very simple and inexpensive way of implementing a 3D interaction device.

1 INTRODUCTION

Input devices for processing 3-dimensional (3D) computer-generated models are divided into two types: those with a 2-dimensional (2D) interface and those with a 3D interface. 2D input devices, such as mice, tablets, and touch monitors, are used more widely than 3D input devices because of their simplicity and ease of use. However, because the target is in 3D space, 2D input equipment imposes several constraints and restrictions on pointer movements (Sugishita, 1996) (Zeleznik, 1996) (Branco, 1994). The preconditions needed to translate 2D input operations into 3D often hinder the intuitive input operations of designers.

Therefore, various devices have recently been developed that can directly indicate a position in 3D space.

Currently, however, 3D input devices are not widely used by general users, nor as general-purpose tools, because of their cost and complexity: they require special sensors for magnetism, ultrasonic waves or lasers, or have complex structures, such as joint or wire mechanisms or stereo camera systems (Kenneth, 1994) (Sato, 2000) (Smith, 1995) (Turban, 1992).

Stereovision is commonly used to calculate the 3D position of a pointer from the images of more than one camera (Faugeras, 1993) (Yonemoto, 2002) (Xu, 1996) (Longuet-Higgins, 1981). However, processing multiple video images in real time requires a large amount of CPU resources or special hardware. Furthermore, these methods involve synchronization and complex computations that usually require an initial calibration phase. Since multiple cameras must be placed at separated positions to ensure full 3D restoration accuracy, it is difficult to miniaturize such systems.

We would like to provide a simple 3D pointing device that users can handle easily and with a feeling of familiarity. This paper proposes a system for estimating the 3D motion of a pointer in real time from a single video image of the pointer tip together with its mirror reflection. Conventionally, in order to determine an object's 3D position from a single view, the shape and size of the object or multiple markers on it must be recognized simultaneously, and the restoration accuracy of such methods is low in the direction of the optical axis.

The proposed method differs from that of Lane et al. (Lane, 2001), which also uses mirror reflections but estimates the ''absolute'' 3D positions; their method needs manual calibration and must divide the 3D space into a mirror reflection area and an input area.

We propose using a mirror system with self-calibration which estimates the relative 3D positions of the pointer. ''Relative positions'' means that the restored x, y, z coordinates of the pointer share the same unknown parameter, regarded as a scale factor.

However, for 3D pointing usage the scale factor can be set freely by the user, because fine motion tracking is more important than inputting absolute position values in the real world.

Fukushige, S. and Suzuki, H. (2006). MOTION TRACKING WITH REFLECTIONS - 3D pointing device with self-calibrating mirror system. In Proceedings of the First International Conference on Computer Vision Theory and Applications, pages 428-434. DOI: 10.5220/0001377404280434. Copyright © SciTePress.

2 PRINCIPLE OF 3D MOTION TRACKING FROM SINGLE DIRECTIONAL IMAGES

To estimate the 3D motion of a pointer from single-camera images, we use the reflections of a mirror plane. We assume that the internal camera parameters, such as the focal length, are pre-calibrated, and normalized camera coordinates are used: the image plane of the normalized camera lies at unit distance from the focal point, i.e., the z axis is taken as the direction of the optical axis of the camera, and z = 1 is the image plane. Any standard camera may be used, because standard camera image coordinates can easily be converted to normalized image coordinates. We can thus treat the vision problem with a normalized camera, regardless of individual camera parameters (Xu, 1996).
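As a concrete illustration of this conversion (the function and intrinsic values below are hypothetical, not the parameters of our camera), a pixel coordinate can be mapped to the normalized image plane z = 1 as follows:

```python
import numpy as np

def normalize_pixel(u, v, fx, fy, cx, cy):
    """Map a pixel (u, v) to normalized image coordinates (x, y, 1),
    i.e. the point where its viewing ray crosses the plane z = 1.
    fx, fy are focal lengths in pixels; (cx, cy) is the principal point."""
    return np.array([(u - cx) / fx, (v - cy) / fy, 1.0])

# Hypothetical intrinsics for a 640x480 camera.
m = normalize_pixel(400, 300, fx=800.0, fy=800.0, cx=320.0, cy=240.0)
```

All subsequent formulas then hold unchanged for any camera whose intrinsics are known.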

2.1 Self-calibration of the System

Initially, the proposed system estimates the orientation of the mirror using the 2D positions of the direct and reflected pointer images projected onto the camera's image plane (see Figure 1).

Figure 1: Overview of the proposed system.

N = (Nx, Ny, Nz) is the foot of the perpendicular from the camera's focal point O = (0, 0, 0) to the mirror plane, and the normal vector of the mirror, λN, can be estimated from projected images of the 3D point P, the tip of the pointer moving freely in 3D space. By tracking the movement of P, we obtain four or more projected 2D points from two or more 3D points.

Suppose that at one time the point is located at P1, and at another time at P2 (P1 ≠ P2). Then m1 = (m1x, m1y, 1) and m2 = (m2x, m2y, 1) are the points projected directly onto the image plane, and m1' = (m1x', m1y', 1) and m2' = (m2x', m2y', 1) are the points reflected by the mirror and projected onto the image plane from P1 and P2 respectively (see Figure 2).

Reflection of light from a mirror is governed by the two laws of reflection:

(1) The incident ray, the reflected ray and the normal at the point of incidence lie on the same plane.

(2) The angle which the incident ray makes with the normal (angle of incidence) is equal to the angle which the reflected ray makes with the normal (angle of reflection).

From law (1), the relation among m1, m1', m2, m2' and N is written as follows:

α1 m1 + β1 m1' = N    (1)

α2 m2 + β2 m2' = N    (2)

where only m1, m1', m2 and m2' are given values, all lying on the same plane (z = 1), and α1, α2, β1 and β2 are scalars.

Figure 2: The point P moves from the position P1 to P2.

At this stage, however, the directly projected points and the reflected points have not yet been distinguished. Through these four 2D points there are six straight lines, each passing through two of the points, which intersect at the three additional points shown in Figure 3.

Figure 3: The four projected points and the six straight lines passing through every pair of points.


Two of these six lines are the intersection lines of the image plane with the two planes described by (1) and (2). Thus, in the ideal case, these two lines always intersect at the same point N̂ = (Nx/Nz, Ny/Nz, 1), which is the intersection of the image plane with the line extended from the focal point O through N.

Therefore, the one of the three intersection points whose movement is minimal can be taken as N̂; on each line, the point near N̂ is the reflected point, and the point far from N̂ is the directly projected point.

N also lies on the intersection line of the two planes (1) and (2). Each of these equations constrains N to a plane through O, with normal vectors m1 × m1' and m2 × m2' respectively, so λN is determined, up to the unknown scale λ, as

λN = (m1 × m1') × (m2 × m2')
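The expression above is a double cross product: each direct/reflected pair spans a plane through O containing the mirror normal, and the two plane normals are m1 × m1' and m2 × m2'. A numeric sketch with synthetic data (the mirror plane and 3D points below are invented for the check, not taken from the paper):

```python
import numpy as np

def mirror_normal_direction(m1, m1p, m2, m2p):
    """Estimate the mirror normal up to scale (lambda N) from two direct
    projections m1, m2 and the corresponding reflected projections m1p, m2p,
    all lying on the plane z = 1."""
    n1 = np.cross(m1, m1p)        # normal of the plane through O, m1, m1'
    n2 = np.cross(m2, m2p)        # normal of the plane through O, m2, m2'
    d = np.cross(n1, n2)          # direction of the planes' intersection line
    return d / np.linalg.norm(d)  # unit vector; the scale lambda stays free

# Synthetic check: reflect two 3D points across a known mirror plane
# n . x = dist, then project everything onto z = 1.
n, dist = np.array([0.8, 0.0, 0.6]), 1.0
reflect = lambda P: P - 2.0 * (P @ n - dist) * n
project = lambda P: P / P[2]
P1, P2 = np.array([0.2, 0.1, 2.0]), np.array([-0.1, 0.3, 2.5])
est = mirror_normal_direction(project(P1), project(reflect(P1)),
                              project(P2), project(reflect(P2)))
# est is parallel (up to sign) to the true normal n.
```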

The accuracy of the estimated λN depends on the combination of the four projected points. From the laws of reflection, the relation among the four points on the image plane is as shown in Figure 4.

Figure 4: The four projected points on the image plane.

Here, N̂ = (Nx/Nz, Ny/Nz, 1) is the intersection point of the image plane and the line extended from the focal point O through N. We evaluate the accuracy of λN with an evaluation function E defined by

E = 1/sin θ + φ ( l1' / (l1 + l1') + l2' / (l2 + l2') )    (3)

where φ is a constant (7.5 in our experiments), and

sin θ = ||(N̂ − m1) × (N̂ − m2)|| / ( ||N̂ − m1|| ||N̂ − m2|| )

l1 = ||m1 − m1'||,  l1' = ||m1' − N̂||

l2 = ||m2 − m2'||,  l2' = ||m2' − N̂||
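One plausible transcription of this evaluation, combining the angle term 1/sin θ with the length ratios l'/(l + l'), is sketched below; the input points are hypothetical:

```python
import numpy as np

PHI = 7.5  # weighting constant used in our experiments

def evaluate_E(n_hat, m1, m1p, m2, m2p):
    """Evaluation function: small when the two lines meet N-hat at a wide
    angle and the reflected points lie close to N-hat relative to the
    direct/reflected separations."""
    v1, v2 = m1 - n_hat, m2 - n_hat
    sin_theta = (np.linalg.norm(np.cross(v1, v2))
                 / (np.linalg.norm(v1) * np.linalg.norm(v2)))
    l1, l1p = np.linalg.norm(m1 - m1p), np.linalg.norm(m1p - n_hat)
    l2, l2p = np.linalg.norm(m2 - m2p), np.linalg.norm(m2p - n_hat)
    return 1.0 / sin_theta + PHI * (l1p / (l1 + l1p) + l2p / (l2 + l2p))

# Hypothetical configuration: lines meeting N-hat at a right angle,
# reflected points halfway between N-hat and the direct points.
n_hat = np.array([0.0, 0.0, 1.0])
E = evaluate_E(n_hat, np.array([1.0, 0.0, 1.0]), np.array([0.5, 0.0, 1.0]),
               np.array([0.0, 1.0, 1.0]), np.array([0.0, 0.5, 1.0]))
```

In use, the system keeps the combination of points that minimizes E.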

While calibrating the mirror normal, the user should move the pointer widely; the system tracks the four projected points and selects the m1, m1', m2 and m2' that minimize E.

2.2 Restoring the 3D Motion of the Pointer from 2D Images

Using the concept of a virtual camera, a camera is set virtually on the opposite side of the mirror from the actual camera. Images reflected by the mirror can then be calculated as if shot directly by the virtual camera (see Figure 5). In reality there is no virtual camera image plane, but it can be regarded as overlapping the image plane of the actual camera. We write Cr for the actual camera coordinates and Cv for the virtual camera coordinates.

Figure 5: The virtual camera is set on the opposite side of the mirror from the actual camera.

Cr is mirror-symmetric with Cv. The relation between the two coordinate systems is thus:

Cr = R Cv + t    (4)

where R expresses the rotational component and t the parallel translation from Cv to Cr. Then

R = I − 2nnᵀ
  = [ 1 − 2nx²    −2nx ny    −2nx nz  ]
    [ −2nx ny    1 − 2ny²    −2ny nz  ]
    [ −2nx nz    −2ny nz    1 − 2nz²  ]    (5)

t = 2N    (6)

where n = (nx, ny, nz) is the unit normal vector of the mirror, i.e., n = N/||N||.
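R is the Householder reflection for the unit normal n, and together with t = 2N it carries virtual-camera coordinates to actual-camera coordinates. A quick numeric check of (5)-(6) (the foot point N below is invented):

```python
import numpy as np

def mirror_transform(N):
    """Build R = I - 2 n n^T and t = 2 N (equations (5) and (6)) for the
    mirror whose foot of perpendicular from the focal point is N."""
    n = N / np.linalg.norm(N)             # unit mirror normal
    R = np.eye(3) - 2.0 * np.outer(n, n)  # Householder reflection
    return R, 2.0 * N

N = np.array([0.3, 0.0, 1.2])             # hypothetical foot point
R, t = mirror_transform(N)
P = np.array([0.5, -0.2, 2.0])
P_mirror = R @ P + t                      # mirror image of P across the plane
```

Reflecting twice returns the original point, which is an easy sanity check on the construction.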

The 3D position of the pointer P is calculated from the two points m = (mx, my, 1) and m' = (mx', my', 1) projected onto the image plane. We can regard m as the point shot by the actual camera Cr, and m' as the point shot by the virtual camera Cv.

The two lines extended toward m and m' from the respective camera focal points should, in the ideal case, intersect at the same point P. From formula (4), this relation is written

s m = s' R m' + t    (7)

where s m and s' m' are the 3D positions of P in the coordinates Cr and Cv respectively; translated into the coordinates Cr, s' m' becomes the right side of formula (7). Although N is not given, the mirror normal λN is predetermined in the self-calibration phase. Then (7) becomes

s m = s' R m' + λN    (8)

where the factor 2 of (6) is absorbed into the unknown scale λ.

Unfortunately, values of s and s' satisfying this formula may not exist, owing to errors in λN and in the camera parameters.

This means that the two lines do not always intersect. We therefore define the ''intersection point'' as the centre of the smallest sphere to which both lines are tangential. Consider the case in which the two lines are tangential to the sphere at a node A, t m, and a node B, t' R m' + λN, as illustrated in Figure 6.

Figure 6: The two lines tangential to the sphere at nodes A and B.

Here t and t' are scalars.

By defining the unit vector directed from node A to node B as d, and the distance between these two nodes as u, node B can be written

t m + u d = t' R m' + λN    (9)

Since both lines are perpendicular to the line AB, the unit vector d = (dx, dy, dz) is given by

d = (m × R m') / ||m × R m'||    (10)

Therefore, the three remaining unknowns t, t' and u in (9) can be obtained by solving

[ t ]                        [ Nx ]
[ u ]  = [ m  d  −Rm' ]⁻¹  λ [ Ny ]    (11)
[ t' ]                       [ Nz ]

where the columns of the 3×3 matrix are the vectors m, d and −Rm'.

Finally, the point of intersection, taken to be the midpoint of A and B, is determined as

P = t m + (u/2) d    (12)
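Equations (9)-(12) amount to a closest-point triangulation between the actual and virtual viewing rays. A compact sketch with the scale fixed (we use t = 2N from (6)-(7) directly; the synthetic mirror and point below are invented for the check):

```python
import numpy as np

def triangulate(m, mp, N):
    """Recover the pointer position P from the direct projection m, the
    reflected projection mp and the mirror foot point N: solve
    t*m + u*d = t'*R*mp + 2N for (t, u, t') as in (9) and (11), then take
    the midpoint of the common perpendicular as in (12)."""
    n = N / np.linalg.norm(N)
    R = np.eye(3) - 2.0 * np.outer(n, n)    # reflection, equation (5)
    Rmp = R @ mp
    d = np.cross(m, Rmp)
    d /= np.linalg.norm(d)                  # unit vector along AB, equation (10)
    A = np.column_stack([m, d, -Rmp])       # columns m, d, -R m'
    t, u, tp = np.linalg.solve(A, 2.0 * N)  # equation (11)
    return t * m + 0.5 * u * d              # midpoint, equation (12)

# Synthetic check: project a known point directly and via a known mirror.
N = np.array([0.8, 0.0, 0.6])               # invented foot point (|N| = 1)
P_true = np.array([0.2, 0.1, 2.0])
n = N / np.linalg.norm(N)
P_virt = P_true - 2.0 * (P_true @ n - np.linalg.norm(N)) * n
P_est = triangulate(P_true / P_true[2], P_virt / P_virt[2], N)
```

With exact inputs the two rays intersect (u = 0) and P_est equals the true point; with noisy inputs the midpoint term takes over.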

The three scalars t, t' and u all contain the remaining unknown λ, as shown in (11). This λ can be regarded as a scale factor, and we can give it any value in the virtual space.

3 EXPERIMENTAL RESULTS

We present input experiments with the proposed system. The internal camera parameters are as follows:

[resolution of camera image] 640 × 480 pixels
[camera focal distance] 3.73
[angle of view] vertical: 31.2°, transverse: 39.6°


Figure 7: The mirror system with an LED pointer.

We use two types of pointer: a stylus with an LED at the nib, and the fingertips.

The LED pointer exploits the properties of a light-emitting tool: the colour, brightness and blinking of the LED carry information and are controlled by button operation. These signals provide a variety of operations such as a ''click'' and a ''double click'': a click is a single blink, and a double click is two consecutive blinks within a second.
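The click logic can be sketched as a small routine over per-frame LED detections (the frame rate and the exact grouping rule here are our assumptions, not the paper's implementation):

```python
def classify_clicks(led_on, fps=30):
    """Group LED blinks into events: an isolated blink is a 'click', and
    two blinks starting within one second of each other form a
    'double click'. led_on is a per-frame boolean sequence."""
    # Blink start frames: off -> on transitions.
    starts = [i for i in range(1, len(led_on))
              if led_on[i] and not led_on[i - 1]]
    events, i = [], 0
    while i < len(starts):
        if i + 1 < len(starts) and starts[i + 1] - starts[i] <= fps:
            events.append("double click")
            i += 2
        else:
            events.append("click")
            i += 1
    return events
```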

The gestures of picking and releasing with two fingertips can also be recognized by the system, enabling users to handle computer-generated objects as if they were in 3D space.

The system can also be used with a large mirror fixed on a wall or on the ground and a hand-held camera, because calibration of the mirror orientation in camera coordinates is very easy and fast (see Figure 8).

Figure 8: A system with a large mirror fixed on a wall and a hand-held camera (one of the input images).

We then conducted input experiments with 3D curves, surfaces, etc. (see Figure 9). The proposed method captures fine operations at the tip of the pointer with well-balanced accuracy in all directions.

Figure 9: Generating a 3D curve and a 3D surface.

The computational cost is comparatively low because only one image, obtained from a single camera, is processed. This enables real-time tracking of the projected points and restoration of the 3D motion and the mirror orientation by CPU processing alone.

3.1 Space Resolution

For 3D pointing devices, the resolution of the space in which motions are input reflects system performance better than absolute positioning accuracy: the sensitivity to relative pointer movement strongly influences the ''feel'' of use. We therefore consider ''space resolution'' as a criterion of how densely the system samples the space. It is defined as the minimum movement of the pointer in 3D that the system can recognize.

The resolution of a 2D digital image, extended to 3D, is expressed as the spatial spread of a four-sided pyramid (see Figure 10). The 3D resolution is determined by the intersection of the two pyramids extended from a directly projected image pixel and a reflected image pixel: as long as the centre of the pointer tip moves inside this intersection, the movement does not appear in the image and is not recognized by the system.
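For intuition about the pyramid size (our own back-of-envelope numbers, not the averages reported in Table 1), the lateral spacing of one pixel at a given working distance follows directly from the field of view:

```python
import math

def lateral_resolution_mm(z_mm, fov_deg, pixels):
    """Base width of the per-pixel viewing pyramid at depth z: one pixel's
    footprint in the direction perpendicular to the optical axis."""
    return z_mm * 2.0 * math.tan(math.radians(fov_deg) / 2.0) / pixels

# Our camera: 640 pixels across a 39.6 degree transverse field of view;
# the 200 mm working distance is a hypothetical example.
r = lateral_resolution_mm(z_mm=200.0, fov_deg=39.6, pixels=640)  # ~0.22 mm
```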


Figure 10: A digital image and an extended pyramid area.

For comparison, we introduce two conventional techniques for estimating the 3D position of a pointer from single directional camera input.

1) SPHERICAL MARKER: A spherical marker is attached to the tip of the pointer, and the 3D position is estimated from the 2D size of the sphere projected onto the image plane.

2) PLURAL MARKERS: Three markers are attached at equal intervals to a pen-like pointer, and the 3D positions of the markers are restored from the 2D positions at which the three markers project onto the image plane; the distances between the markers are given.

The resolution in the direction perpendicular to the optical axis (the xy directions of the image plane) is essentially the same for the mirror, spherical-marker and plural-marker techniques. However, the resolution along the optical axis is not, as shown in Table 1.

Table 1: Averages of space resolutions.

                  X (width)   Y (height)   Z (depth)
Mirror System     0.13 mm     0.13 mm      0.31 mm
Sphere Marker     0.14 mm     0.19 mm      0.62 mm
Plural Markers    0.13 mm     0.20 mm      0.48 mm

Here, the space resolutions are averaged over the hemispherical sampling area shown in Figure 11.

Figure 11: The area of sampling hemisphere.

4 CONCLUSIONS

We have proposed a real-time method for restoring the 3D motion of a pointer using single directional video input. Conventionally, to obtain an object's 3D position from a single view, the shape and size of the object or plural markers on it had to be recognized simultaneously; such methods, however, give low restoration accuracy.

We use mirror images of the pointer to input fine 3D motion of objects such as a light-emitting pointer or the fingers. We constructed a simple, compact desktop system in which the relative locations of the camera and the mirror are self-calibrated, and by processing the pointer images we implemented mouse-button functions such as clicking.

The proposed method can be constructed from simple, common components such as a camera and a mirror, which makes it applicable in a variety of situations. As shown in Figure 12, we envisage a single desktop tool with a built-in camera.

Figure 12: A simplified desktop tool.

The principle of the system can also be used without stylus tools, recognizing gesture input by 3-dimensional manual operation, as shown in Figure 13.

Figure 13: A gesture of picking and releasing.

Our proposal was demonstrated with gestures such as picking or releasing objects with the fingertips, but more complicated operations are possible using all of the fingers, e.g., turning or reshaping a computer-processed object manually.


REFERENCES

Kenneth, J., Massie, T., 1994. The PHANToM haptic interface: A device for probing virtual objects. In ASME International Mechanical Engineering Exposition and Congress, Chicago.

Sato, M., Koike, Y., 2000. Process division in haptic display system. In Proceedings Seventh International Conference on Parallel and Distributed Systems, pp. 219-224.

Smith, J. R., 1995. Field mice: extracting hand geometry from electric field measurements. In IBM Systems Journal, Vol. 35, No. 3&4.

Yonemoto, S., Taniguchi, R., 2002. High-level Human Figure Action Control for Vision-based Real-time Interaction. In Proc. Asian Conf. on Computer Vision.

Sugishita, S., Kondo, K., Sato, H., Shimada, S., 1996. Sketch Interpreter for geometric modelling. In Annals of Numerical Mathematics 3, pp. 361-372.

Zeleznik, R. C., Herndon, K. P., Hughes, J. F., 1996. SKETCH: An Interface for Sketching 3D Scenes. In Proceedings of ACM SIGGRAPH '96, pp. 163-170, ACM Press.

Branco, V., Costa, A., Ferriera, F. N., 1994. Sketching 3D models with 2D interaction devices. In Proc. of Eurographics '94, volume 13, pp. 489-502.

Turban, E., 1992. Expert Systems and Applied Artificial Intelligence, pp. 337-365, Prentice Hall.

Faugeras, O. D., 1993. Three-Dimensional Computer Vision: A Geometric Viewpoint. MIT Press, Cambridge, MA.

Xu, G., Zhang, Z., 1996. Epipolar Geometry in Stereo, Motion and Object Recognition. Kluwer Academic Publishers.

Longuet-Higgins, H. C., 1981. A computer algorithm for reconstructing a scene from two projections. Nature, 293:133-135.

Lane, J., Lalioti, V., 2001. Interactions with reflections in virtual environments. In Proc. AFRIGRAPH '01: The 1st International Conference on Computer Graphics, Virtual Reality and Visualisation, pp. 87-93.
