A SINGLE PAN AND TILT CAMERA ARCHITECTURE FOR
INDOOR POSITIONING AND TRACKING
T. Gaspar and P. Oliveira
IST/ISR, Lisboa, Portugal
Keywords:
Indoor positioning and tracking systems, Camera calibration, GVF snakes, Multiple-model adaptive estima-
tion, Single camera vision systems.
Abstract:
A new architecture for indoor positioning and tracking is proposed, based on a single low cost pan and tilt
camera, where three main modules can be identified: one related to the interface with the camera, supported
on parameter estimation techniques; other, responsible for isolating and identifying the target, based on ad-
vanced image processing techniques, and a third, that resorting to nonlinear dynamic system suboptimal state
estimation techniques, performs the tracking of the target and estimates its position, and linear and angular
velocities. To assess the performance of the proposed methods and this new architecture, a software package
was developed. An accuracy of 20 cm was obtained in a series of indoor experimental tests, for a range of
operation of up to ten meters, under realistic real time conditions.
1 INTRODUCTION
With the development and the widespread use of
robotic systems, localization and tracking have be-
come fundamental issues that must be addressed in
order to provide autonomous capabilities to a robot.
The availability of reliable estimates is essential to
its navigation and control systems, which justifies the
significant effort that has been put into this domain,
see (Kolodziej and Hjelm, 2006), (Bar-Shalom et al.,
2001) and (Borenstein et al., 1996).
In outdoor applications, the NAVSTAR Global
Positioning System (GPS) has been widely explored
with satisfactory results for most of the actual needs.
Indoor positioning systems based on this technology
however face some undesirable effects, like multipath
and strong attenuation of the electromagnetic waves,
precluding their use.
Alternative techniques, such as infrared radiation,
ultrasound, radio frequency, and vision, have been success-
fully exploited, as reported in detail in (Kolodziej and
Hjelm, 2006), and summarized in (Gaspar, 2008).
The indoor tracking system proposed in this
project uses vision technology, since this technique
has a growing domain of applicability and achieves
acceptable results with very low investment.
This system estimates in real time the position, ve-
locity, and acceleration of a target that evolves in an
unknown trajectory, in the 3D world, as well as its an-
gular velocity. In order to accomplish this purpose, a
new positioning and tracking architecture is detailed,
based on suboptimal stochastic multiple-model adap-
tive estimation techniques.
The complete process of synthesis, analysis, im-
plementation, and validation in real time exceeds the
scope of this paper, due to space limitations. The
interested reader can find these issues discussed in
detail in (Gaspar, 2008).
This document is organized as follows. In sec-
tion 2 the architecture of the developed positioning
and tracking system is introduced, as well as the main
methodologies and algorithms developed. In section
3 the camera and lens models are briefly introduced.
To isolate and identify the target, advanced image pro-
cessing algorithms are discussed in section 4, and in
section 5, the multiple-model nonlinear estimation
technique used is introduced. In the last two sections,
6 and 7, experimental results of the developed sys-
tem, and concluding remarks and comments on future
work, respectively, are presented.
2 SYSTEM ARCHITECTURE
In this project a new architecture for indoor position-
ing and tracking is proposed, based on three main
Figure 1: Tracking system architecture.
modules: one that addresses the interface with the
camera, the second that implements the image pro-
cessing algorithms, and a third responsible for dy-
namic systems state estimation. The proposed archi-
tecture is presented in Fig. 1, and is described next.
(In this section, some quantities are presented informally
to improve the legibility of the whole document.)
The extraction of physical information from an
image acquired by a camera requires the knowledge
of its intrinsic (A) and extrinsic (R and T) parame-
ters, which are computed during the initial calibration
process. In this paper, calibration was preceded by
an independent determination of a set of parameters
(K) responsible for compensating the distortion intro-
duced by the lens of the camera. Since the low cost
camera used has no orientation sensor, the knowledge
of its position at each moment requires the develop-
ment of an external algorithm capable of estimating its
instantaneous pan ($\alpha_r$) and tilt ($\theta_r$) angles.
The target identification is the main purpose of the
image processing block. An active contour method,
usually referred to as snakes, was selected to track
the important features in the image. The approach se-
lected consists of estimating the target contour, pro-
viding the necessary information to compute its cen-
ter coordinates $(u,v)$ and its distance $(d)$ to the origin
of the world reference frame. These quantities corre-
spond to the measurements that are used to estimate
the position ($\hat{x}$), velocity ($\hat{v}$), and acceleration ($\hat{a}$) of
the body to be tracked. Note that the computation of d
requires the knowledge of the real dimensions of the
target, since the proposed system uses a single camera
instead of a stereo configuration.
To obtain estimates on the state and parame-
ters of the underlying dynamic system, an estima-
tion problem is formulated and solved. However, the
dynamic model adopted and the sensor used have
nonlinear characteristics. Extended Kalman filters
were included in a multiple-model adaptive estima-
tion methodology, which provides estimates of the sys-
tem state ($\hat{x}$, $\hat{v}$, and $\hat{a}$), identifies the unknown target
angular velocity $\hat{w}$, and computes the estimation error covari-
ance $P$, as depicted in Fig. 1.
The command for the camera is the result of solv-
ing a decision problem, with the purpose of maintain-
ing the target close to the image center. Since the
range of movements available is restricted, the imple-
mented decision system is very simple and consists
in computing the pan and tilt angles ($\alpha_c$ and $\theta_c$) that
should be sent to the camera at each moment. Large
distances between the referred centers are avoided;
thus, the capability of the overall system to track the
targets is increased.
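As an illustration of this decision rule, the following sketch (in Python; the function name, focal lengths, and dead-zone threshold are hypothetical, not taken from the original implementation) maps the pixel offset between the target and image centers into incremental pan and tilt commands:

```python
import math

def pan_tilt_command(u, v, u0=320.0, v0=240.0,
                     alpha_u=600.0, alpha_v=600.0, dead_zone_px=40.0):
    """Map the offset between the target center (u, v) and the image
    center (u0, v0) into incremental pan/tilt commands, in radians.
    alpha_u, alpha_v are focal lengths in pixels (made-up values); no
    command is issued inside the dead zone, mimicking a camera with a
    discrete, limited range of movements."""
    du, dv = u - u0, v - v0
    d_pan = math.atan2(du, alpha_u) if abs(du) > dead_zone_px else 0.0
    d_tilt = math.atan2(dv, alpha_v) if abs(dv) > dead_zone_px else 0.0
    return d_pan, d_tilt
```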
3 SENSOR: PTZ CAMERA
3.1 Camera Model
Given the high complexity of the camera optical sys-
tem, and the consequent high number of parameters
required to model the whole image acquisition pro-
cess, it is common to exploit a linear model for the
camera. In this architecture, the classical pinhole
model (Faugeras and Luong, 2001) was considered.
Let $M = [x, y, z, t]^T$ be the homogeneous coordi-
nates of a visible point, in the world reference frame,
and $m = [u, v, s]^T$ the corresponding homogeneous co-
ordinates of the same point in the image frame. Ac-
cording to this model, the relation between the coor-
dinates expressed in these two coordinate frames is
given by
$$\lambda m = PM, \qquad (1)$$
where λ is a multiplicative constant, related with the
distance from the point in space to the camera, and
P the projection matrix that relates 3D world coor-
dinates and 2D image coordinates. The transforma-
tion given by this matrix can be decomposed into
three others: one between the world and camera coordi-
nate frames, expressed by ${}^{c}g_{M}$ in homogeneous co-
ordinates; another responsible for projecting 3D points
into the image plane, represented by $\pi$; and a third
one that changes the origin and units of the coordinate
system used to identify each point in the acquired im-
ages, denoted as $A$. The product of the three previous
transformations results in the overall expression for
the matrix $P$, which is given by $P = A\,\pi\,{}^{c}g_{M}$, and es-
tablishes the relation between a point in the 3D world
and its correspondent in the acquired images.
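As a minimal sketch of this decomposition (with made-up intrinsic and extrinsic values; only the structure $P = A\,\pi\,{}^{c}g_{M}$ follows the text), the following Python fragment composes the projection matrix and projects a world point:

```python
import numpy as np

A = np.array([[600.0,   0.0, 320.0],   # intrinsic matrix: focal lengths and
              [  0.0, 600.0, 240.0],   # principal point in pixels (made up)
              [  0.0,   0.0,   1.0]])
pi = np.hstack([np.eye(3), np.zeros((3, 1))])           # 3D -> image plane projection
R, T = np.eye(3), np.array([[0.0], [0.0], [2000.0]])    # world -> camera rigid motion (mm)
g = np.vstack([np.hstack([R, T]), [[0.0, 0.0, 0.0, 1.0]]])  # homogeneous transformation

P = A @ pi @ g                           # overall projection matrix

M = np.array([100.0, 50.0, 500.0, 1.0])  # homogeneous world point
m = P @ M                                # lambda * [u, v, 1]^T
u, v = m[0] / m[2], m[1] / m[2]          # pixel coordinates of the projection
```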
The use of the previous model implies the deter-
mination of the intrinsic and extrinsic parameters re-
ferred before. In this work, the classical approach pro-
posed by Faugeras (Faugeras and Luong, 2001) was
selected and implemented. The disadvantages of this
method are: i) it requires the preparation of the scene in
which the camera is inserted, and ii) it disregards the
distortion of the lens. However, the impact of these
requirements is moderate since the camera in this ap-
plication is supposed to be placed in a fixed location
in the world (the calibration needs to be performed
just once). A separate algorithm that compensates for
lens distortion is implemented, see section 3.3 for de-
tails. The major advantages are that only one image
is required and reliable results can be obtained.
The classical method proposed by Faugeras
consists of performing an initial estimation of the
projection matrix from a set of points
with known coordinates in the world and camera ref-
erence frames. Writing (1) and reorganizing the
expression obtained for every one of the $n$ points used
in the calibration process, and considering that the
index $i$ identifies the coordinates of the $i$-th used point,
yields, for each point,
$$\begin{bmatrix} x_i & y_i & z_i & 1 & 0 & 0 & 0 & 0 & -u_i x_i & -u_i y_i & -u_i z_i & -u_i \\ 0 & 0 & 0 & 0 & x_i & y_i & z_i & 1 & -v_i x_i & -v_i y_i & -v_i z_i & -v_i \end{bmatrix} p = 0,$$
with
$$p = \begin{bmatrix} p_{11} & p_{12} & p_{13} & p_{14} & p_{21} & p_{22} & p_{23} & p_{24} & p_{31} & p_{32} & p_{33} & p_{34} \end{bmatrix}^T,$$
where $p_{jk}$ is the element of $P$ in line $j$ and column $k$.
The previous equations, when applied to the
entire set of used points, lead to a system of the
form $Lp = 0$, where $L$ is a $2n \times 12$ matrix. The
solution of this system corresponds to the eigenvector
associated with the smallest eigenvalue of $L^T L$ or,
equivalently, to the singular vector of $L$ associated
with the smallest singular value of its Singular Value
Decomposition. Since the projection matrix has 12
elements, and each point considered contributes
two equations, a minimum of 6 points
must be used in the calibration process. The intrinsic
and extrinsic parameters of the camera can then be
computed from the estimated $p$ vector as
$$u_0 = p_1 \cdot p_3, \qquad v_0 = p_2 \cdot p_3,$$
$$|\alpha_u| = \|p_1 - u_0\, p_3\|, \qquad |\alpha_v| = \|p_2 - v_0\, p_3\|,$$
$$r_3 = p_3, \qquad r_2 = \frac{p_2 - v_0\, r_3}{\alpha_v},$$
$$r_1 = \frac{p_1 - u_0\, r_3}{\alpha_u}, \qquad t_z = p_{34},$$
$$t_x = \frac{p_{14} - u_0\, t_z}{\alpha_u}, \qquad t_y = \frac{p_{24} - v_0\, t_z}{\alpha_v},$$
where $p_k = [p_{k1}\; p_{k2}\; p_{k3}]$, and $p_i \cdot p_j$ represents
the inner product of the vectors $p_i$ and $p_j$; see
(Faugeras and Luong, 2001) and (Gaspar, 2008) for
details.
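A minimal sketch of this calibration procedure, assuming noise-free correspondences and omitting refinements such as data normalization or sign disambiguation:

```python
import numpy as np

def calibrate_faugeras(world_pts, image_pts):
    """Estimate the projection matrix from n >= 6 point correspondences
    and decompose it into intrinsic and extrinsic parameters, following
    the classical approach described above.
    world_pts: (n, 3) known 3D coordinates; image_pts: (n, 2) pixels."""
    rows = []
    for (x, y, z), (u, v) in zip(world_pts, image_pts):
        rows.append([x, y, z, 1, 0, 0, 0, 0, -u*x, -u*y, -u*z, -u])
        rows.append([0, 0, 0, 0, x, y, z, 1, -v*x, -v*y, -v*z, -v])
    L = np.asarray(rows, dtype=float)     # the 2n x 12 matrix L
    _, _, Vt = np.linalg.svd(L)
    P = Vt[-1].reshape(3, 4)              # singular vector of smallest singular value
    P /= np.linalg.norm(P[2, :3])         # fix the scale so that ||p3|| = 1
    p1, p2, p3 = P[0, :3], P[1, :3], P[2, :3]
    u0, v0 = p1 @ p3, p2 @ p3             # principal point
    au = np.linalg.norm(p1 - u0 * p3)     # |alpha_u|
    av = np.linalg.norm(p2 - v0 * p3)     # |alpha_v|
    r3 = p3
    r1 = (p1 - u0 * r3) / au
    r2 = (p2 - v0 * r3) / av
    tz = P[2, 3]
    tx = (P[0, 3] - u0 * tz) / au
    ty = (P[1, 3] - v0 * tz) / av
    return (u0, v0, au, av), np.vstack([r1, r2, r3]), np.array([tx, ty, tz])
```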
3.2 PTZ Camera Internal Geometry
The camera used in this project has the ability to de-
scribe pan and tilt movements, which allows its extrinsic
parameters to vary over time. Thus, the rigorous
definition of the rigid body
transformation between camera and world reference
frames implies the adoption of a model to the camera
internal geometry and the study of its direct kinemat-
ics.
Since the Creative WebCam Live! Motion camera
used has a closed architecture, its internal geome-
try model was estimated from the analysis of its ex-
ternal structure and based on a small number of ex-
periments.
The proposed model considers five transforma-
tions, which include the pan, tilt, and roll angles be-
tween the world and camera reference frames; the off-
set between the origin of the world reference frame
and the camera rotation center; and the offset between
the camera rotation and optical centers.
The composition of these transformations leads to
the global transformation between the world and camera
reference frames:
$${}^{c}g_{M} = \left({}^{M}g_{c}\right)^{-1}, \qquad {}^{M}g_{c} = {}^{M}g_{0}\; {}^{0}g_{1}\; {}^{1}g_{2}\; {}^{2}g_{3}\; {}^{3}g_{c},$$
which is fundamental to determine the camera projec-
tion matrix over time.
The expressions introduced require, however, the
knowledge of five parameters: the pan, tilt, and roll an-
gles; the position of the camera optical center in the
world coordinate frame when these angles are zero;
and the offset between this point and the camera ro-
tation center. Since there is no position sensor in
the camera, its orientation must be determined in real
time using reference points in the 3D world. The de-
termination of the position of the camera optical and rotation
centers, when the pan and tilt angles are zero, can be
performed at an initial stage, resorting to points of the
world with known coordinates.
3.3 Lens Distortion
The mapping function of the pinhole camera between
the 3D world and the 2D camera image is linear, when
expressed in homogeneous coordinates. However, if a
low-cost or wide-angle lens system is used, the linear
pinhole camera model fails. In those cases, and for the
camera used in this work, the radial lens distortion is
the main source of errors and no vestige of tangential
distortion was identified. Therefore, it is necessary to
compensate for this distortion with a nonlinear inverse ra-
dial distortion function, which corrects measurements
in the 2D camera image to those that would have been
obtained with an ideal linear pinhole camera model.
The inverse radial distortion function is a map-
ping that recovers the coordinates $(x, y)$ of undistorted
points from the coordinates $(x_d, y_d)$ of the correspond-
ing distorted points, where both coordinates are re-
lated to a reference frame with origin in the image dis-
tortion center $(x_0, y_0)$. Since radial deformation in-
creases with the distance to the distortion center, the
inverse radial distortion function $f(r_d)$ can be ap-
proximated and parameterized by a Taylor expansion
(Thormahlen et al., 2003), which results in
$$x = x_d + x_d \sum_{i} k_i\, r_d^{\,i-1} \qquad \text{and} \qquad y = y_d + y_d \sum_{i} k_i\, r_d^{\,i-1},$$
where
$$r_d = \sqrt{x_d^2 + y_d^2}.$$
The lens distortion compensation method adopted
in this project is independent of the calibration pro-
cess responsible for determining the pinhole model
parameters, and is based on the rationale that straight
lines in the 3D space must remain straight lines in 2D
camera images. Ideally, if acquired images were not
affected by distortion, 3D world straight lines would
be preserved in 2D images. Hence, the estimation of
the inverse radial distortion model parameters was based on
the resolution of the following set of equations:
$$f_{i1} = \left(y_{i1} - \hat{y}_{i1}(m_i, b_i, x_{i1})\right)^2 = 0$$
$$\vdots$$
$$f_{iN_p} = \left(y_{iN_p} - \hat{y}_{iN_p}(m_i, b_i, x_{iN_p})\right)^2 = 0, \qquad i = 1, \ldots, N_r,$$
with
$$\hat{y}_{ij}(m_i, b_i, x_{ij}) = m_i\, x_{ij} + b_i,$$
where $N_r$ and $N_p$ are the number of straight lines and
of points per straight line acquired from the distorted im-
age, respectively. A set of $N_r N_p$ nonlinear equations
results; its solution can be found resorting to New-
ton's method, and estimates for the parameters $k_3$, $k_5$,
$x_0$, $y_0$, and $m_i$, $b_i$, $i = 1, \ldots, N_r$, are obtained.
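Once the coefficients are available, applying the inverse radial distortion model is straightforward. A sketch, assuming the expansion is truncated to the $k_3$ and $k_5$ terms as in the parameter list above:

```python
import numpy as np

def undistort(xd, yd, x0, y0, k3, k5):
    """Apply the inverse radial distortion model discussed above to
    recover undistorted coordinates from distorted ones. (x0, y0) is the
    distortion center; k3, k5 would come from the straight-line based
    optimization (hypothetical values in any standalone test)."""
    xr, yr = xd - x0, yd - y0          # coordinates relative to the distortion center
    rd = np.hypot(xr, yr)              # distorted radius r_d
    f = k3 * rd**2 + k5 * rd**4        # correction term: sum_i k_i * r_d^(i-1)
    return x0 + xr * (1.0 + f), y0 + yr * (1.0 + f)
```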
4 IMAGE PROCESSING
4.1 Target Isolation and Identification
The isolation and identification of the target to be
tracked in each acquired image is proposed to be tack-
led resorting to an active contours method. Active
contours (Kass et al., 1987), or snakes, are curves de-
fined within an image domain that can move under the
influence of internal forces coming from within the
curve itself and external forces computed from the im-
age data. The internal and external forces are defined
so that the snake will conform to an object boundary
or other desired features within an image. Snakes are
widely used in several computer vision domains, such
as edge detection (Kass et al., 1987), image segmen-
tation (Leymarie and Levine, 1993), shape modeling
(Terzopoulos and Fleischer, 1988), (McInerney and
Terzopoulos, 1995), or motion tracking (Leymarie
and Levine, 1993), as happens in this application.
In this project a parametric active contour method
is used (Kass et al., 1987), in which a parameterized
curve $x(s) = [x(s), y(s)]$, $s \in [0,1]$, evolves over time
towards the desired image features, usually edges, at-
tracted by external forces given by the negative gra-
dient of a potential function. The evolution occurs in
order to minimize the energy of the snake
E
sk
= E
int
+ E
ext
,
that, as can be seen, includes a term related to its inter-
nal energy E
int
, which has to do with its smoothness,
and a term of external energy E
ext
, based on forces ex-
tracted from the image. Traditionally, this energy can
be expressed in the form
$$E_{sk} = \int_0^1 \frac{1}{2}\left[\alpha\, |x'(s)|^2 + \beta\, |x''(s)|^2\right] + E_{ext}(x(s))\, ds, \qquad (2)$$
where the parameters $\alpha$ and $\beta$ control the snake ten-
sion and rigidity, respectively, and $x'(s)$ and $x''(s)$ de-
note the first and second derivatives of $x(s)$ with re-
spect to $s$.
Approximating the solution of the variational for-
mulation (2) by the spatial finite differences method,
with step $h$, yields
$$(x_t)_i = \frac{\alpha}{h^2}\left(x_{i+1} - 2x_i + x_{i-1}\right) - \frac{\beta}{h^4}\left(x_{i+2} - 4x_{i+1} + 6x_i - 4x_{i-1} + x_{i-2}\right) + F^{(p)}_{ext}(x_i),$$
where $x_i = x(ih, t)$, and $F^{(p)}_{ext}(x_i)$ represents the image
influence at the point $x_i$.
The temporal evolution of the active contour in the
image domain occurs according to the expression
x
n+1
= x
n
+ τx
n
t
,
where τ is the considered temporal step. The iterative
process ends when the coordinates of each point of
the snake remain approximately constant over time.
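A compact sketch of one iteration of this evolution, for a closed contour and assuming an external force field is available as a function (the names and the values of $h$ and $\tau$ are illustrative; $\alpha$ and $\beta$ follow the values used later in this work):

```python
import numpy as np

def snake_step(x, F_ext, alpha=0.5, beta=0.05, h=1.0, tau=0.2):
    """One explicit iteration of the active contour evolution above.
    x: (n, 2) array of closed-contour points; F_ext: function returning
    the external force at each contour point."""
    xm1, xp1 = np.roll(x, 1, axis=0), np.roll(x, -1, axis=0)  # neighbors (closed curve)
    xm2, xp2 = np.roll(x, 2, axis=0), np.roll(x, -2, axis=0)
    x_t = (alpha / h**2) * (xp1 - 2*x + xm1) \
        - (beta / h**4) * (xp2 - 4*xp1 + 6*x - 4*xm1 + xm2) \
        + F_ext(x)
    return x + tau * x_t

# Iterate until the contour points remain approximately constant:
#   while np.max(np.abs(snake_step(x, F_ext) - x)) > tol: x = snake_step(x, F_ext)
```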
4.2 Sensor Measurements
Once the target contour is obtained, it is possible to
compute the measurements that will be provided to
the estimation process: the target center coordinates
$(u,v)$, and its distance $(d)$ to the origin of the world refer-
ence frame.
Target center coordinates in each acquired image
are easily computed as the mean of the coordi-
nates of the points that belong to the target contour.
Target distance to the origin of the world reference frame
is computed from its estimated boundary. Its real di-
mensions in the 3D world, and the knowledge of the
camera intrinsic and extrinsic parameters, allow met-
ric relations between image and world quantities to be
established. Estimates of the depth of the target can
then be obtained. A complete stochastic characteriza-
tion of these measurements, which are the inputs to the
estimation method used, can be found in (Gaspar, 2008).
The use of triangulation methods with at least two
cameras would allow the computation of the tar-
get distance without further knowledge of the target.
However, the present tracking system uses a single
camera; thus, additional information must be avail-
able. In this work, it is assumed that the target dimen-
sions are known.
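The following sketch illustrates how these measurements can be formed from the estimated contour; the depth computation shown uses a simple pinhole proportion based on the known target width, and is only an approximation of the full procedure characterized in (Gaspar, 2008):

```python
import numpy as np

def measurements(contour, target_width_mm, f_px):
    """Compute the measurements fed to the estimator: the target center
    (u, v) as the mean of the contour points, plus a depth estimate
    relative to the camera. A minimal sketch: depth is approximated by
    the pinhole proportion depth ~ f_px * real_width / apparent_width,
    with f_px the focal length in pixels; the distance d to the world
    origin would then follow from the known camera pose."""
    u, v = contour.mean(axis=0)
    width_px = contour[:, 0].max() - contour[:, 0].min()  # apparent size in the image
    depth = f_px * target_width_mm / width_px             # distance along the optical axis
    return u, v, depth
```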
5 TRACKING SYSTEM
In this section, the implemented nonlinear estimation
method is described. Estimates of the target posi-
tion, velocity, and acceleration in the 3D world are
provided, and its angular velocity is identified. This esti-
mator is based on measurements of the previously
computed target center coordinates and distance to the
origin of the world reference frame.
5.1 Extended Kalman Filter
The Kalman filter (Gelb, 2001) provides an optimal
solution to the problem of estimating the state of
a discrete time process that is described by a lin-
ear stochastic difference equation. However, this ap-
proach is not valid when the process and/or the mea-
surements are nonlinear. One of the most successful
approaches in these situations consists in applying
a linear time-varying Kalman filter to a system that
results from the linearization of the original nonlin-
ear one, along the estimates. This kind of filter is
usually referred to as the Extended Kalman filter (EKF)
(Gelb, 2001), and has the advantage of being com-
putationally efficient, which is essential in real time
applications.
Consider a nonlinear system with state $x \in \mathbb{R}^n$ ex-
pressed by the nonlinear stochastic difference equa-
tion
$$x_k = f(x_{k-1}, u_{k-1}, w_{k-1}),$$
and with available measurements $z \in \mathbb{R}^m$ given by
$$z_k = h(x_k, v_k),$$
where the index $k$ represents time, $u_k$ the control
input, and $w_k \in \mathbb{R}^n$ and $v_k \in \mathbb{R}^m$ are random vari-
ables that correspond to the process and measurement
noise, respectively. These variables are assumed to
be independent, i.e. $E[w_k v_k^T] = 0$, and to have Gaus-
sian probability density functions with zero mean and
covariance matrices $Q_k$ and $R_k$, respectively.
In the case of linear dynamic systems, the esti-
mates provided by the Kalman filter are optimal, in
the sense that the mean square estimation error is min-
imized. Estimates computed by the EKF are suboptimal,
and it is even possible that the filter does not converge to the sys-
tem state in some situations. However, the good per-
formance observed in many practical applications has re-
vealed this strategy as the most successful and popular
in nonlinear estimation.
The implementation of an EKF requires a math-
ematical model to the target and sensors used. The
choice of appropriate models is extremely important
since it improves significantly the target tracking sys-
tem performance, reducing the effects of the limited
observation data available in this kind of applications.
Given the movements expected for the targets to be
tracked, the 3D Planar Constant-Turn Model, as pre-
sented in (Li and Jilkov, 2003), was selected. This
model considers the vector $x = [x, \dot{x}, \ddot{x}, y, \dot{y}, \ddot{y}, z, \dot{z}, \ddot{z}]^T$
as the state of the target, where $[x, y, z]$, $[\dot{x}, \dot{y}, \dot{z}]$, and
$[\ddot{x}, \ddot{y}, \ddot{z}]$ are the target position, velocity, and accelera-
tion in the world, respectively.
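For illustration, a simplified constant-turn propagation restricted to a turn in the horizontal plane (a sketch in the spirit of, but not identical to, the 3D planar model of (Li and Jilkov, 2003); the state ordering follows the text) could read:

```python
import numpy as np

def ct_predict(x, omega, T):
    """Propagate the state one step with a constant-turn motion in the
    horizontal plane (omega != 0 assumed); a simplified illustrative
    model, not the exact one used in the paper.
    State: [x, vx, ax, y, vy, ay, z, vz, az]."""
    s, c = np.sin(omega * T), np.cos(omega * T)
    px, vx, _, py, vy, _, pz, vz, az = x
    # Position integrates the rotating velocity; velocity rotates by omega*T.
    px_n = px + (s / omega) * vx - ((1 - c) / omega) * vy
    py_n = py + ((1 - c) / omega) * vx + (s / omega) * vy
    vx_n, vy_n = c * vx - s * vy, s * vx + c * vy
    # Centripetal acceleration consistent with the turn: a = omega x v.
    ax_n, ay_n = -omega * vy_n, omega * vx_n
    pz_n, vz_n = pz + vz * T, vz         # simple constant-velocity vertical motion
    return np.array([px_n, vx_n, ax_n, py_n, vy_n, ay_n, pz_n, vz_n, az])
```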
The sensor measurements available at each time
instant correspond to the target center coordinates
$(u,v)$ and target distance $(d)$ to the origin of the world
reference frame, and are given by
$$u = \frac{p_{11}x + p_{12}y + p_{13}z + p_{14}}{p_{31}x + p_{32}y + p_{33}z + p_{34}} + v_u,$$
$$v = \frac{p_{21}x + p_{22}y + p_{23}z + p_{24}}{p_{31}x + p_{32}y + p_{33}z + p_{34}} + v_v,$$
$$d = \sqrt{x^2 + y^2 + z^2} + v_d,$$
where $p_{ij}$ is the projection matrix element in line $i$
and column $j$, and $v = [v_u, v_v, v_d]^T$ is the measurement
noise (the time step subscript k was omitted for sim-
plicity of notation). The measurement vector is given
by $z = [u, v, d]^T$.
Next, a standard notation is used (see (Gelb, 2001)
for details) to describe each Kalman filter:

Predict step:
$$\hat{x}_k^- = f(\hat{x}_{k-1}, u_{k-1}, 0),$$
$$P_k^- = A_k P_{k-1} A_k^T + W_k Q_{k-1} W_k^T.$$

Update step:
$$K_k = P_k^- H_k^T \left(H_k P_k^- H_k^T + V_k R_k V_k^T\right)^{-1},$$
$$\hat{x}_k = \hat{x}_k^- + K_k\left(z_k - h(\hat{x}_k^-, 0)\right),$$
$$P_k = (I - K_k H_k)\, P_k^-,$$
where $K_k$ is the Kalman filter gain.
The complete measurement process characteriza-
tion also requires the definition of the measurement
noise covariance matrix $R$. This matrix can be ob-
tained from an accurate study of the available sensors,
which, in this project, consisted in executing a set of
experiments aiming to compute the standard devia-
tion of the estimation error in the image coordinates
of a 3D world point, and the standard deviation of the
error in target depth estimation.
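A generic sketch of one predict/update cycle, in the simplified case of additive noises (so that $W_k$ and $V_k$ reduce to identities) and with the control input omitted:

```python
import numpy as np

def ekf_step(x_hat, P, z, f, h, jac_f, jac_h, Q, R):
    """One EKF predict/update cycle, following the equations above under
    an additive-noise simplification. f, h: process and measurement
    functions; jac_f, jac_h: their Jacobians evaluated at a state."""
    # Predict step
    x_pred = f(x_hat)
    A = jac_f(x_hat)
    P_pred = A @ P @ A.T + Q
    # Update step
    H = jac_h(x_pred)
    K = P_pred @ H.T @ np.linalg.inv(H @ P_pred @ H.T + R)
    x_new = x_pred + K @ (z - h(x_pred))
    P_new = (np.eye(len(x_hat)) - K @ H) @ P_pred
    return x_new, P_new
```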
5.2 Multiple-model
The model considered for the target requires the
knowledge of its angular velocity. However, this
value is not known in real applications, which led
us to apply a multiple-model based approach that
simultaneously identifies some parameters
of the system and estimates its state.
The implemented method, known as Multiple-
Model Adaptive Estimation (MMAE) (Athans and
Chang, 1976), considers several models to a system
that differ in a parameter set (in this case the target
angular velocity). Each one of these models includes
an extended Kalman filter, whose state estimates are
mixed properly. The individual estimates are com-
bined using a weighted sum with the a posteriori hy-
pothesis probabilities of each model as weighting fac-
tors, leading to the state estimate
$$\hat{x}_k = \sum_{j=1}^{N} p_k^j\, \hat{x}_k^j,$$
with covariance matrix
$$P_k = \sum_{j=1}^{N} p_k^j \left[P_k^j + \left(\hat{x}_k^j - \hat{x}_k\right)\left(\hat{x}_k^j - \hat{x}_k\right)^T\right],$$
where $p_k^j$ corresponds to the a posteriori probability
of model $j$, at the time instant $k$, and $N$ to the
number of considered models.
It should be stressed that the methods used to com-
pute the a posteriori probabilities of each model and
the final state estimate are optimal if each one of the
individual estimates is optimal. However, this is not
the case in this application, since the known solu-
tions to nonlinear estimation problems at present do
not provide optimal results.
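The combination step, together with a Gaussian-likelihood update of the model probabilities (a sketch: the probability update shown follows the usual MMAE form, assuming Gaussian innovations, rather than any expression given here), can be written as:

```python
import numpy as np

def mmae_combine(x_hats, Ps, probs):
    """Combine per-model EKF estimates with their a posteriori
    probabilities, as in the expressions above."""
    x_mix = sum(p * x for p, x in zip(probs, x_hats))
    P_mix = sum(p * (P + np.outer(x - x_mix, x - x_mix))
                for p, x, P in zip(probs, x_hats, Ps))
    return x_mix, P_mix

def mmae_update_probs(probs, residuals, S_mats):
    """Bayes update of the model probabilities from each filter's
    innovation (residual) and innovation covariance S."""
    likes = np.array([np.exp(-0.5 * r @ np.linalg.solve(S, r)) /
                      np.sqrt(np.linalg.det(2.0 * np.pi * S))
                      for r, S in zip(residuals, S_mats)])
    new = likes * np.asarray(probs)
    return new / new.sum()   # normalized a posteriori probabilities
```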
6 EXPERIMENTAL RESULTS
In this section some brief considerations about the
developed positioning and tracking system are ad-
vanced, and the experimental results of its application
to real time situations are presented.
6.1 Application Description
The architecture for positioning and tracking pro-
posed in this project was implemented in Matlab, and
can be divided into three main modules: one that ad-
dresses the interface with the camera, another that im-
plements the image processing algorithms, and a third
related to the estimation process.
Interface with the Camera. Since the camera used
in this project has a discrete and limited range of
movements, its orientation at each time instant is de-
termined according to a decision system whose aim is
to avoid that the distance between the image and
target centers exceeds certain values.
The CCD sensor built into the camera acquires im-
ages with a maximum dimension of 640 × 480 pixels,
which is the resolution chosen for this applica-
tion. Despite the higher computational requirements,
this choice allows smaller targets to be tracked, with an
increase in the accuracy of the system.
Image Processing. The active contour method was
implemented with the values of α and β equal to 0.5
and 0.05, respectively, since these values were the
ones that led to the best results.
The developed application is optimized to follow
red targets, whose identification in acquired images is
easy, since image segmentation is itself a very com-
plex domain and does not correspond to the main fo-
cus of this work.
Estimation Process. The adopted MMAE approach
was based on the utilization of four initially equiprob-
able target models, which differ in the target angular veloc-
ity values: $2\pi\frac{1}{50}[0, 1, 2, 3]$ rad/s.
Each one of the models requires the knowledge
of the power spectral density matrix of the process
noise, which is not available. After some preliminary
tuning, the matrix considered for this quantity was
set to $\mathrm{diag}[0.1, 0.1, 0.1]$.
The sampling interval of the developed applica-
tion was made variable; however, for the parameters
previously discussed, a lower bound of approximately
0.5 s was found.
6.2 Application Performance
The results presented in this section concern the
tracking of a red balloon attached to a Pioneer
P3-DX robot, as depicted in Fig. 2, programmed to describe
a circular trajectory.
Figure 2: Real time target tracking. Left: Experimental
setup; Right: Target identification, where the initial snake
is presented in black, its temporal evolution is presented in
red, and the contour final estimate is presented in blue.
In Fig. 3, the 3D nominal and estimated target tra-
jectories are presented. The target position, velocity
and acceleration along time are depicted in Fig. 4.
Despite the significant initial uncertainty in the state
estimate, the target position, velocity, and accelera-
tion estimates converge to the vicinity of the real val-
ues. Moreover, given the suboptimal nature of the re-
sults produced by the extended Kalman filter in non-
linear applications, in some experimental cases where
an excessively poor initial state estimate was tested,
divergence of the filter occurred.
Figure 3: 3D position estimate of a real target. The real
position of the target in the initial instant is presented in
black.
Figure 4: Position (top left panel), velocity (top right panel),
and acceleration (bottom panel) estimates of a real target in
the world. The thinner and thicker lines correspond to the
estimated and real values, respectively.

The position, velocity, and acceleration estima-
tion errors are presented in Fig. 5. These quantities
have large transients in the beginning of the experi-
ment, due to the initial state estimation error, and de-
crease quickly to values beneath 20 cm, 4 cm/s, and
0.5 cm/s², respectively. Several reasons can
justify the errors observed: i) the uncertainty as-
sociated with the characterization of the real trajec-
tory described by the target, ii) possible mis-
matches between the models considered for the cam-
era and target, and iii) incorrect measurement and sen-
sor noise characterization.
Figure 5: Position (top left panel), velocity (top right panel),
and acceleration (bottom panel) estimation errors of a real tar-
get in the world.
The results of the adopted MMAE approach are
presented in Fig. 6. For the trajectory reported, the
real target angular velocity is $2\pi \cdot 0.0217$ rad/s. Thus,
the probability associated with the model closest to the
real target tends to 1 along the experiment, as depicted
on the left panel of Fig. 6. On the right panel of that
figure, the real and estimated angular velocities are
plotted.
Figure 6: MMAE evolution over time. On the left, the a
posteriori hypothesis probabilities of the four models. On the
right, real (red) and estimated (blue) target angular velocity.
Regarding the range of operation of the
proposed system, it depends significantly on the cam-
era used and on the size of the target to be tracked. In
the experiments reported, an elliptic shape with axes
of length 106 mm and 145 mm was identified and lo-
cated, with the mentioned accuracies, for distances up
to approximately 7 m from the camera. The lower
bound of the range of distances in which the appli-
cation works properly is limited by the distance at
which the target stops being completely visible, filling
the camera field of vision. For the target considered,
this occurs at distances below 40 cm.
7 CONCLUSIONS AND FUTURE
WORK
A new architecture for indoor positioning and track-
ing was presented, supported on suboptimal stochastic
multiple-model adaptive estimation techniques. The
proposed approach was implemented using a single
low cost pan and tilt camera, estimating in real time the
location of a target that moves in the 3D real world,
with accuracies on the order of 20 cm.
The main limitations of the implemented system
are the required knowledge on the target dimensions,
and the inability to identify targets with colors other
than red.
In the near future, an implementation of the de-
veloped architecture in C will be pursued, which will
allow the tracking of more unpredictable targets.
Also, an extension of the proposed architecture to a
multiple camera based system is envisaged. Distances
to targets will then be computed resorting to triangu-
lation methods, thus avoiding the requirement of
precise knowledge of their dimensions.
Finally, the integration in the vision system of a sen-
sor that retrieves the camera orientation at each time
instant is also advised, as well as the implementation of
an image segmentation algorithm that can identify a
wider variety of targets.
ACKNOWLEDGEMENTS
This work was partially supported by Fundação
para a Ciência e a Tecnologia (ISR/IST plurian-
ual funding) through the POS Conhecimento Pro-
gram, which includes FEDER funds, and by the project
PDCT/MAR/55609/2004 - RUMOS of the FCT.
REFERENCES
Athans, M. and Chang, C. (1976). Adaptive Estimation
and Parameter Identification using Multiple Model
Estimation Algorithm. MIT Lincoln Lab., Lexington,
Mass.
Bar-Shalom, Y., Rong-Li, X., and Kirubarajan, T. (2001).
Estimation with Applications to Tracking and Naviga-
tion: Theory Algorithms and Software. John Wiley &
Sons, Inc.
Borenstein, J., Everett, H. R., and Feng, L. (1996). Where
am I? Sensors and Methods for Mobile Robot Posi-
tioning. Edited and compiled by J. Borenstein.
Faugeras, O. and Luong, Q. (2001). The Geometry of Mul-
tiple Images. MIT Press.
Gaspar, T. (2008). Sistemas de seguimento para aplicações
no interior. Master's thesis, Instituto Superior
Técnico.
Gelb, A. (2001). Applied Optimal Estimation. MIT Press,
Cambridge, Massachusetts.
Kass, M., Witkin, A., and Terzopoulos, D. (1987). Snakes:
Active contour models. Int. J. Comput. Vis., 1:321–
331.
Kolodziej, K. and Hjelm, J. (2006). Local Positioning Sys-
tems: LBS Applications and Services. CRC Press.
Leymarie, F. and Levine, M. D. (1993). Tracking de-
formable objects in the plane using an active contour
model. IEEE Trans. Pattern Anal. Machine Intell.,
15:617–634.
Li, X. R. and Jilkov, V. P. (2003). Survey of maneuvering
target tracking. Part I: Dynamic models. IEEE Trans-
actions on Aerospace and Electronic Systems, pages
1333–1364.
McInerney, T. and Terzopoulos, D. (1995). A dynamic finite
element surface model for segmentation and tracking
in multidimensional medical images with application
to cardiac 4D image analysis. Comput. Med. Imag.
Graph, 9:69–83.
Terzopoulos, D. and Fleischer, K. (1988). Deformable mod-
els. Vis. Comput., 4:306–331.
Thormahlen, T., Broszio, H., and Wassermann, I. (2003).
Robust line-based calibration of lens distortion from a
single view. Mirage 2003, pages 105–112.