optical center (Barreto et al., 1999; Basu and Ravi, 1997; Collins and Tsin, 1999; Woo and Capson, 2000). While these assumptions may be acceptable for expensive mechanisms, they are not accurate enough to model the true motion of inexpensive pan-tilt mechanisms. In reality, a single pan rotation induces a curved trajectory in the image instead of a straight line.
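As a brief illustration (ours, not drawn from the cited works), consider a pinhole camera with intrinsic matrix $K$. When the pan axis passes through the optical center, a rotation $R$ moves every image point $x$ by the same depth-independent homography,

$$x' \simeq K R K^{-1} x,$$

whereas an axis offset from the center by a vector $d$ translates the optical center by $t = (I - R)\,d$, so that a scene point at depth $Z$ maps to

$$x' \simeq K \left( R K^{-1} x + t/Z \right).$$

The depth-dependent term $t/Z$ is what bends the image trajectory during a pan sweep.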
In recent years, (Shih et al., 1998), (Davis and Chen, 2003) and (Jain et al., 2006) proposed pan-tilt camera calibration methods based on more complex models. (Shih et al., 1998) detailed the calibration of a stereo head with multiple degrees of freedom, but still assumed orthogonally aligned rotation axes. (Davis and Chen, 2003) proposed an improved model of camera pan-tilt motion, together with virtual calibration landmarks produced by a moving light-emitting diode (LED), which is better suited to dynamic camera calibration. The 3D positions of the LED were inferred, via stereo triangulation, from multiple stationary cameras placed in the environment.
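As an aside, such 3D positions can be recovered from two or more stationary views by linear (DLT) triangulation. The sketch below is our own illustration, assuming the 3x4 projection matrices of the stationary cameras are already known; it is not code from (Davis and Chen, 2003):

import numpy as np

def triangulate(projections, pixels):
    # Linear (DLT) triangulation of one 3D point from n views.
    # projections: list of 3x4 projection matrices (assumed known)
    # pixels: list of (u, v) observations of the same LED flash
    rows = []
    for P, (u, v) in zip(projections, pixels):
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    # Null vector of the stacked system = algebraic least-squares point.
    _, _, Vt = np.linalg.svd(np.asarray(rows))
    X = Vt[-1]
    return X[:3] / X[3]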
More recently, (Jain et al., 2006) showed that the technique of Davis and Chen can be improved. Their method calibrates more degrees of freedom: like earlier methods it recovers the position and orientation of the camera's rotation axes, but it also calibrates the rotation angle. It is more efficient, more accurate and less computationally expensive than the method of Davis and Chen, and its authors claim to be the only ones to propose a method free of simplifying hypotheses. However, the calibration step requires a person to handle the calibration marks, so the method cannot serve as a turnkey solution for a non-expert public.
We now focus on methods that avoid direct camera calibration; few authors have explored this approach. (Zhou et al., 2003) use collocated cameras whose viewpoints are assumed to be identical. The procedure consists of collecting a set of pixel locations in the stationary camera where a surveillance subject could later appear. For each pixel, the dynamic camera is manually moved to center its image on the subject, and the corresponding pan and tilt angles are recorded in a look-up table (LUT) indexed by the pixel coordinates in the stationary camera. Values for intermediate pixels are obtained by linear interpolation. At run time, when a subject is located in the stationary camera, the recorded LUT drives the centering maneuver of the dynamic camera. The advantage of this approach is that no calibration target is used. The method is based on the 3D information of the scene, but the LUT is learned manually.
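A minimal sketch of such a LUT (our illustration; the pixel and angle samples are hypothetical, and SciPy's griddata stands in for the linear interpolation described above):

import numpy as np
from scipy.interpolate import griddata

# Hand-recorded samples: pixel (u, v) in the stationary camera
# -> (pan, tilt) angles that center the dynamic camera.
# The values below are hypothetical.
pixels = np.array([[100., 80.], [400., 90.], [120., 300.], [420., 310.]])
angles = np.array([[-20., 5.], [15., 4.], [-18., -12.], [16., -13.]])

def pan_tilt(u, v):
    # Linear interpolation between the recorded LUT entries.
    pan = griddata(pixels, angles[:, 0], (u, v), method='linear')
    tilt = griddata(pixels, angles[:, 1], (u, v), method='linear')
    return float(pan), float(tilt)

print(pan_tilt(250, 200))  # e.g. a subject detected mid-frame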
More recently, (Senior et al., 2005) proposed a procedure more automatic than that of (Zhou et al., 2003).
Figure 1: Our system of collocated cameras: the static camera is on the left and the dynamic camera is on the right.
To steer the dynamic camera, they need a sequence of transformations linking an image position to the pan-tilt angles. These transformations are tailored to pedestrian tracking. A homography maps the foot position of the pedestrian in the static camera to the foot position in the dynamic camera. A second transformation maps the foot position in the dynamic camera to the head position in the dynamic camera. Finally, a third transformation, a LUT as in (Zhou et al., 2003), maps the head position in the dynamic camera to the pan-tilt angles. These transformations are learned automatically from unlabelled training data.
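Chained together, the three transformations amount to the following sketch (our illustration; head_from_foot and pan_tilt_lut stand for the learned mappings of (Senior et al., 2005), whose exact forms we do not reproduce):

import numpy as np

def steer(foot_static, H, head_from_foot, pan_tilt_lut):
    # 1. Foot-to-foot homography: static camera -> dynamic camera.
    p = H @ np.array([foot_static[0], foot_static[1], 1.0])
    foot_dyn = p[:2] / p[2]
    # 2. Learned mapping: foot position -> head position (dynamic camera).
    head_dyn = head_from_foot(foot_dyn)
    # 3. LUT, as in (Zhou et al., 2003): head position -> pan-tilt angles.
    return pan_tilt_lut(head_dyn[0], head_dyn[1])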
The main weakness of the method lies in this training data: if the method is deployed as a turnkey solution for a non-expert public and the scene changes, non-expert users cannot be expected to gather a good and complete training set to update the system.
We propose a solution in the continuity of the works of (Zhou et al., 2003) and (Senior et al., 2005). Indeed, (Jain et al., 2006) need the depth of objects in the scene and must therefore resort to stereo triangulation; but, as shown in figure 1, our system is composed of two almost collocated cameras, whose near-zero baseline makes triangulation unreliable.
Moreover, for an automatic and autonomous system, the solutions proposed by (Jain et al., 2006) and (Senior et al., 2005) are not usable: they require an expert who knows precisely how to use a calibration target (Jain et al., 2006) or how to extract the relevant information to build the training data (Senior et al., 2005).
In this paper, an automatic and autonomous solution is presented for an uncalibrated pair of cameras. The solution adapts automatically to its environment: if the camera pair operates in a changing environment, the solution can simply be rerun regularly.