EXPERIMENTING WITH AUTONOMOUS CALIBRATION
OF A CAMERA RIG ON A VISION SENSOR NETWORK
Kyung min Han, Yuanqiang Dong and Guilherme DeSouza
Department of Electrical and Computer Engineering, University of Missouri
349 Engineering Building West, Columbia, U.S.A.
Keywords:
Vision sensor network, Image clustering, Global coordinate reference.
Abstract:
This paper presents a completely autonomous camera calibration framework for a vision sensor network con-
sisting of a large number of arbitrarily arranged cameras. In the proposed framework, a sequence of images for
calibration is collected without tedious human intervention. Next, the system automatically extracts all nec-
essary features from the images and finds the best set of images that minimizes the error in 3D reconstruction
considering all cameras in the set.
1 INTRODUCTION
Calibration of multiple-camera systems has become an important topic with the emergence of vision sensor networks (VSN) in applications such as virtual and augmented reality, surveillance, and battlefield reconnaissance (Remagnino and Jones, 2002; Jaynes, 1999; Koller et al., 1997). In order to calibrate a VSN, several critical steps must be taken: 1) acquiring images synchronously; 2) extracting feature points from the images; 3) establishing the correspondence among the feature points in multiple images from multiple cameras; 4) performing individual camera calibrations; and 5) computing a global coordinate reference for all cameras. Currently, a few of these steps still require a number of tedious and manual subtasks such as: selecting good images from which feature points can be extracted for calibration; manually establishing the correspondences between feature points from different cameras; etc. These tasks become quite challenging and time-consuming, especially when the VSN has a large number of cameras. Moreover, extensive human involvement in the calibration process can induce errors that could lead to a poor overall accuracy of the system. Therefore, it is quite desirable that vision sensor networks be calibrated autonomously.
In (Huang and Boufama, 2002), for example, a
semi-automatic calibration system was developed for
augmented reality. However, the method presented
still requires that the user click on four points per image in order to construct the homography matrices. Besides the tedious requirement of clicking on a large number of points, the user must also be very careful when performing this task; otherwise, the accuracy of the calibration degrades with every wrongly selected point.
Another common approach is to resort to some
special marker, such as a laser pointer (Svoboda et al.,
2005) or a LED stick (Baker and Aloimonos, 2000).
One of the main drawbacks of these kinds of systems is the quite large number of images that must be obtained in order to cover a reasonably small space, since only one or two feature points can be obtained from each image.
In (Olsen and Hoover, 2001), a system to cali-
brate cameras in a hallway was proposed using several
square tiles. Similar to the landmarks in (Koller et al.,
1997), their method not only requires that several tiles
be carefully positioned, but also that the area covered
by the tiles span the fields of view of all cameras in
the hallway.
A pattern-free approach was proposed in (Chen
et al., 2005), where the trajectory of a bouncing ball
is used for calibration. However, their method was tested only in computer simulation, and it is unclear whether their algorithm would perform at all in a real situation.
Another pattern-free approach is the system described
in (Yamazoe et al., 2006). In that case, a geometric constraint is used to extract feature points from a human silhouette. However, their method requires a traditional pre-calibration step in order to estimate the fundamental matrices used for the final calibration.
In this paper, we present a completely autonomous
framework that performs optimal multi-camera cali-
bration in terms of the final error in 3D reconstruction for any given subset of the cameras. In order to achieve that, the system only requires that a sequence of images be captured by the cameras while a human presents a calibration pattern at arbitrary poses in front of the cameras. Then the proposed algorithm automatically: searches for feature points on the pattern that will be used for calibration; selects the best set of images that optimizes the overall accuracy of the calibration; and computes the individual camera calibrations as well as the best sequence of transformations from each camera to a global coordinate frame.

Figure 1: Steps of the detection algorithm in (Han and DeSouza, 2007): (a) original image; (b) detected lines; (c) detected features; (d) final detected corners.
2 THE PROPOSED ALGORITHM
As we mentioned earlier, the proposed framework for
the calibration of multiple cameras in a vision sensor
network consists of several steps. In this paper we
will focus only on steps 2, 3 and 5 above.
Figures 1(a)-(d) show the various steps of our fea-
ture detection algorithm developed in (Han and DeS-
ouza, 2007). As illustrated by these figures, a small
number of spurious lines and consequently spurious
feature points are initially detected due to noise in the
image. However, after a few more steps into the de-
tection algorithm, all spurious feature points are elim-
inated, as depicted in Figure 1(d). Finally, given the
shape of the pattern, the algorithm automatically establishes an ordering of the corner points. This ordering is later used for feature correspondence.
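The feature detector itself is described in (Han and DeSouza, 2007) and is not reproduced here. Purely to illustrate the ordering idea, the sketch below sorts already-detected corners of a grid pattern into a canonical row-major order, so that the k-th corner in one view can be matched to the k-th corner in another view. The function name, the grid dimensions, and the assumption that the pattern appears roughly upright are ours, not part of the original algorithm.

    import numpy as np

    def order_corners(corners, rows, cols):
        """Order detected corner points row by row (top-to-bottom,
        left-to-right) so that corner k in one view matches corner k in
        another view.  Assumes a roughly upright grid pattern."""
        corners = np.asarray(corners, dtype=float)   # shape (rows*cols, 2), columns are (u, v)
        assert corners.shape == (rows * cols, 2)

        # Sort all corners by their vertical coordinate, then split into rows.
        by_v = corners[np.argsort(corners[:, 1])]
        ordered = []
        for r in range(rows):
            row = by_v[r * cols:(r + 1) * cols]
            # Within each row, sort left to right by the horizontal coordinate.
            ordered.append(row[np.argsort(row[:, 0])])
        return np.vstack(ordered)

    # Example: a synthetic 3x4 grid of corners given in scrambled order.
    grid = np.array([[u * 50 + 10, v * 40 + 5] for v in range(3) for u in range(4)])
    rng = np.random.default_rng(0)
    scrambled = grid[rng.permutation(len(grid))]
    print(order_corners(scrambled, rows=3, cols=4))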
The existence of noise in the images greatly af-
fects the performance of the feature detection algo-
rithm. Hence, it is necessary to evaluate these images,
assign a score to them, and choose only those that can
lead to the best calibration. The algorithm initially as-
signs a uniform score of 100 to every image acquired.
Then, the algorithm deducts a penalty whenever an image fails to satisfy certain expectations. To rank image j, the score is computed as:
\[
\mathrm{Score}_j = 100 - 100 \times \left( \sum_{i=1}^{n} p_i + s_i \times \lvert N - n \rvert \right),
\quad
s_i = \max\bigl(\mathrm{stdv}_{u_i}\bigr) + \max\bigl(\mathrm{stdv}_{v_i}\bigr),
\quad
p_i = \frac{\mathrm{stdv}_{u_i} + \mathrm{stdv}_{v_i}}{N}
\tag{1}
\]

where $\mathrm{stdv}_{u_i}$ and $\mathrm{stdv}_{v_i}$ are the standard deviations in, respectively, the u and v coordinates of the detected corners; N is the total number of corners in the pattern and n is the number of corners detected. The rationale behind this scoring scheme is to assign a penalty that is proportional to the uncertainty (stdv) in the detected feature points.
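As a concrete illustration of Eq. (1), the sketch below computes the score of one image from per-corner standard deviations. The grouping of the two penalty terms inside a single parenthesis, and the reading of stdv_{u_i} and stdv_{v_i} as per-corner uncertainties of the detected positions, are our assumptions about the formula; the function and its inputs are illustrative only.

    import numpy as np

    def score_image(stdv_u, stdv_v, N):
        """Score one image as in Eq. (1) (our reading of the formula).

        stdv_u, stdv_v : per-corner standard deviations of the detected
                         corner positions in the u and v directions
                         (length n, one entry per detected corner)
        N              : total number of corners on the calibration pattern
        """
        stdv_u = np.asarray(stdv_u, dtype=float)
        stdv_v = np.asarray(stdv_v, dtype=float)
        n = len(stdv_u)

        p = (stdv_u + stdv_v) / N          # per-corner uncertainty penalty
        s = stdv_u.max() + stdv_v.max()    # largest uncertainty, weighting the missing-corner term
        return 100.0 - 100.0 * (p.sum() + s * abs(N - n))

    # Example: 46 of 48 pattern corners detected with small positional uncertainties.
    rng = np.random.default_rng(1)
    print(score_image(rng.uniform(0.001, 0.01, 46), rng.uniform(0.001, 0.01, 46), N=48))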
Once an image rank is created using the above
scores, the algorithm must start selecting images for
calibration. We cluster the images according to two
non-exclusive criteria: orientation (straight-centered,
tilted-forward, tilted-backward, tilted-to-the-left and
tilted-to-the-right) of the pattern and its distance
(near, medium and far) to the camera. The cluster-
ing of the images is performed by a K-means algo-
rithm using the angles of the edges and the pattern's apparent size. Once the clusters are formed, the algorithm selects, for each camera calibration, five images according to the rank $\mathrm{Score}_j$. That is, one image from each of the five orientations is selected from the medium clusters. Next, two more images from the near and far clusters are selected based solely on their ranks; that is, these images can come from any of the five orientation clusters. Finally, two extra images of
the pattern are selected for every pair of cameras that
share a view of the pattern at that pose. Once the im-
ages are chosen, the calibration is performed using a
popular method found in the literature (Zhang, 2000).
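A minimal sketch of this clustering and selection step is given below, assuming each image is summarized by the angles of the detected pattern edges (for orientation) and by the pattern's apparent area (as a proxy for distance), and clustered with scikit-learn's K-means. The feature definitions, cluster counts, and function names are illustrative assumptions rather than the exact implementation.

    import numpy as np
    from sklearn.cluster import KMeans

    def cluster_images(edge_angles, apparent_sizes, n_orientations=5, n_distances=3):
        """Cluster images by pattern orientation and, separately, by apparent
        size (a proxy for distance to the camera), using K-means.

        edge_angles    : (M, k) array of edge angles of the pattern in each image
        apparent_sizes : (M,) array of apparent pattern areas
        Returns one orientation label and one distance label per image.
        """
        orient = KMeans(n_clusters=n_orientations, n_init=10, random_state=0)
        dist = KMeans(n_clusters=n_distances, n_init=10, random_state=0)
        orientation_labels = orient.fit_predict(np.asarray(edge_angles, dtype=float))
        distance_labels = dist.fit_predict(np.asarray(apparent_sizes, dtype=float).reshape(-1, 1))
        return orientation_labels, distance_labels

    def pick_best(scores, labels, cluster, k=1):
        """Indices of the k highest-scoring images within a given cluster."""
        idx = np.where(labels == cluster)[0]
        return idx[np.argsort(scores[idx])[::-1][:k]]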
The final step of the algorithm is the problem of
finding the best transformation from the coordinate
frame of any camera i to any camera j. However, not
all possible paths between cameras assure the same
accuracy in 3D reconstruction. Due to the quality
of the image used for calibration, some paths may
lead to better accuracy than others. We cast this problem as an all-pairs shortest-path search on a graph whose nodes are the cameras and whose edge weights are derived from the scores described above. We then apply the Floyd-Warshall algorithm (Floyd, 1962) to find the best path, and therefore the best transformation, between any two camera coordinate systems.
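The sketch below illustrates this all-pairs shortest-path step with a straightforward Floyd-Warshall implementation. The choice of edge cost (any penalty derived from the image scores, with an infinite cost between cameras that never share a view of the pattern) is an assumption made for illustration only.

    import numpy as np

    def floyd_warshall(weights):
        """All-pairs shortest paths (Floyd, 1962).

        weights : (C, C) matrix of edge costs between cameras that share a
                  view of the pattern, np.inf where no shared view exists.
        Returns (dist, nxt), where dist[i, j] is the cost of the best path
        from camera i to camera j and nxt allows that path to be rebuilt.
        """
        C = weights.shape[0]
        dist = np.asarray(weights, dtype=float).copy()
        np.fill_diagonal(dist, 0.0)
        nxt = np.where(np.isfinite(dist), np.arange(C)[None, :], -1)

        for k in range(C):                               # allow paths passing through camera k
            via_k = dist[:, k:k + 1] + dist[k:k + 1, :]
            better = via_k < dist
            dist = np.where(better, via_k, dist)
            nxt = np.where(better, nxt[:, k:k + 1], nxt)
        return dist, nxt

    def best_path(nxt, i, j):
        """Chain of cameras linking frame i to frame j
        (assumes camera j is reachable from camera i)."""
        path = [i]
        while path[-1] != j:
            path.append(int(nxt[path[-1], j]))
        return path

Chaining the pairwise rigid transformations along the returned camera path then yields the overall transformation between the two coordinate frames.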
Figure 2: Errors of 3D reconstruction vs. levels of noise, (a) without radial distortion and (b) with radial distortion. Blue lines denote the errors using the best images and red lines denote the errors using poorly ranked images.
3 EXPERIMENTAL RESULTS
We tested our proposed algorithm in two different situations. The first group of tests was done with synthetic data, using from 6 to 42 virtual cameras.
The second group of tests was done with real data us-
ing 6 cameras.
Using the intrinsic parameters from real cameras,
we initially created a virtual space with 18 cameras
all pointing to the center of the space. We set the ori-
gin of this space at the center, so that the positions of
all 18 cameras could be easily determined. These ar-
bitrary intrinsic and extrinsic parameters of the cam-
eras will later be referred to as our ground truth. Next,
two thousand positions of the synthetic pattern were
randomly generated and noise plus camera radial dis-
tortion were added at various levels to simulate the
effects of real data. One last position of the pattern
was created separately from the training set for testing purposes. The above procedure was repeated 10
times and the results were averaged over all trials.
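The paper does not spell out the exact corruption model used in these simulations. The sketch below shows one common way to add a single-coefficient radial distortion (a coefficient of 0.3 is mentioned in connection with Figure 2(b)) and white Gaussian pixel noise to ideal projections; it is offered only as an illustrative assumption, not as the code actually used.

    import numpy as np

    def corrupt_projections(uv, principal_point, focal, k1=0.3, noise_std=2.0, rng=None):
        """Apply single-coefficient radial distortion and additive white
        Gaussian noise to ideal pixel projections (illustrative model).

        uv              : (M, 2) ideal pixel coordinates
        principal_point : (cx, cy) in pixels
        focal           : focal length in pixels
        k1              : radial distortion coefficient
        noise_std       : standard deviation of the pixel noise
        """
        rng = np.random.default_rng() if rng is None else rng
        xy = (np.asarray(uv, dtype=float) - principal_point) / focal   # normalized coordinates
        r2 = np.sum(xy ** 2, axis=1, keepdims=True)
        distorted = xy * (1.0 + k1 * r2)                               # radial distortion
        noisy = distorted * focal + principal_point
        return noisy + rng.normal(0.0, noise_std, size=noisy.shape)    # white Gaussian pixel noise

    # Example: corrupt a few ideal corner projections of the synthetic pattern.
    ideal = np.array([[300.0, 220.0], [340.0, 221.0], [380.0, 223.0]])
    print(corrupt_projections(ideal, principal_point=(320.0, 240.0), focal=800.0))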
The amount of noise and distortion added var-
ied throughout the entire experiment. However, even when the standard deviation of the white (Gaussian) noise reaches 2 pixels, the algorithm still performs very well, with less than 1 cm of error in 3D reconstruction. Also, in order to con-
trast the algorithm with a bad scenario in which the
images for calibration are not appropriately selected,
we compared the performance of the calibration us-
ing images that scored poorly. Figures 2(a) and (b)
show the error in 3D reconstruction as a function of
the noise. The error is calculated as the difference
between the estimated (reconstructed) coordinates of
the test points and the ground truth. Figure 2(a) shows
the simulation performed without adding radial dis-
tortion, while for 2(b), a typical radial distortion (from
the real lenses) of 0.3 was added.
We also tested our algorithm with real data. Us-
ing the calibration parameters obtained using the pro-
posed algorithm and the pixel coordinates of a set of
predefined points in space as perceived by all 6 cam-
eras, we reconstructed the spatial coordinates of these points and compared the calculated values with the real ones.

Figure 3: 3D reconstructions: (a) synthetic sphere; (b) real sphere; (c) human upper body, obtained with the method in (Lam et al., 2009).
The calibration error was measured by averaging
the results for 20 different snapshots while present-
ing the reference points to all cameras. The reference
points were exactly 50cm apart. Each snapshot was
taken by all 6 cameras, so a total of 120 images were
used for this test. The accuracy of the final calibra-
tion was determined by calculating the distance be-
tween two reference points. The mean reconstructed distance was 50.6264 cm, corresponding to an error of less than 1.5% of the actual distance. Also, the small standard deviation (0.2498 cm) shows that the calibration obtained with our algorithm gives a very consistent 3D reconstruction. It is important to mention that, for the current baseline of the cameras, a deviation of a single pixel in the location of the marks on the ruler already incurs an error of almost 3 mm in the reconstruction.
4 3D OBJECT MODELING
Since the main application of our camera rig is
3D object modeling, we tested the accuracy of the cal-
ibration by reconstructing a sphere based on the idea
of visual hull (Laurentini, 1994) and the human body
using an algorithm for multi-view 3D modeling pre-
sented in (Lam et al., 2009). As before, the tests were
conducted for both synthetic data and real data.
In the first case (synthetic data), we created multiples of 6 virtual cameras (6, 12, 18, ...) arbitrarily positioned around the object. Using the cameras' intrinsic and distortion parameters, we synthesized images of a sphere (ball) with a radius of 20 cm. Using the voxel
carving approach (Dyer, 2001), we reconstructed the
sphere and measured its radius. In simple terms, this
approach consists of: 1) defining a set of voxels in
the 3D space; 2) marking each voxel according to the
occupancy of the object as projected onto each of the
6n image planes; and 3) finding the intersection of the occupancy for all 6n cameras. Table 1 shows the relationship between the reconstructed radii and the number of cameras used for reconstruction.

Table 1: Estimated radii vs. the number of virtual cameras.
# of virtual cameras      6      12     18     24     30     36     42
estimated radius (cm)     21.53  21.35  21.00  20.43  20.25  20.15  20.15

As expected, the error decreases as the number of cameras
increases. That is because, among other reasons, the
occupancy defined by each camera view forms a cone
in space and the intersection of any subset of camera
views approximates the sphere by the surface of such
cones. Every time a new camera is added to the sub-
set, the approximation becomes closer to the actual
shape of the sphere. Since this procedure also relies
on a sphere-fitting algorithm to circumscribe the oc-
cupied voxels, the detected radius always tends to be
larger than the actual radius. For the real data, six
cameras were used to take images of a ball. For each
image, a circular Hough transform was used to detect
the boundary and the 2D radius of the ball. As before,
we relied on a voxel carving approach to reconstruct
the ball. Figures 3(a) and (b) depict the reconstructed
sphere for both synthetic data and real data. For the
real ball, also with a radius of 20 cm, a 3D sphere was fitted and its radius was estimated; the radius obtained with the real cameras was 24.3 cm. Fi-
nally, Figure 3(c) depicts the result from our multi-
view algorithm for 3D modeling.
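For reference, the sketch below illustrates the voxel carving idea summarized above: voxels are defined over the working volume, projected into every view, and kept only if they land inside the object's silhouette in all views. The silhouette masks, projection matrices, and function name are assumptions made for the illustration; this is a sketch of the general technique, not the implementation of (Dyer, 2001).

    import numpy as np

    def carve_voxels(voxel_centers, projections, silhouettes):
        """Keep the voxels whose projections fall inside the silhouette in
        every view (visual-hull / voxel-carving idea).

        voxel_centers : (V, 3) array of voxel centers in world coordinates
        projections   : list of 3x4 camera projection matrices
        silhouettes   : list of boolean images, True where the object is seen
        Returns a boolean mask of the voxels that survive all views
        (assumes every voxel lies in front of every camera).
        """
        V = voxel_centers.shape[0]
        homog = np.hstack([voxel_centers, np.ones((V, 1))])       # homogeneous coordinates
        occupied = np.ones(V, dtype=bool)

        for P, mask in zip(projections, silhouettes):
            uvw = homog @ P.T                                      # project all voxels into this view
            u = np.round(uvw[:, 0] / uvw[:, 2]).astype(int)
            v = np.round(uvw[:, 1] / uvw[:, 2]).astype(int)
            inside = (u >= 0) & (u < mask.shape[1]) & (v >= 0) & (v < mask.shape[0])
            hit = np.zeros(V, dtype=bool)
            hit[inside] = mask[v[inside], u[inside]]               # voxel projects onto the silhouette
            occupied &= hit                                        # carve away voxels missed by any view
        return occupied

A sphere can then be fitted to the surviving voxels to estimate the radius, as done in the experiments above.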
5 CONCLUSIONS
We have presented a novel method for autonomous
camera calibration of a multi-camera rig. The exper-
imental results showed that the algorithm is vital in
order to obtain good 3D reconstruction. That is, the
algorithm’s selection of the best images for calibra-
tion leads to an improved calibration of as much as ten
times of that obtained without using the algorithm. Fi-
nally, an application for the camera rig was presented
where a sphere was placed in the middle of the rig
and a 3D representation of the same sphere was con-
structed with an error in reconstruction (real camera)
approaching the theoretical (synthetic cameras) error.
REFERENCES
Baker, A. and Aloimonos, Y. (2000). Complete calibra-
tion of a multi-camera network. In Proceedings of
IEEE International Workshop on Omnidirectional Vi-
sion. IEEE Computer Society.
Chen, K., Hung, Y., and Chen, Y. (2005). Calibrating a cam-
era network using parabolic trajectories of a bouncing
ball. In Proceedings of IEEE International Workshop
on VS-PETS. IEEE Computer Society.
Dyer, C. (2001). Volumetric scene reconstruction from mul-
tiple views. In Foundations of Image Understanding.
Kluwer.
Floyd, R. (1962). Algorithm 97: Shortest path. Commun.
ACM, 5(6):345.
Han, K. and DeSouza, G. N. (2007). A feature detection
algorithm for autonomous camera calibration. In Pro-
ceedings of Fourth International Conference on Infor-
matics in Control, Automation and Robotics.
Huang, Z. and Boufama, B. (2002). A semi-automatic cam-
era calibration method for augmented reality. In Pro-
ceedings of IEEE International Conference on System,
Man and Cybernetics. IEEE Computer Society.
Jaynes, C. (1999). Multi-view calibration from planar mo-
tion for video surveillance. In Proceedings of IEEE
International Workshop on Visual Surveillance. IEEE
Computer Society.
Koller, D., Klinker, G., Rose, E., Breen, D., Whitaker, R.,
and Tuceryan, M. (1997). Automated camera cali-
bration and 3d egomotion estimation for augmented
reality applications. In Proceedings of the 7th Inter-
national Conference on Computer Analysis of Images
and Patterns. Springer-Verlag.
Lam, D., Hong, R., and DeSouza, G. (2009). 3d hu-
man modeling using virtual multi-view stereopsis
from on-the-fly motion estimation. In Proceedings
of IEEE/RSJ International Conference on Intelligent
Robots and Systems. IEEE Computer Society.
Laurentini, A. (1994). The visual hull concept for
silhouette-based image understanding. IEEE Trans-
actions on Pattern Analysis and Machine Intelligence.
Olsen, B. and Hoover, A. (2001). Calibrating a camera net-
work using a domino grid. Pattern Recognition, 34.
Remagnino, P. and Jones, G. (2002). Registration of surveil-
lance data for multi-camera. In IEEE International
Conference on Information Fusion. IEEE Computer
Society.
Svoboda, T., Martinec, D., and Pajdla, T. (2005). A conve-
nient multicamera self-calibration for virtual environ-
ments. MIT Press, Cambridge.
Yamazoe, H., Utsumi, A., and Abe, S. (2006). Multiple
camera calibration with bundled optimization using
silhouette geometry constraints. In Proceedings of
the 18th International Conference on Pattern Recog-
nition. IEEE Computer Society.
Zhang, Z. (2000). A flexible new technique for camera cal-
ibration. IEEE Transactions on Pattern Analysis and
Machine Intelligence.