Uncalibrated Image Rectiﬁcation for Coplanar Stereo Cameras

Vinicius Cesar

, Thiago Farias

, Saulo Pessoa

, Samuel Macedo

, Judith Kelner

and Ismael Santos

Centro de Informatica, UFPE, Recife, Brazil

Universidade de Pernambuco, Caruaru, Brazil

Tecgraf, PUC-RIO, Rio de Janeiro, Brazil

Keywords:

Rectiﬁcation, Stereo, Calibration, Reconstruction.

Abstract:

Nowadays, underwater maintenance tasks, mostly in the case of oil and gas industries, have been assisted by

computer vision algorithms. An important part of these procedures is the rectiﬁcation of stereo images, which

is the ﬁrst step in the stereo 3D reconstruction pipeline. Some aspects of the underwater environment make the

rectiﬁcation process difﬁcult: it presents a very noisy scenario; and the equipment is almost textureless. As a

result of this demanding scenario, this article proposes a novel technique for a more accurate rectiﬁcation of a

set of images than the state-of-the-art methods. Tests were carried out proving the efﬁciency of the proposed

technique.

1 INTRODUCTION

Cameras are broadly used by the industry to assist

maintenance tasks, especially when the environment

is unreachable or even harmful for human beings. By

using cameras, an individual can remotely supervise

the operation and take records for future consultation.

Beyond these beneﬁts, the captured footage can also

be used by a computer vision application to provide

additional information about the environment, such

as its 3D structure. This task is usually performed

by synchronized stereo rigs, which always require an

image rectiﬁcation stage in the application pipeline to

reduce processing time and complexity.

The problem tackled by this work occurs in a deep

Figure 1: First row exhibits left and right images captured

by the stereo rig.

underwater environment (depth can exceed 1000m)

where the maintenance task of a ﬂexible pipe is car-

ried out. Because of the high pressure of the environ-

ment, the task is performed by using ROVs (Remotely

Operated Vehicles) equipped with a pair of cameras.

Since no natural light can achieve such a depth, spe-

cial low light cameras are used. Despite these cameras

are sensitive to low light conditions (10

−3

LUX), they

are analog (resolution is limited to the NTSC stan-

dard) and can only capture gray scale images. In ad-

dition, the underwater environment is very noisy with

particles ﬂoating around and the pipe is almost tex-

tureless. All these characteristics lead to a ﬁnal poor

quality image with limited contrast, which makes any

feature extraction almost unfeasible. In order to over-

come this problem, some high contrast markers were

previously painted over the pipe. It consists in inter-

leaved white and black regions along the pipe sur-

face. So, by using these markers a tailored solution

for the pipe segmentation was developed. This tech-

nique can provide a few stable set of features required

by the rectiﬁcation technique. The tests performed so

far have shown that, at least for the presented study

case, the segmentation solution using temporal coher-

ence is robust enough to produce no outliers. How-

ever, it can be extended with RANSAC (Fischler and

Bolles, 1981) in order to ensure robustness in more

critical cases. A sample of the captured images and

the segmentation results can be seen in Fig. 1. It

worth to mention that this painting is required not

648

Cesar V., Farias T., Pessoa S., Macedo S., Kelner J. and Santos I..

Uncalibrated Image Rectiﬁcation for Coplanar Stereo Cameras.

DOI: 10.5220/0004745106480655

In Proceedings of the 9th International Conference on Computer Vision Theory and Applications (VISAPP-2014), pages 648-655

ISBN: 978-989-758-009-3

 2014 SCITEPRESS (Science and Technology Publications, Lda.)

only by the feature extraction process, but by the sub-

sequent tracking stage that will not be approached in

this paper. Then, painting the pipe would be required

even if the rectiﬁcation process would not exist. Pre-

viously calibrating the pair of cameras is something

that would be inconvenient because the cameras are

mounted during the maintenance task and the techni-

cians are not trained for it.

The current methods for rectiﬁcation do not work

properly in a noisy environment with a reduced num-

ber of feature correspondences, especially when the

calibration is unknown. Thus, this paper presents a

novel rectiﬁcation technique for stereo rigs that op-

erates even with a reduced number of feature cor-

respondences. The state of the art techniques (such

as (Fusiello and Irsara, 2008)) need at least six cor-

respondences, while the proposed technique requires

only three. However, the proposed solution requires

the following restrictions on the cameras rig: cam-

eras’ projection planes must be coplanar; and cameras

must have equal intrinsic parameters.

Tests were carried out in order to evaluate the

proposed technique. Three different sets of tests

were applied to measure the error related to the tech-

nique. The ﬁrst one considered a synthetic test that

was proposed only with points numerically disturbed

by a random generated noise. In the second test, a

real structured environment was built and a carefully

mounted stereo rig was used.

The third test occurred in a real scenario. As stated

before, the technique was tested in a deep underwater

environment, where images were captured by a ROV.

2 RELATED WORK

Rectiﬁcation of stereo images is a frequently in-

vestigated topic by the computer vision community.

These researches began by photogrammetrists, such

as (Slama et al., 1980), which were further developed

by computer vision researchers aiming to facilitate the

feature matching between images from a stereo rig.

Rectiﬁcation techniques can be classiﬁed into two

categories: calibrated, and uncalibrated. Calibrated

techniques assume that cameras’ intrinsic and extrin-

sic parameters are known and the rectifying homogra-

phies are estimated only by taking into account these

parameters. In (Fusiello et al., 2000), a simple and

effective calibrated method is presented.

Uncalibrated techniques estimate the rectifying

homographies by using a set of corresponding 2D

points between the images and/or epipolar restrictions

(such as the fundamental matrix). These techniques

are more used than the calibrated ones because in

most of the real problems the rectiﬁcation is required

in a stage before the cameras poses are known. How-

ever, it is a more complex problem with non-linear

solutions, which requires the use of approximations

and optimization methods. Such methods are required

because there are inﬁnite pairs of rectifying homo-

graphies, although it is convenient to choose the one

which produces less image deformation. Some un-

calibrated techniques can be found in (Hartley, 1998),

(Loop and Zhang, 1999), (Isgro and Trucco, 1999),

(Fusiello and Irsara, 2008).

When the epipole is close to or inside the image,

image deformation tends to be large. In these cases

planar rectiﬁcations are not enough, therefore it is

necessary to use different techniques such as cylin-

drical rectiﬁcation (Roy et al., 1997) or polar rectiﬁ-

cation (Pollefeys et al., 1999).

In the scenario presented by this paper only part of

cameras parameters are previously known, which en-

forces the use of an uncalibrated technique. However,

uncalibrated techniques require at least six accurate

corresponding points between the images (Fusiello

and Irsara, 2008), requirement that may not always

be fulﬁlled by the application. The proposed tech-

nique overcomes this limitation by relying on some

restrictions imposed on the stereo rig. In practice,

these constrain the way in which cameras must be

relatively positioned and oriented. If the stereo rig

is mounted so that the cameras’ projection planes are

coplanar, epipoles will be localized close to the inﬁn-

ity, enabling a planar rectiﬁcation to solve the prob-

lem.

3 BACKGROUND

In this section, some concepts that are at the core of

the proposed rectiﬁcation technique will be presented

as well as the adopted notation. These concepts are

more extensively explained in (Hartley and Zisser-

man, 2004), (Loop and Zhang, 1999).

3.1 Epipolar Geometry

Given two pinhole cameras P and P

with their re-

spective projection matrices deﬁned as P = K[I|0] and

= K

R[I|−C]. I is a 3 × 3 identity matrix. Camera

P has its projection center at the origin of the coordi-

nate system 0 = [0, 0, 0]

. Camera P

has its projec-

tion center at C = [x

, y

, z

]

, deﬁned in Euclidean

coordinates. Furthermore, matrices K and K

are the so

called calibration matrices, which encapsulate cam-

eras’ intrinsic parameters. A simpliﬁed calibration

matrix has the form diag( f , f , 1), where f is the lens

UncalibratedImageRectificationforCoplanarStereoCameras

649

focal length. This simpliﬁed form assumes that the

principal point is the center of the image where the

pixel skew is zero and the pixel aspect ratio is one.

The canonical form of the cameras matrices P and

are calculated aplying a projective transformation

to the 3D space such that P = [I|0].

Given a 3D point X in homogeneous coordinates,

its projection in image I through camera P is given

by x = PX. Likewise, x

= P

X is the projection of X

in image I

through the camera P

By using these projected points, the epipolar con-

straint can be established as

Fx = 0, (1)

which is valid for all 2D point correspondences x ↔

. F is a 3 × 3 rank 2 matrix named fundamental ma-

trix, which maps points from image I to lines (named

epipolar lines) in image I

. Given the line l

= Fx

in image I

, it can be said that x

lies on l

since

= 0. The reverse idea is also valid, and therefore

point x lies on the line l = F

. All epipolar lines

of one image intersect each other at a single point

named epipole, where e is the epipole in I and e

the epipole in I

Being P

can

and P

can

two projection matrices from

canonical cameras, i.e. P

can

= [I|0] and P

can

= [M|m], the fundamental matrix between the

two images captured by these cameras can be deﬁned

F = [m]

M, (2)

where [m]

stands for the antisymmetric matrix that

is equivalent to the cross product with m.

Two images

I and

are said to be rectiﬁed if all

matching points

x = [ ¯x, ¯y, 1]

and

= [ ¯x

, ¯y

, 1]

have

the same coordinate in y, i.e. ¯y = ¯y

. Thus, with the

rectiﬁed matching points on the same line, the stereo

matching is made easier and computationally faster.

The rectifying process consists in estimating two

homographies H and H

, which when applied to im-

ages I and I

, respectively, make them rectiﬁed.

The epipolar geometry between two rectiﬁed im-

ages has some noteworthy particularities. The fun-

damental matrix between two rectiﬁed images is

F =

[[1, 0, 0]

]

All the epipolar lines of a rectiﬁed image are par-

allel to the x direction of the image, since

x =

[0, 1, − ¯y]

. Assuming all epipolar lines intersect at

the epipoles, the epipoles are valued [1, 0, 0]

3.2 Loop and Zhang Algorithm

Loop and Zhang in (Loop and Zhang, 1999) present

a rectiﬁcation algorithm that uses the epipolar restric-

tions of the images. This algorithm aims to rectify

images by minimizing the distortion caused by pro-

jective transformations. The algorithm requires the

fundamental matrix and the epipole in the ﬁrst image.

The strategy adopted by the algorithm is to de-

compose the rectifying homographies in three trans-

formation: 1) a projective transformation H

, that

maps the epipoles to the inﬁnity; 2) a similarity trans-

formation H

, that rotates and translates the epipoles

to [1, 0, 0]

; and 3) a shearing transformation H

that

minimizes image distortion in the x coordinates. Us-

ing the notation deﬁned in Section 3.1, the transfor-

mations H and H

, which rectify the images I and I

respectively, are deﬁned as

H = H

(3)

and

= H

. (4)

To compute the projective transformation, two

lines must be deﬁned: w = [w

, w

]

and w

, w

]

. The lines w and w

pass through

epipoles e and e

, respectively. In order to map the

epipoles to inﬁnity, one has to deﬁne projective the

transformations H

and H

that respectively map w

and w

to inﬁnity. Since there are an inﬁnity number

of possible lines, it is preferred to choose the ones that

minimize image distortions. Therefore, the projective

transformations are deﬁned as





1 0 0

0 1 0





. (5)

Similarly we can deﬁne H

After projective transformations, epipolar lines

become parallel one another considering the same im-

age, although they are not aligned considering the

matching lines between the images. The similarity

transformations rotate and translate images in order

to make the epipolar lines parallel to the x direction.

These transformations are





− w

− F

− w

+ v

0 0 1





(6)

and





− F

− w

− F

0 0 1





, (7)

where v

is a common vertical translation for both im-

ages.

The homographies H

and H

are already able

to rectify the images, although shearing transforma-

tions can be added in order to minimize images distor-

tion. This transformation only modify x coordinates,

without affecting the rectiﬁcation. In short, it is sim-

ply an attempt to preserve perpendicularity and aspect

ratio of the images.

VISAPP2014-InternationalConferenceonComputerVisionTheoryandApplications

650

4 METHODOLOGY

The following methodology draws its actions from

the constraints aforementioned, where the cameras

have coplanar projection planes and the identical in-

trinsic parameters. Since camera P is at the origin,

its projection matrix can be expressed as P = K[I|0],

where K is the calibration matrix (intrinsic parame-

ters). Matrix K is stated as diag( f , f , 1), where f is

the lens focal length. Camera P

has a translation

along the xy plane and a rotation by θ around its opti-

cal axis. In addition, camera P

has the same intrinsic

calibration of camera P . Thus, the projection matrix

of camera P

is deﬁned as P

= KR[I|−C], where R is

a tridimensional counterclockwise rotation by angle

θ about z axis and C = [x

, y

, 0]

. In order to sim-

plify further calculations, C vector will be presented

as C = d[cosα, sin α, 0], with d = ||C||.

To obtain the canonical form of the camera matri-

ces, we can deﬁne the transformation

can



−1



, (8)

resulting in P

can

= [I|0] and P

can

= P

can

[KRK

−1

| − RC] = [R| − RC]. By using these matrices

and (2) one can calculate the fundamental matrix re-

lated to P and P

, resulting in

F = [−RC]

= −d





0 0 sin(α + θ)

0 0 −cos(α + θ)

−sin α cosα 0





(9)

Once the fundamental matrix is up to scale, factor

−d can then be removed from (9). The epipoles from

F and F

are extracted using the nullspace of these

matrices, giving respectively

e = null(F) = [cotα, 1, 0]

(10)

and

= null(F

) = [cot(α + θ), 1, 0]

(11)

After calculating the epipolar geometry, the rec-

tifying homographies can be found by applying the

Loop and Zhang’s algorithm (Loop and Zhang, 1999).

The ﬁrst step is to deﬁne the projective transfor-

mations that map epipoles to inﬁnity. In order to de-

ﬁne these transformations, one has to determine the

lines w and w

As stated in (10) and (11), epipoles are already at

inﬁnity if cameras are coplanar, so the line at inﬁnity

∞

= [0, 0, 1]

must be chosen in order to avoid image

distortion. Then

w = w

= [0, 0, 1]

, (12)

which, by (5), leads to H

= H

= I.

The following step maps, through a rotation and

translation, the epipoles onto the point [1, 0, 0]

. In

(Loop and Zhang, 1999), the mapping is given by (6)

and (7), which depends on F, w, and w

. Using (9)

and (12) to ﬁll (6) and (7), one can get





cosα sin α 0

−sin α cosα 0

0 0 1





(13)

and





cos(θ + α) sin(θ + α) 0

−sin(θ + α) cos(θ + α) 0

0 0 1





. (14)

The last step of Loop and Zhang’s algorithm deter-

mines afﬁne transformations in order to preserve the

aspect ratio and perpendicularity of image. It is also

worth to mention that H

and H

are rigid trans-

formations, therefore this step is not necessary. So,

one can deﬁne H

= H

= I.

By using the rectifying homographies calculated

as (3) and (4), the proposed method, adapted from

Loop and Zhang’s technique for cameras with copla-

nar projection planes can be summarized as follows.

Given two images I and I

and 2D matching points

x ↔ x

, where x is in I and x

is in I

, the rectiﬁcation

of I and I

consists in calculating the angle α, an-

gle β = α + θ and matches

x ↔

, where

x = R(α)x,

= R(β)x

, and R(θ) is a matrix representing a 2D

clockwise rotation by angle θ.

According Loop and Zhang in (Loop and Zhang,

1999), the rectiﬁcation is done from the fundamental

matrix, although one can deﬁne another approach us-

ing 2D point matches to determine angles α and β.

Given

x =



¯x

¯y





x cos α − ysinα

x sin α + ycosα



(15)

and



¯x

¯y





cosβ − y

sinβ

sinβ + y

cosβ



, (16)

and knowing that the rectiﬁed image obeys constraint

¯y = ¯y

, one can have

x sin α + ycosα − x

sinβ − y

cosβ = 0. (17)

If there are n 2D matching points, there will be n

equations like (17), which leads to the system







−x

−y

−x

−y

−x

−y













sinα

cosα

sinβ

cosβ







= Ay = 0, (18)

where 0 is a column n-vector of zeros.

UncalibratedImageRectificationforCoplanarStereoCameras

651

The solution has two degrees of freedom and can

be reached by using only 2D matches between the two

images. The degrees of freedom are the two angles

that rotate the respective images. The aforementioned

system is non-linear due to sine and cosine functions,

and a linear approximation is not suited for a noisy

scenario.

The solution of the problem given in (18) can be

found by using two different methods that will be de-

scribed in the next subsections.

4.1 Linear Solution

This approach uses a technique called “linearization”

that was employed by (Ansar and Daniilidis, 2003)

and (Lepetit et al., 2009) to estimate pose of cameras

based on correspondences between 2D and 3D points.

This technique modiﬁes the presentation of the prob-

lem to apply a linear approximation that satisﬁes its

non-linear constraints.

In order to make the problem linear, one can

substitute the non-linear part of the problem by new

variables, giving y = [sinα, cos α, sin β, cosβ]

, y

]

. One must ensure that the

Pythagorirean trigonometric identities

+ y

= 1 (19)

and

+ y

= 1 (20)

still hold.

In order to ﬁnd the solution space of (18) one can

solve it by using a SVD decomposition A = UDV

The approximated solution is within the space deﬁned

by the base composed of the third and fourth columns

of V, named u and v, respectively, which are related to

the smallest singular values. Since there are two vari-

ables, the solution space is two-dimensional. Thus,

the solution to y must be a linear combination of the

vectors u = (u

, u

) and v = (v

, v

), re-

sulting

y = γu + δv. (21)

By replacing (21) in (19) and (20), one can have

(γu

+ δv

)

+ (γu

+ δv

)

= 1 (22)

and

(γu

+ δv

)

+ (γu

+ δv

)

= 1, (23)

which represent two ellipses, because ∆

−2(v

− v

)

and ∆

= −2(v

− v

)

are always negative.

Adding up (22) and (23), one can ﬁnd

uγ

+ u

vγδ + v

vδ

+ δ

= 2,

(24)

once u and v were picked from an orthonormal ba-

sis. Conic in (24) describes a circumference whose

equation is satisﬁed by intersection points of ellipses

(22) and (23). In this case, there can be two or four

intersections.

The intersections of the circumference with the

two ellipses can be found by solving a fourth order

polynomial such as ax

+ bx

+ c = 0, which has two

symmetric solutions. Such polynomial can be deter-

mined by using the Sylvester resultant. The result can

be used in (24) to determine the last variable. The

symmetric solutions are realistic because the supple-

ment of an answer is also a correct answer, once im-

ages remain rectiﬁed when rotated by 180

◦

. In the

case where there are four solutions, the one that satis-

ﬁes γ > δ must be used.

4.2 Non-linear Solution

The system given by (18) is non-linear, and thus not

suitable for a direct solution. Therefore, another ap-

proach to tackle the problem is by using numerical

methods. Finding α and β that minimize ||Ay||

is a

least squares problem that can be solved by the Gauss-

Newton method. The initial values of α and β, namely

and β

, can be assigned in two different ways: ei-

ther by using the result of the linear method described

in this paper, or by taking both values as zero. The

second choice is acceptable because the cameras are

mounted on ROVs manually attempting to enforce the

coplanar constraints.

5 RESULTS

In an attempt to evaluate the proposed technique, this

section proposes three sets of tests, which respectively

compare the approaches proposed one another with

synthetic data, compare the best approach with the

state-of-the-art technique described by Fusiello and

Irsara (Fusiello and Irsara, 2008), and show the epipo-

lar error in a real cluttered environment.

5.1 Synthetic Simulation

The simulation proposes to compare the different ap-

proaches of the suggested method to solve the homo-

geneous system given by (18) minimizing ||Ay||. A

synthetic scene was created with two cameras, P and

, with coplanar projection planes. The intrinsic cal-

ibration of the cameras were generated considering an

image size of 800 × 600 pixels and a focal length of

965.68 pixels (a horizontal ﬁeld of view around 45

◦

VISAPP2014-InternationalConferenceonComputerVisionTheoryandApplications

652

Table 1: Synthetic simulation results with 2 pixels of Gaussian noise (Mean±std and Iterations).

6 points 12 points 20 points

α-Error β-Error Itr α-Error β-Error Itr α-Error β-Error Itr

Linear 2.81 ± 2.91 1.71 ± 1.83 – 1.88 ± 1.91 1.16 ± 1.17 – 1.45 ± 1.49 0.90 ± 0.91 –

Non-Linear 1.16 ± 2.00 0.62 ± 1.21 5.64 0.68 ± 0.56 0.37 ± 0.32 5.14 0.50 ± 0.41 0.28 ± 0.24 4.96

Linear+Non-Linear 1.11 ± 1.14 0.59 ± 0.64 4.29 0.68 ± 0.56 0.37 ± 0.32 3.83 0.50 ± 0.41 0.28 ± 0.24 3.59

Figure 2: Synthetic results varying the amount of point correspondences and Gaussian noise.

In this simulation the cameras are 90cm apart from

each other and a 3D point cloud was randomly gen-

erated inside the frustums of the cameras and away

between 3m and 6m from their baseline. These num-

bers represent the expected conﬁguration of the real

environment where the method will be applied. The

3D points are projected by the cameras P e P

and

thereafter a Gaussian noise will be added to the pro-

jections to simulate the lack of precision of the track-

ing process. These projected points are the input for

the rectiﬁcation method proposed in this work. The

non-linear technique will be evaluated against both

initializing α

and β

with zero values and with the

solution of the linear approach.

The results were obtained using 6, 12 and 20 3D

points. The Gaussian noise applied to the projections

has standard deviation from 0.5 to 4 pixels. For each

noise (standard deviation) generated, the tests were

computed 2000 times in order to reach an accurate

evaluation. In each sample, the values used for α and

θ came from a uniform distribution generating values

between −30

◦

and 30

◦

. The results are illustrated in

Fig. 2. Table 1 presents the numeric results.

Note that the precision is strongly related to the

number of points. It is also possible to observe

that the linear algorithm achieved the worst perfor-

mance with errors greater than 4

◦

, while the non-

linear achieved averaging errors below 2

◦

When the result of the linear algorithm feeds the

initial values of the non-linear algorithm, which uses

an iterative Gauss-Newton algorithm, one can observe

that the results are very similar to the ones where the

initial values are taken from zero (α

= 0 and β

= 0).

The difference between both approaches does not ex-

ceed 0.1

◦

. This occurs because the local minima do

not inﬂuence the convergence of the algorithm, except

when the amount of points is small.

In Table 1, one can verify that initializing the

Gauss-Newton algorithm with the linear approach de-

creases between 20% and 40% the number of itera-

tions. It is possible to observe as well that the conver-

gences in the non-linear approaches are different only

when it has six points.

Overall, one can conclude that an optimal solution

to the problem can be reached applying the Gauss-

Newton method, unless the amount of points is small

(around 6). In this case, by using as initial value the

output of the linear approach can produce more accu-

rate results.

All simulations were performed using MATLAB.

The hardware used in the tests was a computer with

an Intel Core i7 3960X 3.30Ghz processor and 24GB

RAM. The execution time was collected using a sim-

ulation with 20 3D points. In average, the linear ap-

proach takes around 0.15ms to ﬁnish, the non-linear

0.3ms, and the linear+non-linear also 0.3ms. The rea-

son why the two last approaches spend the same time

is because the non-linear approach performs fewer it-

erations when initialized by the linear approach result.

5.2 Controlled Environment Tests

In this test, a controlled environment is used to com-

pare the proposed approach to the state-of-the-art

technique of Fusiello and Irsara (Fusiello and Irsara,

2008). The test consists of two pictures of a chess-

board, as illustrated in Fig. 3. The camera used in

this test was a Canon T4i and the picture resolution

was set to 720 × 480 pixels. Only this resolution was

tested because it is closest one to the resolution of the

cameras attached to the ROV. The camera was posi-

tioned about 70cm away from the target. Between the

two shots, the camera was moved 20cm rightward and

4cm upward keeping the same focal length and en-

UncalibratedImageRectificationforCoplanarStereoCameras

653

Figure 3: Images used for the controlled environment tests.

forcing coplanarity constraint of cameras. The cam-

era was intentionally moved without an accurate pro-

cess (actually it was manually moved), while keeping

the optical axes nearly parallel. Also, the images were

manually defocused to simulate the blur phenomenon

that occurs underwater.

Since the exact position of camera is unknown,

there is no ground truth. Thus, the technique will be

evaluated using the rectiﬁcation error (i.e. epipolar er-

ror, or the distance of the point to the related epipolar

line, which can be calculated as Ay). The chessboard

has 54 points that can be extracted and matched be-

tween the images. To apply the rectiﬁcation, a sub-

set ranging from 6 to 20 points chosen randomly will

be used as input, although all 54 points are used to

measure the epipolar error. It is expected that the

more precise the rectiﬁcation the smaller will be the

errors. For each amount of points, 100 subsets of ran-

dom points were chosen to be rectiﬁed with the non-

linear approach proposed (initialized with the linear

approach) and later with the technique proposed by

Fusiello and Irsara (Fusiello and Irsara, 2008).

In Fig. 4, it can be seen that the Fusiello and Ir-

sara’s technique had poor results with a small amount

of points, since such technique has more degrees of

freedom to be determined. However, from 12 points

on, the Fusiello and Irsara’s technique can estimate

more precisely all the system variables than the pro-

posed technique.

In Fig. 5 the rectiﬁcation of the left image using

6 points is shown. The rectiﬁcation from Fusiello and

Irsara applied more distortion to the images and has

a strong projective distortion, as well. Such result is

not acceptable as the epipoles are close to the inﬁnity.

As expected, the proposed technique applied a simple

rotation.

The tests were carried out using the same hard-

ware from the synthetic simulation. The proposed

technique, due to its complex minimization calcula-

tions, had an execution time between 0.2 and 0.3ms.

Fusiello and Irsara’s technique had an average time of

230ms with 6 points and 3s with 20 points.

5.3 Real Experiments

This work was also tested with real underwater im-

Figure 4: Comparison of the results obtained with Fig. 3 by

the proposed technique and the Fusiello and Irsara’s tech-

nique.

(a) (b)

Figure 5: Rectiﬁcation of the left image of Fig. 3 using (a)

the proposed technique and (b) Fusiello and Irsara’s tech-

nique.

ages. The cameras used for the operations were

Kongsberg OE15-100c (low light cameras for high

depths). These cameras are attached to the ROV by a

metal support and are 45cm apart. The support of the

cameras is an attempt to keep the cameras’ projection

planes parallel. The system ﬁrst segments the ﬂexible

pipe from both left and right images in order to ex-

tract features. The segmentation is performed in two

stages: ﬁrst, a tailored thresholding technique is used

to ﬁnd out which regions are potentially of the pipe

(the white blobs in the second row of Fig. 1); second,

a search is performed in order to discover the best se-

quence of thresholded regions which describe a pipe.

Since the features are extracted by evaluating the cen-

troid of the thresholded regions, features position are

not very precise. These features are then matched be-

tween both images. The amount of extracted features

ranges from 5 to 16. Even without a groundtruth, the

achieved results can testify the technique efﬁciency.

Fusiello and Irsara’s technique was also tested

with the real underwater images. However, it fails in

many cases because the estimated homographies pro-

duce huge projective distortions, while only a rotation

and a small projective transformation are needed (as

illustrated in Fig. 6(a)). In addition, consecutive pairs

of images (i.e., similar images) produced completely

different homographies, which is mostly due to inac-

curacies in the detection of features position. In Fig.

VISAPP2014-InternationalConferenceonComputerVisionTheoryandApplications

654

(a)

(b)

Figure 6: Rectiﬁcation of the images of Fig. 1 using (a)

Fusiello and Irsara’s and (b) the proposed technique.

6(a) the red circle shows that this part of the image is

not correctly rectiﬁed.

The proposed technique achieved better results

than the Fusiello and Irsara’s technique. The results

along multiple frames were also more stable. The ro-

tation angle in the ﬁrst image was between 9

◦

and 13

◦

while the second image was between 10

◦

and 14

◦

The epipolar error ranges from 0.5 to 1.2 pixels. Fig.

6(b) illustrate the result. The red circle shows that, in

contrast with the Fusiello and Irsara’s technique, this

part of the image is correctly rectiﬁed.

After the initial rectiﬁcation, the images were

aligned allowing the extraction of more information

about the visual landmarks in order to perform a more

accurate 3D reconstruction. Thus, the error embedded

in the rectiﬁcation process is acceptable for the whole

system. Outliers were not detected in the tests, how-

ever they could be removed using RANSAC-based al-

gorithms.

6 CONCLUSIONS

This paper proposed a novel rectifying technique

for images under constraints imposed by underwater

maintenance tasks. The proposed technique takes ad-

vantage of the geometry of the structure of the stereo

rig, which is positioned keeping the cameras’ pro-

jection planes coplanar. This arrangement represents

lesser degrees of freedom for the rectiﬁcation prob-

lem, which allows lesser point correspondences to ob-

tain satisfactory accuracy, as well.

Tests were carried out using synthetic data, a real

controlled environment and a real underwater scene.

The proposed technique performed better than the

state-of-the-art method (Fusiello and Irsara, 2008).

As future work, the present technique will be im-

proved to take into consideration variations on the in-

trinsic parameters of the cameras, such as focal length

and principal point, which were considered to be ﬁxed

under the performed tests.

REFERENCES

Ansar, A. and Daniilidis, K. (2003). Linear pose estimation

from points or lines. IEEE Transactions on Pattern

Analysis and Machine Intelligence, 25:282–296.

Fischler, M. A. and Bolles, R. C. (1981). Random sample

consensus: a paradigm for model ﬁtting with appli-

cations to image analysis and automated cartography.

Commun. ACM, 24(6):381–395.

Fusiello, A. and Irsara, L. (2008). Quasi-euclidean uncal-

ibrated epipolar rectiﬁcation. In Pattern Recognition,

2008. ICPR 2008. 19th International Conference on,

pages 1–4.

Fusiello, A., Trucco, E., and Verri, A. (2000). A compact al-

gorithm for rectiﬁcation of stereo pairs. Mach. Vision

Appl., 12(1):16–22.

Hartley, R. I. (1998). Theory and practice of projective rec-

tiﬁcation.

Hartley, R. I. and Zisserman, A. (2004). Multiple View Ge-

ometry in Computer Vision. Cambridge University

Press, ISBN: 0521540518, second edition.

Isgro, F. and Trucco, E. (1999). Projective rectiﬁcation

without epipolar geometry. In Computer Vision and

Pattern Recognition, 1999. IEEE Computer Society

Conference on., volume 1, pages –99 Vol. 1.

Lepetit, V., Moreno-Noguer, F., and Fua, P. (2009). Epnp:

An accurate o(n) solution to the pnp problem. Int. J.

Comput. Vision, 81(2):155–166.

Loop, C. and Zhang, Z. (1999). Computing rectify-

ing homographies for stereo vision. In Computer

Vision and Pattern Recognition, 1999. IEEE Com-

puter Society Conference on., volume 1, pages 2 vol.

(xxiii+637+663).

Pollefeys, M., Koch, R., and Van Gool, L. (1999). A sim-

ple and efﬁcient rectiﬁcation method for general mo-

tion. In Computer Vision, 1999. The Proceedings of

the Seventh IEEE International Conference on, vol-

ume 1, pages 496 –501 vol.1.

Roy, S., Meunier, J., and Cox, I. (1997). Cylindrical recti-

ﬁcation to minimize epipolar distortion. In Computer

Vision and Pattern Recognition, 1997. Proceedings.,

1997 IEEE Computer Society Conference on, pages

393 –399.

Slama, C. C., Theurer, C., and Henriksen, S. W., editors

(1980). Manual of Photogrammetry. American Soci-

ety of Photogrammetry.

UncalibratedImageRectificationforCoplanarStereoCameras

655