Reference Plane based Fisheye Stereo Epipolar Rectification
Nobuyuki Kita and Yasuyo Kita
Intelligent Systems Research Institute, National Institute of Advanced Industrial Science and Technology (AIST),
Tsukuba Central 1, Tsukuba 305-8560, Japan
{n.kita, y.kita}@aist.go.jp
Keywords: Fisheye Stereo, Epipolar Rectification, Stereo Measurement, Humanoid Robot, Multi Contact, Near Support
Plane.
Abstract: When a humanoid robot walks through or performs a task in a very narrow space, it sometimes touches the
environment with its hand or arm to retain its balance. To do this the robot must identify a flat surface of
appropriate size with which it can make sufficient contact; the surface must also be within reach of the robot's
upper body. Using fisheye stereo vision, it is possible to obtain image information for a field of view wider
than that of a hemisphere whose central axis is the optical axis; thus, three dimensional distances to the
possible contact spaces can be evaluated at a glance. To realize this, stereo correspondence is crucial.
However, the short distance between the stereo cameras and the target space causes differences in the
apparent shapes of the targets in the left and right images, which can make stereo correspondence difficult.
Therefore, we propose a novel method which rectifies stereo images so that the targets have the same
apparent shapes in the left and right images when the targets are close to a reference plane. Actual fisheye
stereo image pairs were rectified, and three dimensional measurements were performed. Better results were
obtained using the proposed rectification method than using other rectification methods.
1 INTRODUCTION
Humanoid robots are expected to substitute for or
support the work of humans in many places such as
at disaster sites, airplane assembly plants, and
building sites. As such, humanoid robots must be
able to traverse narrow spaces and conduct tasks that
require their upper body parts, e.g., hands, elbows,
and shoulders, in addition to the soles of their feet to
maintain their balance (Sentis, 2010), (Escande,
2013), (Henze, 2016). To realize such stabilization
motions in unknown environments, a planar area
with an appropriate size and pose has to be
identified in the vicinity of the humanoid robot's
upper body (Brossette, 2013), (Khatib, 2014). The
measurements that are necessary to adequately
evaluate the environment are as follows.
1. Dense three dimensional (3D) distance
measurements.
2. 3D distance measurements in reach of the robot.
3. 3D distance measurements of the immediate
vicinity of the humanoid robot's upper body.
4. Fast 3D distance measurements.
5. 3D distance measurements of poorly textured
surfaces.
Various types of equipment have been developed
for 3D distance measurements. The most popular is
the RGB-D sensor (where the D represents the
"depth" channel); this sensor can perform
measurements 1, 4, and 5. Measurement 3 can be
achieved by controlling the pose of a sensor.
However, there are no off-the-shelf RGB-D sensors
that can perform measurement 2. Most teams that
signed up for the DRC (Defense Advanced Research
Projects Agency Robotics Challenge) in 2015 used a
spinning LIDAR sensor, which is a high-speed
rotational 1D-scanning type of LRF (laser rangefinder).
This type of sensor can conduct measurements 1, 3,
4, and 5. However, the closest distance at which
LRFs work is 0.5 m, which is too far to enable
measurement 2 to be conducted. For a long time
now, stereo vision that uses multiple cameras has
been used for 3D distance measurements. There are
numerous studies on stereo measurements and many
available products. However, the width of the space
these products can measure is limited because they
usually use cameras with normal fields of view; thus,
additional pose control equipment is necessary to
carry out measurement 3.
Figure 1: Humanoid robot with fisheye stereo.
Some fisheye lenses have wide fields of view
(larger than 180°). Mounting two cameras equipped
with such fisheye lenses in parallel on the chest of a
humanoid robot yields a stereo field of view greater
than 180° with an optical axis that is in the forward
direction, as shown in Figure 1. For stereo
measurements, stereo correspondence is crucial.
However, the distance from the stereo cameras to the
target space is short compared with the baseline
length of the stereo system; this causes differences
in the apparent shapes of the targets in the left and
right images. Furthermore, some targets have no
clear visual features, which makes stereo
correspondence difficult. In this paper, we propose a
novel method that rectifies stereo images so that the
targets have the same apparent shapes in the left and
right images when the targets reside close to a
reference plane.
In Section 2 of this paper, we introduce related
stereo vision studies and present background
information relevant to the proposed method. In
Section 3, the proposed rectification method, which
is based on a reference plane, is explained. In
Section 4, actual fisheye stereo image pairs are
rectified with the proposed method for several poses
of the reference plane, and three dimensional
measurements are performed using a simple
region-based matching method. The experimental results show that using
the proposed rectification method yields better
results than using other rectification methods when
the reference plane is close to the actual target.
Finally, in Section 5 we summarize our work and
discuss potential future research ideas.
2 RELATED WORKS
In the previous section, the measurements necessary to
sufficiently evaluate the environment in the
immediate vicinity of a robot were detailed. Based
on those, there are some conditions that make it
difficult to conduct stereo measurements.
D1. The targets are too close (i.e., the distance from
the robot is between one to several times that
of the stereo baseline length).
D2. The texture of targets is sometimes poor.
D3. The target may lie on an extension of the stereo
baseline.
When the target is close to the stereo cameras, its
appearance differs greatly between the left and the
right image. Similar problems occur even for wide
baseline stereo systems, and many methods have
been proposed to tackle these problems (Schmid,
1997), (Baumberg, 2000), (Matas, 2004), (Bay,
2005). Most such methods first detect salient
features and then derive descriptions that are
invariant to the viewing direction. Correspondences
are then established based on the
similarity of these descriptions. The existing methods
are not suitable for a target that has no clear visual
features.
For a poorly textured target, most methods derive
a local description from the intensity changes in a
local region and then establish correspondences based on
the similarity of those descriptions
(Scharstein, 2002). Several important approaches
have been proposed for such region-based methods:
- Epipolar constraint
- Rectification
- Cost function
If the stereo parameters are known, the search region
can be constrained on an epipolar line to reduce the
risk of mismatching (Hartley, 2003). Further, the left
and right images are often transformed so that the
epipolar lines coincide with the image rows via
rectification (Ayache, 1988), (Courtney, 1992),
(Loop, 1999), (Hartley, 1999). Various cost
functions have been developed to determine a
measure of similarity for local regions (Hirschmuller,
2009). The simplest ones directly utilize the local
regions' intensities, e.g., the sum of squared
difference (SSD) and normalized SSD (NSSD)
functions. Some functions compare descriptors that
are derived from the images filtered using Sobel
operator, Gauss operator, or other operators (Zabih,
1994), (Geiger, 2010). Complicated functions have
primarily been developed to cope with differences in
the brightness between the left and right images
(Hirschmuller, 2008). Few methods have been
developed to handle differences in the apparent
shape of objects between the left and right images
(Devernay, 1994), (Tola, 2010).
One stereo vision method measures the uneven
shape of a ground surface using a fisheye stereo
system (Kita, 2011). First, a rectification method is
used that transforms only the ground portion of the
images so that the disparity becomes zero on the
expected ground plane. Because the differences in
the apparent shapes in the left and right images
become small around the ground plane,
correspondences can be obtained using simple cost
functions even when the texture is poor. This
method seems to solve difficulties D1 and D2.
Unfortunately, however, the rectification used in
that work does not work when the measurement
target lies on an extension of the stereo baseline.
Pollefeys used a polar coordinate system to
rectify two images that were obtained before and
after the camera was moved forward to include two
epipoles (Pollefeys, 1999). Abraham also used a
polar coordinate system to rectify stereo images that
were obtained by two parallel cameras mounted with
fisheye lenses (Abraham, 2005). Difficulty D3 can
be overcome by using a polar coordinate system, but
neither of the methods mentioned above was conceived
to cope with the differences in apparent shape
between the left and right images.
In this paper, we therefore propose a new
rectification method that combines the approach of
achieving zero disparity on a reference plane and the
approach of using a polar coordinate system. The
advantages of the proposed method are as follows.
- It enables a reference plane to be set on an extension of the stereo baseline.
- Because the differences of the apparent shapes in the left and right images become small for a target that is close to the reference plane, correspondences can be obtained using simple cost functions, even when the texture is poor.
3 REFERENCE PLANE BASED
RECTIFICATION METHOD
The proposed method rectifies only a portion of the
fisheye images so that the disparity becomes zero on
a reference plane. Because anything on the
reference plane has the same apparent shape in
the left and right rectified images, correspondences
can be detected with high reliability using a simple
region-based matching method.
Though the reference plane was set manually in the
experiments presented in Section 4, in practice it
would be set by the humanoid robot at an area that a
part of its upper body, e.g., hand, may come in
contact with to help it maintain its balance. In the
Figure 2: Coordinate frames.
Figure 3: Virtual camera coordinate frames.
latter part of this section, some coordinate frames are
first defined, the method for deciding which part of
the fisheye images should be rectified is explained,
and finally the method to transform the fisheye
images to rectified images is introduced.
3.1 Coordinate Frames
Figure 2 shows the left and right fisheye camera coordinate frames, $\Sigma_{CL}$ and $\Sigma_{CR}$. Here, for simplicity, $\Sigma_{CL}$ is used as the base frame instead of the world frame. The origins are the optical centers, the Z axes are the optical axes, and the Y axes are the upper directions of the images (for simplicity, we assume that the image plane is perpendicular to the optical axis). The pose of $\Sigma_{CR}$ is represented by the translation $\mathbf{t}_{CR}$ and the rotation $\mathbf{R}_{CR}$, which are assumed to be calibrated in advance. Figure 2 also depicts the reference plane. It is a rectangle of size $W \times L$. A coordinate frame $\Sigma_P$ is defined as shown in Figure 2. Its origin is the center of the rectangle, the Y axis is the length direction, the X axis is the width direction, and the Z axis is the normal direction of the backside of the reference plane. The pose of the reference plane is set by the translation $\mathbf{t}_P$ and the rotation $\mathbf{R}_P$.
Because the cameras must be aligned in parallel to generate the rectified images, the virtual camera coordinate frames, $\Sigma_{VL}$ and $\Sigma_{VR}$, are defined as follows (Figure 3).
Figure 4: Projection planes.
Figure 5: Tilt and yaw angles including a reference plane.
Figure 6: Tilt and yaw angles including a reference plane
which crosses an extension of the stereo baseline.
- The origin of $\Sigma_{VL}$ coincides with the origin of $\Sigma_{CL}$.
- The X axis of $\Sigma_{VL}$ is in the same direction as the vector from the origin of $\Sigma_{CL}$ to the origin of $\Sigma_{CR}$.
- The Z axis of $\Sigma_{VL}$ is the direction of the cross product of the X axis of $\Sigma_{VL}$ and the Y axis of $\Sigma_{CL}$.
- The Y axis of $\Sigma_{VL}$ is the direction of the cross product of the Z axis and the X axis of $\Sigma_{VL}$.
- $\Sigma_{VR}$ is obtained by translating $\Sigma_{VL}$ to the origin of $\Sigma_{CR}$.
Let us consider the light rays from a 3D point P to the origins of the left and right virtual cameras, $O_{VL}$ and $O_{VR}$. Their tilt angles $\tau_L$ and $\tau_R$, which are the angles of rotation around the X axes, are equal. This is true for any 3D point. Then, any 3D point is projected at the same vertical position on the left and right projection planes, $\Pi_L$ and $\Pi_R$, by relating the vertical positions to the tilt angles. The horizontal positions on the projection planes are related to the yaw angles, $\phi_L$ and $\phi_R$, where the yaw angles are the angles of rotation around the tilted Y axes. For convenience, the tilt angle is defined in the counterclockwise direction from the direction of the Z axes, while the yaw angle is defined in the clockwise direction from the tilted Z direction. The whole 3D space is projected into the area for which the tilt angle ranges from $\pi$ to $-\pi$ and the yaw angle ranges from $\pi/2$ to $-\pi/2$, as shown in Figure 4.
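To make the geometry above concrete, the following sketch (Python with NumPy) shows one way the parallel virtual camera frames and the tilt/yaw parameterization could be computed. It is only an illustration under our own sign conventions; the function names and the particular cross products are assumptions, not code from the paper.

```python
import numpy as np

def virtual_frame_axes(t_cr, y_cl=np.array([0.0, 1.0, 0.0])):
    """Axes of the virtual camera frames, expressed in the left camera frame.

    t_cr : origin of the right fisheye camera in the left camera frame
           (the calibrated baseline translation).
    y_cl : Y axis of the left camera, used to fix the roll of the virtual frame.
    """
    x = t_cr / np.linalg.norm(t_cr)   # X axis: along the stereo baseline
    z = np.cross(x, y_cl)             # Z axis: forward, orthogonal to the baseline
    z = z / np.linalg.norm(z)
    y = np.cross(z, x)                # Y axis: completes the right-handed frame
    # The left virtual frame shares its origin with the left camera; the right
    # virtual frame uses the same axes translated to t_cr.
    return np.column_stack([x, y, z])

def tilt_yaw(p):
    """Tilt and yaw angles of the ray toward point p (virtual frame coordinates).

    Tilt is the rotation about the X axis measured from the Z direction, and
    yaw is the rotation about the tilted Y axis; the signs are our own choice.
    """
    x, y, z = p
    tilt = np.arctan2(y, z)                   # in (-pi, pi]
    yaw = np.arcsin(x / np.linalg.norm(p))    # in [-pi/2, pi/2]
    return tilt, yaw
```

Because the two virtual frames differ only by a translation along their common X axis, the tilt returned by tilt_yaw is the same whether a 3D point is expressed relative to the left or the right virtual origin, which is what makes row-aligned epipolar lines possible.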
3.2 Deciding the Rectifying Region
Only the rectangle-shaped portion of the projection plane that contains the projection of the reference plane is rectified, rather than the whole projection plane. The rectangular portion whose tilt angle ranges from $\tau_{min}$ to $\tau_{max}$ and whose yaw angle ranges from $\phi_{min}$ to $\phi_{max}$ is determined in the left projection plane. $P_i$, for $i = 0, 1, 2, 3$, indicate the four corners of the reference plane, as shown in Figure 5. Their coordinates in the $\Sigma_P$ frame are as follows:
$$P_i = \left(\pm\frac{W}{2},\ \pm\frac{L}{2},\ 0\right), \qquad i = 0, 1, 2, 3 \qquad (1)$$
where the four corners take the four combinations of signs.
The $P_i$ are converted to the $\Sigma_{VL}$ frame through the $\Sigma_{CL}$ frame to yield $P_i^{VL} = (X_i, Y_i, Z_i)$. The tilt and yaw angles in the $\Sigma_{VL}$ frame can then be calculated as follows:
$$\tan\tau_i = \frac{Y_i}{Z_i}, \qquad \sin\phi_i = \frac{X_i}{\sqrt{X_i^2 + Y_i^2 + Z_i^2}} \qquad (2)$$
where $X_i$ represents the X coordinate of $P_i^{VL}$.
Finally, we find that
$$\tau_{min} = \min_i \tau_i, \quad \tau_{max} = \max_i \tau_i, \quad \phi_{min} = \min_i \phi_i, \quad \phi_{max} = \max_i \phi_i, \qquad i = 0, 1, 2, 3 \qquad (3)$$
However, these values are updated as follows when the reference plane crosses an extension of the stereo baseline:
$$\tau_{min} = -\pi \quad \text{and} \quad \tau_{max} = \pi \qquad (4)$$
Further, if the X coordinate of the cross point in the $\Sigma_{VL}$ frame is positive, then $\phi_{max} = \pi/2$; otherwise, $\phi_{min} = -\pi/2$, as shown in Figure 6.
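The region-selection logic of this subsection can be summarized by the following sketch, assuming the corner points have already been transformed into the left virtual camera frame. The helper for detecting a crossing of the baseline extension and the exact wrap-around handling are our own assumptions.

```python
import numpy as np

def tilt_yaw(p):
    """Tilt/yaw of the ray toward p in the left virtual camera frame (see above)."""
    x, y, z = p
    return np.arctan2(y, z), np.arcsin(x / np.linalg.norm(p))

def baseline_crossing(corners):
    """Intersection of the baseline line (the X axis) with the reference-plane
    rectangle, or None.  Corners must be ordered around the rectangle."""
    p0, p1, p2, p3 = corners
    n = np.cross(p1 - p0, p3 - p0)            # plane normal
    if abs(n[0]) < 1e-9:
        return None                           # baseline parallel to the plane
    q = np.array([n @ p0 / n[0], 0.0, 0.0])   # intersection with the infinite plane
    u, v = p1 - p0, p3 - p0
    a, b = (q - p0) @ u / (u @ u), (q - p0) @ v / (v @ v)
    return q if 0.0 <= a <= 1.0 and 0.0 <= b <= 1.0 else None

def rectify_region(corners):
    """Tilt/yaw bounds of the region to rectify (Eqs. 1-4, sketched).

    corners : (4, 3) array of the reference-plane corners, already transformed
              into the left virtual camera frame and ordered around the rectangle.
    """
    tilts, yaws = zip(*[tilt_yaw(c) for c in corners])
    t_min, t_max = min(tilts), max(tilts)
    y_min, y_max = min(yaws), max(yaws)

    q = baseline_crossing(corners)
    if q is not None:
        # The plane crosses the baseline extension: the tilt range wraps
        # around, and the yaw range is extended on the crossing side (Eq. 4).
        t_min, t_max = -np.pi, np.pi
        if q[0] > 0:
            y_max = np.pi / 2
        else:
            y_min = -np.pi / 2
    return t_min, t_max, y_min, y_max
```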
3.3 Deciding Pixel Coordinate Frame
of Rectified Images
The pixel coordinate frame of the left rectified image, $(u_L, v_L)$, is defined by equally quantizing the tilt and yaw angles in the left projection plane. The following two issues must be considered when choosing the quantization resolution $\Delta\theta$.
I1. $\Delta\theta$ affects the resolution of the 3D depth measurement.
I2. $\Delta\theta$ affects $\Delta s$, which is the Euclidean distance on the reference plane corresponding to one pixel of the rectified images.
Figure 7 shows how to choose $\Delta\theta$ based on the desired depth resolution, $\Delta z$. A 3D point Q is set on the light ray from $O_{VL}$ to the center C of the reference plane so that the Euclidean distance between Q and the reference plane becomes $\Delta z$. The yaw angle of the light ray from Q to $O_{VR}$ is $\phi_Q$, and that of the light ray from C to $O_{VR}$ is $\phi_C$. Then $\Delta\theta$ is chosen according to:
$$\Delta\theta = \left|\phi_C - \phi_Q\right| \qquad (5)$$
The depth corresponding to one pixel of disparity then becomes about $\Delta z$ around the center of the reference plane. For issue I2, $\Delta s$ is calculated as the distance between C and S, where S is the intersection between the light ray from Q to $O_{VR}$ and the reference plane. A larger $\Delta s$ means a smaller rectified image. If the calculated $\Delta s$ is larger than the threshold $\Delta s_{th}$, then point S is moved toward C on the reference plane so that its distance to C becomes $\Delta s_{th}$, and $\Delta\theta$ is recalculated using the moved S.
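A minimal sketch of this quantization-step selection is given below. The choice of which rays define Q and S follows our reading of Figure 7 and Equation (5) and may differ in detail from the authors' implementation; all names are ours.

```python
import numpy as np

def yaw_of(origin, target):
    """Yaw angle of the ray from `origin` to `target` (virtual frame coordinates)."""
    v = target - origin
    return np.arcsin(v[0] / np.linalg.norm(v))

def choose_step(c, n, o_vr, dz, ds_th):
    """Quantization-step selection of Section 3.3 (Eq. 5 and the ds threshold), sketched.

    c     : center of the reference plane (left virtual frame coordinates)
    n     : unit normal of the reference plane
    o_vr  : origin of the right virtual camera
    dz    : desired depth resolution
    ds_th : upper bound on the plane distance covered by one pixel
    """
    # Q lies on the ray from the left virtual origin toward C, at perpendicular
    # distance dz from the reference plane.
    dir_c = c / np.linalg.norm(c)
    q = c - dir_c * dz / abs(n @ dir_c)

    # Eq. (5): one pixel corresponds to the yaw difference, seen from the right
    # virtual camera, between the rays toward C and toward Q.
    dtheta = abs(yaw_of(o_vr, c) - yaw_of(o_vr, q))

    # S: intersection of the line through Q and O_VR with the reference plane.
    d = o_vr - q
    s = q + d * ((c - q) @ n) / (d @ n)
    ds = np.linalg.norm(s - c)
    if ds > ds_th:
        # Move S toward C so that one pixel covers at most ds_th on the plane,
        # and recompute the step from the moved S.
        s = c + (s - c) * ds_th / ds
        dtheta = abs(yaw_of(o_vr, c) - yaw_of(o_vr, s))
    return dtheta
```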
Using the chosen value of $\Delta\theta$, the rectangular region from $\tau_{min}$ to $\tau_{max}$ and from $\phi_{min}$ to $\phi_{max}$ on the left projection plane is quantized, as shown in Figure 8. The rectangular region on the right projection plane is also quantized vertically using $\Delta\theta$.
The rectified image coordinates of a 3D point P, $(u_L, v_L)$ and $(u_R, v_R)$, are obtained as follows. The left coordinates and the vertical right coordinate are given by
$$u_L = \frac{\phi_L - \phi_{min}}{\Delta\theta}, \qquad v_L = \frac{\tau_{max} - \tau_L}{\Delta\theta}, \qquad v_R = \frac{\tau_{max} - \tau_R}{\Delta\theta} \qquad (6)$$
Because $\tau_L = \tau_R$ for any 3D point P, $v_L = v_R$ holds in the rectified images.
Figure 7: Decision of $\Delta\theta$.
Figure 8: Pixel coordinate frames of rectified images.
Figure 9: Decision of the yaw angle for the pixel with the coordinates $(u_R, v_R)$.
The horizontal quantization of the right projection plane remains to be decided. It is quantized non-linearly so that the disparity becomes zero on the reference plane. The yaw angle in the $\Sigma_{VR}$ frame for the pixel with the coordinates $(u_R, v_R)$ is chosen as follows. First, the tilt and yaw angles of the light ray corresponding to the left pixel $(u_L, v_L)$, where $u_L = u_R$ and $v_L = v_R$, are obtained via the following equations:
$$\tau = \tau_{max} - (v_L + 0.5)\,\Delta\theta, \qquad \phi = \phi_{min} + (u_L + 0.5)\,\Delta\theta \qquad (7)$$
Next, the point where the light ray and the reference
plane intersect, P’, is obtained as shown in Figure 9.
Figure 10: Five poses of target planes.
Finally, the yaw angle in the $\Sigma_{VR}$ frame for the pixel with the coordinates $(u_R, v_R)$ is selected to be $\phi'$, the yaw angle of the light ray from P' to $O_{VR}$. Using this approach, any 3D point on the reference plane has the same coordinates in the left and right rectified images. This results in the targets having the same apparent shapes in the left and right images when the targets are close to the reference plane.
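The non-linear horizontal quantization of the right image can be tabulated as in the following sketch: for every rectified pixel, the left ray of Equation (7) is intersected with the reference plane and the yaw of the intersection point, seen from the right virtual camera, is stored. The function name and the handling of rays that miss the plane are our assumptions.

```python
import numpy as np

def right_yaw_table(t_min, t_max, y_min, y_max, dtheta, plane_pt, plane_n, o_vr):
    """Non-linearly quantized yaw angles for the right rectified image (sketch).

    For each pixel of the rectified grid, take the left ray of Eq. (7),
    intersect it with the reference plane to get P', and store the yaw of P'
    as seen from the right virtual camera origin o_vr.
    """
    height = int(np.ceil((t_max - t_min) / dtheta))
    width = int(np.ceil((y_max - y_min) / dtheta))
    yaw_r = np.full((height, width), np.nan)
    for v in range(height):
        tilt = t_max - (v + 0.5) * dtheta
        for u in range(width):
            yaw = y_min + (u + 0.5) * dtheta
            # Left ray direction (Eq. 8 convention) from the left virtual origin.
            r = np.array([np.sin(yaw),
                          np.cos(yaw) * np.sin(tilt),
                          np.cos(yaw) * np.cos(tilt)])
            denom = plane_n @ r
            if abs(denom) < 1e-9:
                continue                      # ray parallel to the reference plane
            t = (plane_n @ plane_pt) / denom
            if t <= 0:
                continue                      # reference plane is behind this ray
            p_dash = t * r                    # intersection point P'
            d = p_dash - o_vr
            yaw_r[v, u] = np.arcsin(d[0] / np.linalg.norm(d))
    return yaw_r
```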
3.4 Conversion from Fisheye to
Rectified Image
For each pixel in the left rectified image, the intensity is obtained as follows. The tilt and yaw angles of the light ray corresponding to the pixel, $\tau$ and $\phi$, are derived using Equation set (7). The unit direction vector of the light ray, $\mathbf{r}_{VL}$, is obtained according to:
$$\mathbf{r}_{VL} = \left(\sin\phi,\ \cos\phi\sin\tau,\ \cos\phi\cos\tau\right) \qquad (8)$$
$\mathbf{r}_{CL}$ is then obtained by converting $\mathbf{r}_{VL}$ to the $\Sigma_{CL}$ frame. The image coordinates in the left fisheye image corresponding to the light ray are calculated as real-valued coordinates using the projection function $f_L(\mathbf{r}_{CL})$. Here, all intrinsic parameters of the left fisheye camera were calibrated in advance. From the intensities of the four neighboring pixels in the left fisheye image, the intensity value of the pixel in the left rectified image is calculated via bilinear interpolation.
For each pixel in the right rectified image, the intensity is obtained in almost the same way as for the left rectified image. First, the direction vector $\mathbf{r}_{VL}$ for the same coordinates in the left rectified image is obtained. Then, the intersection between the ray $\mathbf{r}_{VL}$ and the reference plane is computed, and $\mathbf{r}_{VR}$ is obtained as the light ray from this intersection point toward $O_{VR}$; it is converted to the $\Sigma_{CR}$ frame to give $\mathbf{r}_{CR}$. The image coordinates in the right fisheye image corresponding to the light ray are calculated by the projection function $f_R(\mathbf{r}_{CR})$. From the intensities of the four neighboring pixels in the right fisheye image, the intensity value of the pixel in the right rectified image is calculated via bilinear interpolation.
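The per-pixel conversion of Section 3.4 essentially amounts to a lookup through the calibrated fisheye projection followed by bilinear interpolation, as sketched below. The projection argument is a placeholder standing in for the calibrated $f_L$ or $f_R$ of the paper; the rays are assumed to be precomputed in the corresponding fisheye camera frame and to project inside the image.

```python
import numpy as np

def bilinear(img, x, y):
    """Bilinear interpolation of `img` at real-valued coordinates (x, y);
    assumes (x, y) lies at least one pixel inside the image borders."""
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    dx, dy = x - x0, y - y0
    p = img[y0:y0 + 2, x0:x0 + 2].astype(float)
    return ((1 - dx) * (1 - dy) * p[0, 0] + dx * (1 - dy) * p[0, 1] +
            (1 - dx) * dy * p[1, 0] + dx * dy * p[1, 1])

def fill_rectified(fisheye_img, rays_cam, project):
    """Fill one rectified image (Section 3.4, sketched).

    fisheye_img : the input fisheye image
    rays_cam    : (H, W, 3) ray directions, already converted from the virtual
                  frame into the corresponding fisheye camera frame
    project     : calibrated projection function mapping a camera-frame ray to
                  real-valued fisheye pixel coordinates (stand-in for f_L / f_R)
    """
    h, w = rays_cam.shape[:2]
    out = np.zeros((h, w))
    for v in range(h):
        for u in range(w):
            x, y = project(rays_cam[v, u])
            out[v, u] = bilinear(fisheye_img, x, y)
    return out
```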
4 EXPERIMENTS
The motivation for proposing a new rectification
method is to enable 3D distance measurements for
targets in close proximity at wide angles, so as to
include the line extended from the stereo baseline.
Additionally, it should be possible to measure the
distances using a simple matching method even
when the texture of the targets is poor. Actual
fisheye stereo image pairs were rectified using the
proposed method, and 3D distance measurements
were performed using simple region-based matching
on the rectified images. For the comparison, another
rectification method was used on the same images
and the 3D distances were measured from the
rectified images using the same matching method.
4.1 Experimental Setup
The fisheye stereo that was used for the experiments
was mounted on the chest of a humanoid robot; the
baseline length was about 150 mm and the directions
of optical axes were set to point forward, as shown
in Figure 1. The fields of view of the fisheye lenses were each 214°, with an almost spherical projection. The cameras captured the whole viewing field in 1536 × 1536 pixel images.
The targets were two kinds of flat veneer surfaces that were 300 mm × 300 mm in size. One surface was given a rich texture by placing a section of newspaper on it, while the other was left blank and was thus poorly textured. The targets were fixed at five poses, as shown in Figure 10. The poses were chosen such that the humanoid robot would be able to touch the targets with its right hand to stabilize itself. The positions and orientations of the five poses were as follows.
Pose 0. (75, 0, 200), (0, 0, 0),
Pose 1. (200, 0, 100), (0, 90, 0),
Pose 2. (200, 0, 300), (0, 90, 0),
Pose 3. (200, 200, 100), (0, 90, 0),
Pose 4. (75, 250, 100), (90, 0, 0).
The positions and orientations were expressed in the base frame $\Sigma_{CL}$, which is the same frame as that used for the reference plane. The position coordinates are given in
millimeters, while the orientations are given in
degrees. The orientation is represented by YXZ
Euler angles. Because it was difficult to fix the
targets at the intended poses, the actual poses were
slightly different and the actual poses that were
measured using the method described in Section 4.4
are shown in Table 1. For convenience, a name is
given to each pair of stereo images, e.g., 0-rich-0,
where the first number represents a pose between 0
and 4, the second indicates the surface type (rich or
poor, i.e., richly or poorly textured), and the third
number represents the orientation (0–4). The
orientation numbers have the following significance:
0: base orientation
1: orientation 0 rotated by about 10° around the Y axis
2: orientation 0 rotated by about −10° around the Y axis
3: orientation 0 rotated by about 10° around the X axis
4: orientation 0 rotated by about −10° around the X axis
The rich and poor input image pairs at the five poses
with the base orientations are shown in Figure 11.
The images were captured for orientation 0 at every position. Only for positions 0 and 1 were the images also captured for orientations 1 to 4.
4.2 Rectification of Fisheye Images
In the proposed rectification method, a reference plane is set to decide which portion of the fisheye images should be rectified. The size of the rectified images is determined by setting a desired depth resolution, $\Delta z$, and a threshold $\Delta s_{th}$ for $\Delta s$. The process introduced in Section 3.4 generates rectified images based on the reference plane. Three typical examples are presented here.
Example 1. Dimensions of the reference plane: $W = 250$, $L = 250$. Pose: $(75, 0, 200)$, $(0, 0, 0)$. Further, $\tau_{min}$ is −51.2, $\tau_{max}$ = 49.6, $\phi_{min}$ = 17.6, and $\phi_{max}$ = 52.3. $\Delta z$ and $\Delta s_{th}$ are set to 10 and 2, respectively. With these, $\Delta s$ is 2.11. Because this is larger than $\Delta s_{th}$, $\Delta z$ is updated to 1.3 so that $\Delta s$ is equal to $\Delta s_{th}$. Then, $\Delta\theta$ is 0.684 and the size of the rectified images is 124 × 172. The rectified images generated from 0-rich-0 are shown in Figure 12(a).
Example 2. Dimensions of the reference plane: $W = 250$, $L = 250$. Pose: $(200, 0, 100)$, $(0, 90, 0)$. Because this reference plane crosses an extension of the stereo baseline, $\tau_{min}$ is −180 and $\tau_{max}$ = 180; further, $\phi_{min}$ = 50.9 and $\phi_{max}$ = 73.1. $\Delta z$ and $\Delta s_{th}$ are set to 10 and 2, respectively. With these, $\Delta s$ is 2.11. Because this is larger than $\Delta s_{th}$, $\Delta z$ is updated to 9.49 so that $\Delta s$ is equal to $\Delta s_{th}$. Then, $\Delta\theta$ is 0.297 and the size of the rectified images is 152 × 1232. The rectified images generated from 1-rich-0 are shown in Figure 12(b).
Table 1: Target poses and reference plane poses.
Example 3. Dimensions of the reference plane: $W = 250$, $L = 250$. Pose: $(200, 300, 100)$, $(0, 90, 0)$. Further, $\tau_{min}$ is −95.7, $\tau_{max}$ = −20.7, $\phi_{min}$ = 38.9, and $\phi_{max}$ = 75.4. $\Delta z$ and $\Delta s_{th}$ are set to 10 and 2, respectively. With these, $\Delta s$ is 4.83. Because this is larger than $\Delta s_{th}$, $\Delta z$ is updated to 4.21 so that $\Delta s$ is equal to $\Delta s_{th}$. Then, $\Delta\theta$ is 0.228 and the size of the rectified images is 184 × 352. The rectified images generated from 3-rich-0 are shown in Figure 12(c).
Because the reference planes are set close to the actual targets in all three examples, the apparent shapes of the targets in the rectified images are quite similar between the left and right images.
For comparison, another rectification method was implemented by referring to Abraham and Förstner (Abraham, 2005). This method rectifies the fisheye images over the whole projection plane with equal quantization for both the vertical and horizontal directions. Thus, the essential difference in the implementation of this method from the proposed method is the relation between the yaw angle and the horizontal pixel coordinate of the right rectified image, which becomes
$$u_R = \frac{\phi_R - \phi_{min}}{\Delta\theta} \qquad (9)$$
Figure 11: Input fisheye images (0-rich-0 to 4-rich-0 and 0-poor-0 to 4-poor-0).
For the remainder of this paper, this rectification is referred to as the Abraham method. As with Examples 1, 2, and 3 above, the Abraham method was applied to 0-rich-0, 1-rich-0, and 3-rich-0, and the generated rectified images are shown in Figure 13. The obtained quantization amount $\Delta\theta$ was the same as with the proposed method, using the same reference plane and the two parameters $\Delta z$ and $\Delta s_{th}$. The sizes of the rectified images are 396 × 768, 628 × 1232, and 812 × 1604.
4.3 3D Distance Measurements
In the rectified images, corresponding pairs were identified with a region-based method using the NSSD (Davison, 1998) as the measure of dissimilarity. The necessary parameters are listed here.
- patch_size: the size of the local region in which the NSSD is calculated. Here it was 21 pixels.
- corr_th: threshold on the dissimilarity for acceptance as a matching pair. Here it was 1.5.
- corr_diff_th: threshold on the saliency for acceptance as a matching pair. Here it was 0.5.
- max_d: the search is performed within this distance from the reference plane. Here it was 200 mm.
For each pixel in the left rectified image, stereo matching is carried out as follows if the corresponding light ray crosses the reference plane. Between the pixel $(u_L, v_L)$ in the left rectified image and the pixels in the right rectified image that have the same $v$ and lie between $u_L - \mathrm{max\_d}/\Delta z$ and $u_L + \mathrm{max\_d}/\Delta z$, the NSSD is calculated for the local region of size patch_size × patch_size. If the minimum value of the NSSDs is lower than corr_th and the differences between it and the NSSDs of its horizontal neighbors are larger than corr_diff_th, the pixel is a matching candidate. Using the same process, a matching candidate in the left rectified image is determined via a reverse search from the matching candidate in the right rectified image. If the pixel found is $(u_L, v_L)$, the pair is accepted as a matching pair.
For the images rectified by the Abraham method, the same procedure with the same parameters was applied with only one exception: the search area was defined as lying between $u_L + d - \mathrm{max\_d}/\Delta z$ and $u_L + d + \mathrm{max\_d}/\Delta z$, where $d$ is the disparity on the reference plane.
Figure 12: Rectified images by the proposed method, (a)-(c).
Figure 13: Rectified images by the Abraham method, (a)-(c).
For the matching pairs, the 3D location was
calculated as the crossing point of the corresponding
left and right rays.
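For reference, the core of the region-based matching described above can be sketched as follows. The parameter names follow Section 4.3, but the search-window handling, the saliency test, and the border handling are our interpretation.

```python
import numpy as np

def nssd(a, b):
    """Normalized SSD between two equal-sized patches (lower is more similar)."""
    a = (a - a.mean()) / (a.std() + 1e-9)
    b = (b - b.mean()) / (b.std() + 1e-9)
    return float(np.mean((a - b) ** 2))

def match_pixel(left, right, v, u_l, search_radius,
                patch_size=21, corr_th=1.5, corr_diff_th=0.5):
    """Candidate match on row v of the right rectified image for left pixel (u_l, v).

    search_radius corresponds to max_d converted into pixels; patch_size,
    corr_th and corr_diff_th are the parameters listed in Section 4.3.
    """
    h = patch_size // 2
    ref = left[v - h:v + h + 1, u_l - h:u_l + h + 1]
    if ref.shape != (patch_size, patch_size):
        return None                           # too close to the image border
    costs = {}
    for u_r in range(max(h, u_l - search_radius), u_l + search_radius + 1):
        cand = right[v - h:v + h + 1, u_r - h:u_r + h + 1]
        if cand.shape == (patch_size, patch_size):
            costs[u_r] = nssd(ref, cand)
    if not costs:
        return None
    u_best = min(costs, key=costs.get)
    best = costs[u_best]
    # Accept only a low-cost and salient minimum: its horizontal neighbours
    # must be clearly worse (our reading of corr_th and corr_diff_th).
    neighbours = [costs[u] for u in (u_best - 1, u_best + 1) if u in costs]
    if best < corr_th and all(c - best > corr_diff_th for c in neighbours):
        return u_best
    return None

def triangulate(o_l, d_l, o_r, d_r):
    """Midpoint of the closest approach between the left and right rays
    (a common surrogate for the 'crossing point' of two nearly intersecting rays;
    assumes the rays are not parallel)."""
    w = o_l - o_r
    a, b, c = d_l @ d_l, d_l @ d_r, d_r @ d_r
    d, e = d_l @ w, d_r @ w
    den = a * c - b * b
    s = (b * e - c * d) / den
    t = (a * e - b * d) / den
    return 0.5 * ((o_l + s * d_l) + (o_r + t * d_r))
```

A symmetric reverse search from the returned right pixel back into the left image then confirms the pair, and the 3D location follows from intersecting the two corresponding rays, e.g., with triangulate above.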
4.4 Evaluation Criteria
The total number, TN, is defined as the number of pixels for which the stereo matching was carried out. The matching number, MN, is defined as the number of matching pairs. The correct number, CN, is defined as the number of matching pairs for which the distance between the calculated 3D location and the actual target plane was less than a threshold. The following evaluation criteria were used:
- MN/TN,
- CN/MN.
The actual pose of the target plane was obtained from the left and right fisheye images by detecting the cross marks that were placed at the four corners of the target. The distance threshold was 10 mm.
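The two criteria can be computed directly from the matching statistics and the triangulated points, as in the following sketch (all variable names are ours):

```python
import numpy as np

def evaluate(points_3d, mn, tn, plane_pt, plane_n, dist_th=10.0):
    """MN/TN and CN/MN of Section 4.4 (sketch).

    points_3d : (MN, 3) triangulated 3D locations of the matching pairs [mm]
    mn, tn    : matching number and total number of attempted pixels
    plane_pt  : a point on the actual target plane
    plane_n   : unit normal of the actual target plane
    dist_th   : point-to-plane distance threshold [mm]
    """
    dist = np.abs((points_3d - plane_pt) @ plane_n)   # point-to-plane distances
    cn = int((dist < dist_th).sum())
    return mn / tn, cn / mn
```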
4.5 Experimental Results
Figure 14 shows the results obtained from the
fisheye image pairs at the five poses with the base
orientation. Figure 15 shows the results obtained
from the fisheye image pairs at pose 0 with the five
orientations. Figure 16 shows the results obtained
from the fisheye image pairs at pose 1 with the five
orientations. Blue represents the results obtained using the proposed rectification method and orange represents the results obtained using the Abraham method. The + symbols indicate the values of MN/TN × 100, while the other symbols represent the values of CN/MN × 100. For each input image pair, the 3D distances were measured nine times for various reference plane positions. The numbers from −4 to 4 on the horizontal axis, k, correspond to the positions of the reference planes, where 0 is the base position, which is the closest to the actual target plane. −k means that the reference plane is moved toward the cameras along the normal direction of the reference plane by (k × 25) mm from the base position, and +k means that the reference plane is moved away from the cameras along the normal direction of the reference plane by (k × 25) mm from the base position.
Figure 14: Results obtained from the fisheye image pairs at the five poses with the base orientation (0-rich-0 to 4-rich-0, 0-poor-0 to 4-poor-0).
To obtain the results for Figures 15 and 16, the same reference plane orientation was used independent of the actual orientation of the target. Table 1 shows the actual poses of the targets and the poses of the reference planes at the base position.
Prior to conducting the experiments, the
following five phenomena were predicted to occur.
E1. With the proposed method, the results should
worsen as |k| increases.
E2. With the Abraham method, the results should
be almost constant independent of |k|.
E3. The results of the proposed method should be better than those of the Abraham method when |k| is small.
E4. The results for poor textures should be worse
than for rich textures; however, for positions at
which |k| is small, reasonable results should be
obtained using the proposed method.
E5. The effect of the orientation difference between
the target and reference planes should be much
smaller than that caused by the positional
difference between the target and reference
planes.
Based on the results shown in Figure 14, the
expectations E1 to E4 are assessed as follows. E1 held true for pose 1. For poses 0, 2, and 4 the results did not worsen for larger |k|, and for pose 3 the results did not change significantly regardless of the value of k. E2 was not true; the results monotonically
increased or decreased along with the positional
changes. This seems to be caused by the changes in
the amount of quantization. E3 held true except for
pose 2. Figure 17 depicts the changes in Euclidean
distance on the reference plane corresponding to one
pixel when the yaw angles are equally quantized on
the left and right rectified images, as in the Abraham
method. Figure 17(a) is for pose 0 and (b) for pose 2.
As seen in Figure 17(a), the Euclidean distances on
the reference plane corresponding to one pixel were
quite different between the left and right images; this
causes the difference in the apparent shape in the
rectified images even when the target resides on the
reference plane. Conversely, for pose 2, the
Euclidean distances on the reference plane
corresponding to one pixel were almost the same
between the left and right images, which causes the apparent shapes in the left and right rectified images to be similar even when the target is not on the reference plane. This causes the results for pose 2 at
any position using the Abraham method to be almost
the same as those obtained at position 0 using the
proposed method. As expected according to E4, the
results for the poor texture were worse than those for
the rich texture for both the Abraham and the
proposed method. For pose 1, which is the most
challenging configuration for a stereo measurement,
the CN/MN was less than 25% for the Abraham
method. Conversely, using the proposed method, the
CN/MN was greater than 40% for |k| < 2. Figure 18
shows the results of the 3D distance measurements
for pose 1 of poor texture. The upper two rows are
for the proposed method and the lower two rows are
for the Abraham method. White solid rectangles
indicate the cameras. The white solid lines show the
viewing directions of the cameras. Further, white
dots represent the measured results. Additionally,
the red squares show the reference planes.
Estimating whether a real plane does or does not exist in the measurement area, and estimating the pose of the plane if it does exist, appears to be difficult from the point clouds obtained by the Abraham method. However, from the point clouds obtained by the proposed method for |k| < 2, such an estimation appears to be possible with only simple post processing.
Figure 15: Results obtained from the fisheye image pairs at pose 0 with the five orientations (0-rich-0 to 0-rich-4, 0-poor-0 to 0-poor-4).
Figure 16: Results obtained from the fisheye image pairs at pose 1 with the five orientations (1-rich-0 to 1-rich-4, 1-poor-0 to 1-poor-4).
Based on the results shown in Figures 15 and 16,
E5 is true.
Figure 17: Changes in Euclidean distance on the reference plane corresponding to one pixel when the yaw angles are equally quantized on the left and right rectified images; (a) pose 0, (b) pose 2.
Figure 18: Results of the 3D distance measurements for pose 1 with the poor texture (reference plane positions k = −4 to 4).
5 CONCLUSIONS
We proposed a new epipolar rectification method for
fisheye images. It rectifies a portion of the fisheye
images so that the apparent shape on the left and
right rectified images becomes similar if a target is
close to a reference plane. By using the proposed
method and setting the reference plane appropriately,
the 3D distances of a target plane can be measured
using a simple region-based matching method, even
if the target plane lies within reach of the robot on which the cameras are mounted and even if it lies in
the direction of the extension of the stereo baseline.
The proposed method was experimentally compared with another rectification method, and its superiority was validated.
We are now developing a method to judge whether a plane exists and to estimate its pose if it does, based on the proposed rectification; this method will be applied to the motion planning of a humanoid robot when it needs to bring a part of its upper body into contact with the environment to retain its balance.
REFERENCES
Sentis, L., 2010. Compliant control of whole-body multi-
contact behaviors in humanoid robots. Motion
Planning for Humanoid Robots, Springer: 29-66.
Escande, A., A. Kheddar, et al., 2013. Planning contact
points for humanoid robots. Robotics and Autonomous
Systems 61(5): 428-442.
Henze, B., M. A. Roa, et al., 2016. Passivity-based whole-
body balancing for torque-controlled humanoid robots
in multi-contact scenarios. The International Journal of
Robotics Research: 0278364916653815.
Brossette, S., J. Vaillant, et al., 2013. Point-cloud multi-
contact planning for humanoids: Preliminary results.
6th IEEE Conference on Robotics, Automation and
Mechatronics (RAM), IEEE.
Khatib, O. and S.-Y. Chung, 2014. SupraPeds: Humanoid
contact-supported locomotion for 3D unstructured
environments. IEEE International Conference on
Robotics and Automation (ICRA), IEEE.
Schmid, C. and A. Zisserman, 1997. Automatic line
matching across views. Computer Vision and Pattern
Recognition.
Baumberg, A., 2000. Reliable feature matching across
widely separated views. Computer Vision and Pattern
Recognition.
Matas, J., O. Chum, et al., 2004. Robust wide-baseline
stereo from maximally stable extremal regions. Image
and vision computing 22(10): 761-767.
Bay, H., V. Ferrari, et al., 2005. Wide-baseline stereo
matching with line segments. Computer Vision and
Pattern Recognition.
Scharstein, D. and R. Szeliski, 2002. A taxonomy and
evaluation of dense two-frame stereo correspondence
algorithms. International Journal of Computer Vision
47(1-3): 7-42.
Hartley, R. and A. Zisserman, 2003. Multiple view
geometry in computer vision, Cambridge university
press.
Ayache, N. and C. Hansen, 1988. Rectification of images
for binocular and trinocular stereovision. 9th
International Conference on Pattern Recognition.
Courtney, P., N. A. Thacker, et al., 1992. A Hardware
Architecture for Image Rectification and Ground Plane
Obstacle Avoidance. Proc. 11th ICPR 1992.
Loop, C. and Z. Zhang, 1999. Computing rectifying
homographies for stereo vision. Computer Vision and
Pattern Recognition.
Hartley, R. I., 1999. Theory and Practice of Projective
Rectification. Int. J. Comput. Vision 35(2): 115-127.
Hirschmuller, H. and D. Scharstein, 2009. Evaluation of
stereo matching costs on images with radiometric
differences. IEEE Transactions on Pattern Analysis
and Machine Intelligence 31(9): 1582-1599.
Zabih, R. and J. Woodfill, 1994. Non-parametric local
transforms for computing visual correspondence.
European conference on computer vision, Springer.
Geiger, A., M. Roser, et al., 2010. Efficient large-scale
stereo matching. Asian conference on computer vision,
Springer.
Hirschmuller, H., 2008. Stereo processing by semiglobal
matching and mutual information. IEEE Transactions
on Pattern Analysis and Machine Intelligence 30(2):
328-341.
Devernay, F. and O. D. Faugeras, 1994. Computing
differential properties of 3-D shapes from stereoscopic
images without 3-D models. Computer Vision and
Pattern Recognition.
Tola, E., V. Lepetit, et al., 2010. Daisy: An efficient dense
descriptor applied to wide-baseline stereo. Pattern
Analysis and Machine Intelligence, IEEE Transactions
on 32(5): 815-830.
Kita, N., 2011. Direct floor height measurement for biped
walking robot by fisheye stereo. 11th IEEE-RAS
International Conference on Humanoid Robots.
Pollefeys, M., R. Koch, et al., 1999. A simple and efficient
rectification method for general motion. The
Proceedings of the Seventh IEEE International
Conference on Computer Vision.
Abraham, S. and W. Förstner, 2005. Fish-eye-stereo
calibration and epipolar rectification. ISPRS Journal of
Photogrammetry and Remote Sensing 59(5): 278-288.
Davison, A., 1998. Mobile Robot Navigation Using
Active Vision. D. Phil Thesis, University of Oxford.