A Comprehensive Approach for Evaluation of Stereo Correspondence

Solutions in Augmented Reality

Bahar Pourazar and Oscar Meruvia-Pastor

Computer Science Department, Memorial University of Newfoundland, St. John’s, Canada

Keywords:

Augmented Reality, Human Visual System, Binocular Stereo, Stereoacuity, Disparity, Stereo Correspondence.

Abstract:

This paper suggests a comprehensive approach for the evaluation of stereo correspondence techniques based

on the speciﬁc requirements of outdoor augmented reality systems. To this end, we present an evaluation

model that integrates existing metrics of stereo correspondence algorithms with additional metrics that con-

sider human factors that are relevant in the context of outdoor augmented reality systems. Our model pro-

vides modiﬁed metrics of stereoacuity, average outliers, disparity error, and processing time. These metrics

have been modiﬁed to provide more relevant information with respect to the target application. We evaluate

our model using two stereo correspondence methods: the OpenCV implementation of the semi-global block

matching, also known as SGBM, which is a modiﬁed version of the semi-global matching by Hirschmuller;

and our implementation of the solution by Mei et al., known as ADCensus. To test these methods, we use a

sample of ﬁfty-two image pairs selected from the Kitti stereo dataset, which depicts many situations typical of

outdoor scenery. Experimental results show that our proposed model can provide a more detailed evaluation

of both algorithms. Further, we discuss areas of improvement and suggest directions for future research.

1 INTRODUCTION

Many Augmented Reality (AR) systems require some

form of optical markers placed within a scene in order

to integrate computer-generated objects with scenery

directly generated from the real world; these markers

help the system identify the location of an item within

the scene to be used as a place-holder for the synthetic

objects. Placing such markers in objects that are part

of a scene may work for many indoor environments,

but is a less practical option in outdoor AR settings

where users can move freely in their surroundings. In

the absence of such markers, an AR system requires a

depth map of the surrounding environment. In order

to obtain the 3D location of different objects in the

scene, several technologies can be used. Among these

technologies, one of the most practical techniques is

the use of stereo cameras to take images of the scene

from slightly different viewpoints. These images can

then be processed by the stereo correspondence algo-

rithms, which attempt to ﬁnd the corresponding pixels

in the stereo images, to generate the depth map of the

surrounding environment. This map is then used to in-

tegrate virtual objects in the scene such that synthetic

objects are rendered in a way that considers the oc-

clusion properties and the depth of the real objects in

the scene. Due to the potential applications of stereo

correspondence, which is one of the most extensively

studied subjects in computer vision (Scharstein and

Szeliski, 2002), using an evaluation scheme that is de-

signed according to the speciﬁc requirements of the

target application is essential. The evaluation scheme

proposed in this paper is designed for outdoor AR ap-

plications which make use of stereo vision techniques

to obtain a depth map of the surrounding environment.

Over the past few years, a few evaluation schemes

have been proposed by researchers in the ﬁeld

to provide a testbed for assessment of the solu-

tions based on speciﬁc criteria. The Middlebury

Stereo (Daniel Scharstein, 2012) and the Kitti Stereo

benchmarks (Andreas Geiger, 2012) are two of the

most popular and widely used evaluation systems

through which a stereo correspondence algorithm can

be evaluated and compared to others. However, both

of these models take a general approach towards eval-

uating the methods; that is, they have not been de-

signed with an eye to any particular target applica-

tion. In fact, these models focus on the particular ap-

plication of a stereo correspondence algorithm as a

solution per se to ﬁnd the best matches of the corre-

sponding pixels in stereo pairs, regardless of the target

application. However, the information provided by

these evaluation benchmarks is not sufﬁcient to select

Pourazar B. and Meruvia-Pastor O..

A Comprehensive Approach for Evaluation of Stereo Correspondence Solutions in Augmented Reality.

DOI: 10.5220/0005272500050013

In Proceedings of the 10th International Conference on Computer Vision Theory and Applications (VISAPP-2015), pages 5-13

ISBN: 978-989-758-091-8

 2015 SCITEPRESS (Science and Technology Publications, Lda.)

a given algorithm suitable for AR, because we need

information on the speciﬁc accuracy and efﬁciency

of these algorithms, for example to assess their suit-

ability regarding their processing time. The fact that

some of this information is missing from such stan-

dard evaluations of stereo correspondence methods

has compelled us to take steps towards a comprehen-

sive evaluation design based on speciﬁc requirements

of outdoor stereo AR applications, which results in

better deﬁnition and adjustment of the criteria for ef-

ﬁciency and accuracy metrics used for the evaluation.

2 BACKGROUND AND

TERMINOLOGY

Over the past decades, many mobile AR systems have

been built, from the Touring Machine in 1997 (Feiner

et al., 1997) to Google Glass which was announced

in 2013 (Google Inc., 2013); however, most of these

prototypes have remained experimental due to certain

difﬁculties and constraints of using them in practical

applications (Drascic and Milgram, 1996; Livingston,

2005). Two of the most important constraints are the

human factors in AR and the high demand of com-

putational resources needed to provide a real-time in-

teraction between the user and the system; therefore,

in order to build a practical AR system, these factors

need to be carefully considered while designing dif-

ferent components of the system.

2.1 Binocular Vision and the Human

Visual System

Studies in binocular vision show that human percep-

tion of depth can vary depending on the environment

and under different circumstances. Many studies have

focused on the evaluation of human perception of

depth within different frameworks and in different

applications, such as virtual reality and AR, which

have recently attracted more attention (Wann et al.,

1995; Drascic and Milgram, 1996; Livingston, 2005;

Jerome and Witmer, 2005; Swan et al., 2007; Kruijff

et al., 2010). These studies show that the viewer per-

ception of depth is inversely proportional to his/her

distance from the object (Kruijff et al., 2010; Swan

et al., 2007; Jerome and Witmer, 2005; Livingston,

2005); for instance, in (Swan et al., 2007), some ex-

periments are designed to study and evaluate the hu-

man perception of distance, in terms of the absolute

depth of the objects from the observer, for an outdoor

AR application in urban settings. However, in this

research we are more interested in the human percep-

tion of relative depth in stereo vision: the ability to

perceive and distinguish the depth of different objects

relative to each other. In binocular vision, the mini-

mum depth difference between two points that can be

detected in the visual system is known as Stereoscopic

Acuity or Stereoacuity (Pfautz, 2000). This metric is

normally presented in angular units, commonly arc-

seconds. According to the geometry of binocular vi-

sion illustrated in Figure 1, stereoacuity can be ob-

tained from the following equation:

θ =

a∆Z

(1)

This equation estimates the angular disparity in ra-

dians, where a is the distance between the center of

the pupils of the two eyes, which is known as inter-

pupillary distance.

ΔZ

F′

Figure 1: Binocular disparity.

According to standard stereo tests (Reading,

1983), the ﬁnest detectable disparity in the hu-

man visual system (HVS) is approximately 10-15

arcseconds. However, a more recent study on 60

subjects (Garnham and Sloper, 2006) at different age

groups, from 17 to 83 using standard stereotests,

shows that the average stereoacuity for different age

groups is as follows:

Table 1: Average stereoacuity for subjects ages 17 to 83.

Age Range Avg Stereoacuity(arcsecs)

17-29 32

30-49 33.75

50-69 38.75

70-83 112.5

As can be seen, the stereoacuity for the HVS in-

creases with age, that is the amount of error in the

depth results is less perceptible in the visual system of

the elders than the younger people. Using these values

in Equation 1 along with the average interpupillary

VISAPP2015-InternationalConferenceonComputerVisionTheoryandApplications

distance in the human visual system, that is approx-

imately 64mm (Howard and Rogers, 1995), we can

estimate the threshold for minimum detectable depth

between two objects based on their distance from the

observer.

2.2 Real-time Interaction

Providing real-time interaction in an AR system for

the user requires the processing time and update rate

of the whole system to keep up ideally with the stan-

dard video frame rate, between 24fps and 30fps, or

higher. However, studies show that in practice to build

a reasonable interactive augmented world the process-

ing rate should not be less than half of the video

frame rate (Hertzmann and Perlin, 2000). Two ways

to speed up a system are using a more advanced tech-

nology and hardware, and implementing more sophis-

ticated and efﬁcient software design. However, hav-

ing access to advanced technology and hardware is

not always feasible and even the most advanced tech-

nologies have some limitation in their memory space

and computational capability which may not meet the

requirement for some real-time applications. There-

fore, in many cases, employing the second approach

is more practical.

3 COMPREHENSIVE

EVALUATION SCHEME

In our design, unlike the Middlebury or Kitti bench-

marks, we label a pixel in the disparity results as

an outlier if the angular measurement, that is the

stereoacuity, corresponding to the depth error be-

tween the ground truth and the estimated depth value

by the algorithm is more than the average perceptible

stereoacuity of the HVS as determined by standard

stereo tests (Reading, 1983; Garnham and Sloper,

2006). Moreover, we use the average stereoacuity for

different age groups (Garnham and Sloper, 2006) in

our design to evaluate the performance of the algo-

rithm for users at different ages; this makes the eval-

uation results more reliable and applicable to appli-

cations of AR. In order to evaluate the efﬁciency of

an algorithm and investigate whether it meets the re-

quirements for being part of a real-time AR applica-

tion, we integrate a module in the evaluation process

that reports on the average execution time of the algo-

rithm for the input data. The average outliers based on

the speciﬁed stereoacuity thresholds and the average

disparity error are also estimated during the evalua-

tion process.

In addition, our model employs a particular ap-

proach which can be of speciﬁc value to AR appli-

cations. In this approach, we suggest that it is im-

portant to focus the evaluation process on particular

regions of the disparity map rather than the whole

image. The main reason is that salient edges caused

by depth discontinuities, which also represent object

boundaries and occlusion, are important depth cues

for the human visual system to better perceive the

location of different objects in the 3D environment

(Szeliski, 2011). Therefore, more accurate depth re-

sults in these regions permits a higher quality com-

bination of the depth map of the real world with the

virtual depth of the synthetic objects that are part of

the AR scene. To this end, we build a mask using the

ground truth disparity map which is, in fact, a mask of

the edges in the image caused by depth discontinuities

and their surrounding area.

3.1 Architecture

The block diagram of our evaluation system can be

seen in Figure 2, which illustrates the sequence of

the operations during the whole process. As can be

seen in this ﬁgure, ﬁrst the input data consisting of

the stereo images, the ground truth disparity, and the

calibration data are passed to the system. Afterwards,

the speciﬁed masks are created using a Canny edge

detector and a Dilation operation with the appropriate

parameters selected separately for each image. After

the corresponding disparity maps have been generated

by the stereo algorithm and stored on the disk, they

are passed to the evaluation module with the speciﬁed

arguments. Finally, the evaluation metrics are esti-

mated and output as data ﬁles and plots to facilitate

the evaluation of the stereo algorithm in the applica-

tion of interest.

3.2 Evaluation Metrics

The main evaluation component consists of different

modules which estimate speciﬁc evaluation metrics.

These metrics are: 1) the average stereoacuity, 2) the

average outliers, 3) the average disparity error, and 4)

the average execution time. Analysis of these metrics

in the framework of an outdoor AR application will

then allow for a practical evaluation of the stereo al-

gorithm performance.

3.2.1 Average Stereoacuity

We can break the estimation of the average stereoacu-

ity down to 3 steps: 1) estimate the stereoacuity based

on the generated disparity for each image pair and

AComprehensiveApproachforEvaluationofStereoCorrespondenceSolutionsinAugmentedReality

Input

Stereo

images

Ground truth

disparity

Calibration

data

Evaluation

Masks

Masking

(Dilation, Canny)

Stereo Correspondence

(stereo algorithm)

Disparity

maps

Avg_Outliers

Avg_StAc

Avg_DispErr

Avg_ExecTime

Comprehensive Evaluation System

Output

(evaluation metrics)

Figure 2: Architecture of the evaluation system.

the ground truth; 2) average the stereoacuity results

over certain depth ranges in each image; 3) average

the results from the previous step over all the images.

Corresponding plots are generated after the third step

based on the ﬁnal results.

According to the speciﬁc age ranges, different val-

ues are reported for the average stereoacuity at the end

of the evaluation. In order to estimate this metric, the

depth values corresponding to both ground truth and

the generated disparity by the algorithm are ﬁrst cal-

culated. Subsequently, the difference between these

values is used in Equation 1 to calculate the corre-

sponding stereoacuity, Equations 2 and 3. This pro-

cess is done for all the pixels in the image; or if a

mask has been provided, it will be applied only to the

pixels in the masked areas. Finally the results are out-

put and stored in a separate data ﬁle for each image.

Depth

err



depth

− depth

gen



(2)

StAc =

a ∗ Depth

err

depth

(3)

Here, depth

, depth

gen

, StAc and a denote the

ground truth depth, the generated depth by the algo-

rithm, the corresponding stereoacuity, and the average

interpupillary distance, respectively.

After conducting the ﬁrst step on all the disparity

maps corresponding to input image pairs, the second

step starts by building a histogram of the stereoacuity

values over speciﬁc depth ranges, Equations 5 and 6.

In our design, the width of each bin determines the

aforementioned depth range and is kept constant for

all the bins. Moreover, the number of bins along with

their corresponding width determine the total distance

over which the results are estimated and subsequently

examined.

Total distance = Number o f bins ∗Width (4)

For outdoor applications of AR, these parameters are

normally set to certain values so that the total distance

covers the medium to far depth ﬁelds; extending from

1.5 meters to more than 30 meters (Swan et al., 2007).

Sum

dRange

∑

dRange

StAc (5)

Avg StAc

dRange

Sum

dRange

(NumO f Pixs

dRange

)

(6)

Here, Avg StAc

dRange

and Sum

dRange

denote the

average and total stereoacuity over speciﬁed depth

ranges in each image, and NumO f Pixs

dRange

denote

the number of pixels within each depth range.

The results of the previous step, all stored in a data

ﬁle, are then passed to the last step. At this point, a

histogram is built over the data from all the dispar-

ity images, which results in the average stereoacuity

values within each speciﬁed depth range over all the

images, Equation 7. It should be noted that the num-

ber of bins and their corresponding width in this step

are similar to the histogram constructed in the the pre-

vious step.

Avg StAc =

∑

imgs

(Sum

dRange

)

∑

imgs

(NumO f Pixs

dRange

)

(7)

3.2.2 Average Outliers

For this measurement, the relative depth error is ﬁrst

calculated by ﬁnding the corresponding depth values

for the ground truth disparity and the disparity gen-

erated by the algorithm and then converted to effec-

tive stereoacuity, as shown in Equations 2 and 3. This

value is then compared to the relative detectable depth

threshold for the HVS that is estimated using Equa-

tion 1. If the relative depth error is equal to or more

than the detectable threshold in the HVS, Equation 8,

then the corresponding pixel is labelled as an outlier.

StAc ≥ StAc

threshold

(8)

Since we are using four different thresholds of

stereoacuity corresponding to different age groups

VISAPP2015-InternationalConferenceonComputerVisionTheoryandApplications

Figure 3: Sample outdoor scene from the Kitti stereo dataset (Andreas Geiger, 2012). Top: left image. Bottom: right image.

in our evaluation, the estimated error is compared

against each of these thresholds, and therefore, four

different values are eventually calculated. The aver-

age outliers is then computed as a fraction of the total

number of pixels in the inspected regions, Equation 9.

Avg Outliers =

Outliers

total

NumbO f Pixs

(9)

This process is repeated for all the pixels in the

image or merely the pixels in the masked regions de-

pending on the availability of a mask.

3.2.3 Average Disparity Error

This metric is the mean error between the ground truth

disparity and the one found by the algorithm, which

is estimated for all the pixels in the image or merely

the masked pixels depending on the availability of a

mask. It can be presented with the following estima-

tions:

Disp

err



disp

− disp

gen



(10)

DispErr

total

∑

pixs

Disp

err

(11)

After the computation of the total disparity error

for the pixels, the average disparity error is estimated

as follows:

Avg DispErr =

DispErr

total

NumO f Pixs

(12)

The NumOfPixs is, in fact, the total number of pixels

in the whole image or the masked regions, depending

on the case for which the error is being estimated.

3.2.4 Average Execution Time

We use the C++ function clock() to estimate the av-

erage execution time of the algorithms for generating

disparity results corresponding to the input stereo im-

ages, ﬁfty-two image pairs in our evaluation. We then

compare this value to the acceptable criteria for hav-

ing a real-time interactive AR system from the user’s

perspective, that is, a processing time less than 0.06-

0.08 seconds per frame corresponding to a frame rate

of 12.5 to 16.5 fps, as proposed by (Hertzmann and

Perlin, 2000).

Analyzing each of these metrics in the light of the

relevant factors in an outdoor AR application results

in a practical evaluation of the stereo correspondence

methods.

4 VALIDATION

In order to verify the effectiveness of our proposed

model for the evaluation of stereo correspondence

methods in outdoor AR applications, we have eval-

uated two sample stereo algorithms: the OpenCV im-

plementation of the semi-global block matching, also

AComprehensiveApproachforEvaluationofStereoCorrespondenceSolutionsinAugmentedReality

known as SGBM, which is a modiﬁed version of the

semi-global matching by Hirschmuller (Hirschmuller,

2008); and our implementation of “on building an ac-

curate stereo matching system on graphics hardware”

(Mei et al., 2011), also known as ADCensus. It should

be noted that the CPU implementation of both meth-

ods have been used.

Experiments were carried out on a Linux plat-

form with Intel Core(TM) i7 3.20GHz CPU. Fifty-

two image pairs were chosen from the Kitti Stereo

Dataset corresponding to real outdoor scenes. Figure

3 shows a sample stereo pair from the Kitti dataset.

The OpenCV Canny edge detector and Dilation oper-

ation were used for building the speciﬁed masks and

the expansion of the masked areas, respectively. Pa-

rameters corresponding to stereo algorithms, the aper-

ture size in Canny, and the degree of Dilation were

kept constant over all the images and experiments.

These values are presented in Tables 2, 3, and 4. The

parameters for Dilation and Canny were chosen em-

pirically by running the algorithms over our image set

with the intention of selecting the values which best

deﬁne the depth edges and expand them enough to

include regions with different depths surrounding the

edges.

Table 2: SGBM Parameters.

SADWindowSize 9

disp12MaxDiff 2

uniquenessRatio 10

P2 3*9

speckleWindowSize 100

speckleRange 2

Table 3: ADCensus Parameters.

10 λ

Census

30 L

34 L

20 τ

6 π

1.0 π

3.0

15 τ

20 τ

0.4

Table 4: Masking Parameters.

Dilation iterations 10

Canny apertureSize 3

The minimum and maximum disparity values are

also kept constant for each image pair in both algo-

rithms; however, the maximum disparity differ for

each image pair as the scenes are different and objects

are located at different depth ﬁelds. The minimum

disparity is set to 0 for both algorithms. The maxi-

mum disparity for each image pair is selected based

on the maximum value in their corresponding ground

truth disparity. The standard stereoacuities used for

the evaluation are based on the results mentioned in

Table 1.

4.1 Experimental Results

The evaluation metrics, mentioned in Section 3.2,

were estimated for SGBM and ADCensus in our eval-

uation system. The main results are described below.

4.1.1 Average Stereoacuity

Figures 4 and 5 show the average relative depth error

converted to effective stereoacuity over distance for

the masked and the whole images with both SGBM

and ADCensus.

In these plots, a cross point below a stereoacu-

ity threshold (straight lines) implies that the average

error in the disparity values estimated by the stereo

algorithm is imperceptible to the human visual sys-

tem. However, a value higher than the threshold in-

dicates that the error cannot be ignored and should

be resolved to achieve a better alignment between the

virtual and the real world in the AR application of in-

terest. Moreover, as can be seen most of the errors fall

below the standard stereoacuity value corresponding

to older ages; indicating that these are not perceptible

to the visual system of people at these particular ages.

The zero values in the plots imply that either there is

no object within the corresponding range or the dis-

parity value estimated by the algorithm is equal to

the ground truth disparity; however, since the aver-

age of the results has been taken over all the images,

it is more likely that the zero values indicate no object

within the particular range.

As can be seen in the results, SGBM performs bet-

ter in ﬁnding more accurate corresponding matches

compared to ADCensus, as most of the error points

fall below the standard stereoacuity lines. Moreover,

the plots show that in both methods the signiﬁcant

amount of error corresponds to the near ﬁeld objects,

within the ﬁrst 5 meters. This range of the depth

ﬁeld can be considerably important in some applica-

tions, such as the ones involving certain manipulative

tasks; for these types of applications, other technolo-

gies, such as depth sensing cameras, are better choice.

Comparing the results between the masked and

the whole image show that the average error over the

masked regions; that is, near the depth edges, is very

similar to the results over the whole image. This may

imply that there is no additional beneﬁt in the inspec-

tion of these regions. However, this might be merely

an indication of the performance of the selected algo-

rithms and can be better analyzed by evaluating more

algorithms within our model.

VISAPP2015-InternationalConferenceonComputerVisionTheoryandApplications

0.2

0.4

0.6

0.8

0 5 10 15 20 25 30 35 40 45 50

Stereoacuity(arcmin)

Distance(m)

Age: 30-49

Age: 50-69

Avg err

Age: 17-29

(a) Average relative depth error by SGBM

0.2

0.4

0.6

0.8

0 5 10 15 20 25 30 35 40 45 50

Stereoacuity(arcmin)

Distance(m)

Age: 30-49

Age: 50-69

Avg err

Age: 17-29

(b) Average relative depth error by ADCensus

Figure 4: Average relative depth error over distance for the masked image.

0.2

0.4

0.6

0.8

0 5 10 15 20 25 30 35 40 45 50

Stereoacuity(arcmin)

Distance(m)

Age: 30-49

Age: 50-69

Avg err

Age: 17-29

(a) Average relative depth error by SGBM

0.2

0.4

0.6

0.8

0 5 10 15 20 25 30 35 40 45 50

Stereo acuity(Arcmin)

Distance(m)

Age: 30-49

Age: 50-69

Avg err

Age: 17-29

(b) Average relative depth error by ADCensus

Figure 5: Average relative depth error over distance for the whole image.

4.1.2 Average Outliers

The average outliers for the masked and the whole

image are presented in Table 5. Results show that in

both cases, the masked regions and the whole image,

SGBM has less average outliers than ADCensus, indi-

cating that SGBM generates a more accurate disparity

map as perceived by the human visual system.

Another observation is that in SGBM, the average

outliers over the masked regions is larger than the

average outliers over the whole image, whereas in

ADCensus the opposite behavior is observed. This

implies that SGBM generates less accurate results

near the depth discontinuities and occluded regions

compared to the other areas in the image. On the

other hand, ADCensus generates more accurate

disparity values near the depth edges compared to the

other regions in the image and tends to preserve the

occluded regions. This only indicates that, despite

the better performance of SGBM over ADCensus

according to the experimental results, in cases

where only one of these solutions is available, it is

reasonable to consider this behavior to employ the

method in the right application based on the accuracy

requirement of the target system in different regions.

In other words, it is important to ﬁrst investigate

which regions of the image are more important in

the context of the target application. For instance,

ADCensus performs better in an application where

the areas near depth discontinuities and occlusion are

more important than the rest of the image, such as

image compositing for layering visual elements on

the scene, compared to application scenarios where

obtaining an accurate, dense disparity map for all the

regions in an image is essential, such as constructing

a 3D model of the scene or preparing a model for

3D printing. Figure 6 shows a comparison of all the

results.

4.1.3 Average Disparity Error

The average disparity error for both the whole and

the masked image are presented in Table 6. As can

be seen, ADCensus results in less average disparity

error than SGBM. This difference is likely caused

by the various reﬁnement steps implemented in the

ADCensus algorithm which do not exist in SGBM.

As a result, despite the larger outliers in ADCensus

than SGBM as presented in Section 4.1.2, ADCen-

sus attempts to decrease the difference between the

resulting disparity value and the ground truth dispar-

AComprehensiveApproachforEvaluationofStereoCorrespondenceSolutionsinAugmentedReality

Table 5: Average outliers.

Algorithm Age

Average

Outliers(Masked)

Average

Outliers(Full)

SGBM

17-29 0.12 0.11

30-49 0.11 0.10

50-69 0.09 0.08

70-83 0.0012 0.005

ADCensus

17-29 0.23 0.27

30-49 0.22 0.26

50-69 0.18 0.22

70-83 0.002 0.002

Table 6: Average disparity error.

Algorithm Region Avg DispErr

SGBM

Full 6.58

Masked 7.81

ADCensus

Full 4.49

Masked 4.74

ity through multiple reﬁnement steps, thus generating

smoother disparity patches within different regions of

the image. Figure 7 presents a comparison between

all the results.

4.1.4 Average Execution Time

In another experiment, we estimated the average exe-

cution time for both algorithms using a set of ﬁfty-two

stereo image pairs from the Kitti data set (Andreas

Geiger, 2012). Results of the average execution time

over all the images are shown in Table 7. Consider-

ing the requirements of a real-time AR system (Hertz-

mann and Perlin, 2000), the processing time of each

frame should not be more than 0.06-0.08 seconds. Al-

though the current implementation of SGBM could be

used when the real world scene remains stable for ap-

proximately one second, it can be safely concluded

that none of these implementation meets the require-

ments of a real-time interactive AR system.

Table 7: Average execution time.

Algorithm Avg ExecTime(secs)

SGBM 0.54

ADCensus 272.82

4.2 Overview

Table 8 shows an overview of the difference between

our proposed evaluation approach and the other

evaluation models, Middlebury and Kitti, in terms of

the estimated evaluation metrics.

It should be noted that although the average error

and the average outliers exist in the other evaluation

schemes as well, the major difference which makes

our evaluation more appropriate than the other

schemes for practical applications of AR, is the ap-

proach employed during the design of the metrics and

the analysis of the results in the evaluation process.

In fact, integrating the important factors related to

the human visual system and its perception of depth

in the design of the metrics and the insights they

provide make the evaluation model more relevant and

applicable to outdoor AR systems.

Table 8: Comparison of different evaluation schemes.

Metrics Evaluation Models

Middle-

bury

Kitti

Comprehensive

Evaluation

Avg StAc 7 7 3

Avg Outliers 3 3 3

Avg DispErr 3 3 3

Avg ExecTime 7 7 3

0.05

0.1

0.15

0.2

0.25

SGBM-Masked

SGBM-Full

ADCensus-Masked

ADCensus-Full

Average Outliers

17-29

30-49

50-69

70-83

Figure 6: Average outliers for both SGBM and ADCensus.

SGBM-Masked

SGBM-Full

ADCensus-Masked

ADCensus-Full

Average Disparity Error

Figure 7: Average disparity error for both SGBM and AD-

Census.

VISAPP2015-InternationalConferenceonComputerVisionTheoryandApplications

5 CONCLUSION

In this paper, we present a new approach for evaluat-

ing stereo algorithms in which we suggest that eval-

uation metrics should be designed or chosen based

on the speciﬁc requirements of the target applica-

tion. We then applied this concept to the particular

application of augmented reality systems in outdoor

environments. We chose outdoor environments be-

cause these environments pose additional challenges

to stereo vision algorithms and AR systems since they

must cope with external factors that cannot be eas-

ily controlled, such as the effects of shadows, illumi-

nation and weather. As a result, a practical analysis

on the performance of the stereo algorithms, in terms

of accuracy and processing time as perceived by the

HVS, was presented. The results over the masked re-

gions did not show any signiﬁcant beneﬁt to the eval-

uation of the areas near the depth discontinuities and

occluded regions; however, as mentioned previously,

this might be merely an indication of the performance

of the algorithms we selected for evaluation and can

only be better analyzed by evaluating more algorithms

within our model. In either case, we hypothesize that,

due to the importance of occlusion and areas near

depth discontinuities to the HVS for the perception

of depth in AR applications, it might be reasonable

to focus more on the regions that contain depth edges

and their surroundings when designing or employing

a stereo matching technique for an AR application.

Validation of this hypothesis is a topic we would like

to further investigate in the future research. More-

over, we would like to assess the beneﬁts of our model

for other AR applications, such as underwater envi-

ronments, and explore other factors which may also

affect the evaluation process, such as the resolution

of the display device, and the effect of contrast and

brightness.

ACKNOWLEDGEMENTS

The authors would like to thank the Research and De-

velopment Corporation (RDC) of Newfoundland and

Labrador, which provided the funding for the project,

and also Dr. Paul Gilliard, for his advice and support

for this project.

REFERENCES

Andreas Geiger (2012). KITTI Vision.http://www.cv

libs.net/datasets/kitti/.

Daniel Scharstein (2012). MiddleBury Evaluation.http:// vi-

sion.middlebury.edu/stereo/eval/.

Drascic, D. and Milgram, P. (1996). Perceptual issues in

augmented reality. In Electronic Imaging: Science &

Technology, pages 123–134. International Society for

Optics and Photonics.

Feiner, S., MacIntyre, B., H

ollerer, T., and Webster, A.

(1997). A touring machine: Prototyping 3d mobile

augmented reality systems for exploring the urban en-

vironment. Personal Technologies, 1(4):208–217.

Garnham, L. and Sloper, J. (2006). Effect of age on

adult stereoacuity as measured by different types

of stereotest. British journal of ophthalmology,

90(1):91–95.

Google Inc. (2013). Google AR Glasses. http://www.

google.com/glass/start/.

Hertzmann, A. and Perlin, K. (2000). Painterly rendering

for video and interaction. In Proceedings of the 1st

International Symposium on Non-photorealistic Ani-

mation and Rendering, NPAR ’00, pages 7–12, New

York, NY, USA. ACM.

Hirschmuller, H. (2008). Stereo processing by semiglobal

matching and mutual information. Pattern Analy-

sis and Machine Intelligence, IEEE Transactions on,

30(2):328–341.

Howard, I. P. and Rogers, B. J. (1995). Binocular vision

and stereopsis. Oxford University Press.

Jerome, C. and Witmer, B. (2005). The perception and es-

timation of egocentric distance in real and augmented

reality environments. In Proceedings of the Human

Factors and Ergonomics Society Annual Meeting, vol-

ume 49, pages 2249–2252. SAGE Publications.

Kruijff, E., Swan, J., and Feiner, S. (2010). Perceptual is-

sues in augmented reality revisited. In Mixed and Aug-

mented Reality (ISMAR), 2010 9th IEEE International

Symposium on, pages 3–12. IEEE.

Livingston, M. A. (2005). Evaluating human factors in aug-

mented reality systems. Computer Graphics and Ap-

plications, IEEE, 25(6):6–9.

Mei, X., Sun, X., Zhou, M., Jiao, S., Wang, H., and Zhang,

X. (2011). On building an accurate stereo matching

system on graphics hardware. In Computer Vision

Workshops (ICCV Workshops), 2011 IEEE Interna-

tional Conference on, pages 467–474. IEEE.

Pfautz, J. D. (2000). Depth perception in computer graph-

ics. PhD thesis, University of Cambridge.

Reading, R. (1983). Binocular vision: Foundations and ap-

plications. Butterworths.

Scharstein, D. and Szeliski, R. (2002). A taxonomy and

evaluation of dense two-frame stereo correspondence

algorithms. International journal of computer vision,

47(1-3):7–42.

Swan, J. E., Jones, A., Kolstad, E., Livingston, M. A., and

Smallman, H. S. (2007). Egocentric depth judgments

in optical, see-through augmented reality. Visualiza-

tion and Computer Graphics, IEEE Transactions on,

13(3):429–442.

Szeliski, R. (2011). Computer vision: algorithms and ap-

plications. Springer.

Wann, J. P., Rushton, S., and Mon-Williams, M. (1995).

Natural problems for stereoscopic depth perception in

virtual environments. Vision research, 35(19):2731–

2736.

AComprehensiveApproachforEvaluationofStereoCorrespondenceSolutionsinAugmentedReality