COREST: A MEASURE OF COLOR AND SPACE STABILITY TO

DETECT SALIENT REGIONS ACCORDING TO HUMAN CRITERIA

Agnès Borràs and Josep Lladós

Computer Vision Center, Dept. Ciències de la Comunicació, Spain

Keywords:

Region of interest, Region detector, Color stability, Space scale.

Abstract:

In this paper we present a novel method to obtain regions of interest in color images. The strategy consists of evaluating the stability of a region according to its color properties and its spatial arrangement. We propose a fusion of classical color image segmentation with scale-space analysis. An image can be decomposed into a set of regions that describe the whole image content. Using a set of manually labelled images, we have evaluated the properties of the detector with respect to human perception. The proposed region detector has a potential application in the field of content-based image retrieval by sketch.

1 INTRODUCTION

Many computer vision applications use region detection procedures as a first step to extract relevant information from images. Sometimes this relevant information needs to fit the human perception of the image content. This constraint is usually required in the field of content-based image retrieval (CBIR). Several commercial systems incorporate drawing interfaces that allow users to create their own queries (Veltkamp and Tanase, 2000). Sketch-based search engines are very attractive because they overcome language limitations and allow the queries to be adapted to the user's needs. But can these queries fit the preprocessed information of the database images?

CBIR systems usually rely on a previous segmentation of the images. The goal of segmentation is to group similar pixels together in order to separate them into regions. Nevertheless, experiments such as the Berkeley segmentation data set show that there is no unique way in which humans select the relevant information of an image. The Berkeley database collects hand-labelled segmentations from several human subjects (Martin et al., 2001). Figure 1 shows an example where different users describe the same image with very different sets of regions. This high variability suggests that CBIR systems could benefit from region detectors that go beyond the classical segmentation algorithms. Classical segmentation procedures provide a unique and disjoint decomposition of the image into a set of regions (Cheng et al., 2001). However, the regions that humans perceive as meaningful may overlap or be contained in one another.

Figure 1: Examples of the Berkeley data set. Manual segmentations provided by different users on the same image.

In recent years, some works on the detection of regions of interest have incorporated a multiscale analysis of the image content. Lindeberg observed that humans identify real-world objects depending on the scale of observation (Lindeberg, 1993). He illustrates this phenomenon with the example of a tree: at close scales humans identify the leaves as objects, but the concept of the forest only makes sense from large distances. The main idea of multiscale analysis is to select local image structures at specific scales. The specific scales are those for which a given function attains an extremum. Techniques such as the Laplacian of Gaussian (LoG) or the determinant of the Hessian (DoH) have obtained excellent results in matching applications where two images contain instances of the same real object (Mikolajczyk et al., 2005).

Borràs A. and Lladós J. (2009). COREST: A MEASURE OF COLOR AND SPACE STABILITY TO DETECT SALIENT REGIONS ACCORDING TO HUMAN CRITERIA. In Proceedings of the Fourth International Conference on Computer Vision Theory and Applications, pages 204-209. DOI: 10.5220/0001802502040209. SciTePress.

In this paper we propose a system to detect regions of interest according to human perception. The main strategy of our proposal is to combine classical color segmentation with scale-space analysis. On the one hand, we detect regions from the homogeneity properties of the pixel colors. On the other hand, we propose a perceptual grouping of the pixels, connected or not, according to their spatial arrangement.

The paper is organized as follows. Section 2 explains the main idea and the implementation of our proposal. Section 3 presents some results on a set of manually segmented images. Finally, Section 4 presents the conclusions and future work.

2 THE COLOR REGION STABILITY (CoReSt)

We have developed a region detector inspired by the work of Matas, who defined the concept of maximally stable extremal regions (MSER) (Matas et al., 2002). Matas informally explained this concept as follows. Imagine all possible thresholdings of a gray-level image. We can view the thresholded images as a movie that starts with the lowest threshold and gradually increases it. The first image is white, and subsequently black spots begin to appear and grow. These spots correspond to local intensity minima, and they keep merging until the last frame of the movie is completely black. To evaluate this evolution, a stability function between regions of consecutive frames is defined. This function is the rate of change of the region areas. The threshold levels that are local minima of this rate of change are selected to produce the maximally stable extremal regions. MSER has been extended to color images by Forssén (Forssén, 2007), who applies the same idea of stability to successive time steps of an agglomerative clustering of image pixels. Following the movie example, imagine that we have a color image instead of a grayscale one. The threshold is now related to the color distance between two pixels. We begin with an image where every pixel is an independent region. As we increase the threshold, connected pixels whose color distance is lower than this value begin to fuse. Finally, all the pixels form a single region in the last frame. Thus, the more distant the color of a region is from the color of its surrounding pixels, the more stable the region is.
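The movie analogy can be made concrete with a small sketch. It assumes that one region has already been tracked across the threshold levels and that only its area at each level is known; the function names, the `delta` step and the toy area profile are illustrative assumptions, not part of the original MSER implementation.

```python
def area_change_rates(areas, delta=1):
    """Rate of area change of one tracked region at each threshold level.

    areas[t] is the pixel count of the region at threshold level t.
    delta (hypothetical parameter) is the step between compared levels.
    """
    return {
        t: (areas[t + delta] - areas[t - delta]) / areas[t]
        for t in range(delta, len(areas) - delta)
    }


def maximally_stable_levels(areas, delta=1):
    """Threshold levels where the rate of area change has a local minimum."""
    rates = area_change_rates(areas, delta)
    levels = sorted(rates)
    return [
        t for t in levels[1:-1]
        if rates[t] < rates[t - 1] and rates[t] <= rates[t + 1]
    ]


# Toy profile: the region grows fast, stays almost constant, then merges.
toy_areas = [10, 12, 30, 31, 32, 33, 60, 100]
stable = maximally_stable_levels(toy_areas)
```

In this toy profile the region barely changes around the middle levels, so that is where the sketch reports a stable level.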

The color analysis takes into account the structures formed by connected pixels, but it does not analyze the structures that emerge according to the scale of observation. Going back to human perception, it seems natural to group regions of similar color if they are close enough. Scale-space analysis is the tool that makes this kind of region association possible. This process plays an important role in identifying objects in real images when they suffer from partial occlusions or present textured patterns. We therefore propose to extend the stability measure to the spatial arrangement of the pixels. To illustrate the spatial stability, imagine that we have an image with a set of segmented regions and a threshold related to the spatial distance between the region centroids. Our movie starts with all the regions being independent and, as we increase the threshold, the regions are progressively joined. In the last frame they are all joined into a single one. Thus, the more isolated a region is, the more stable it is.
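A minimal sketch of this spatial movie follows. It assumes that each region is summarized by a fixed centroid and that two regions are joined whenever their centroids are closer than the threshold (single linkage, without recomputing centroids after a merge). Both simplifications, and the name `groups_at_threshold`, are assumptions made for illustration.

```python
from math import dist


def groups_at_threshold(centroids, threshold):
    """Group region indices whose centroids are chained within `threshold`."""
    parent = list(range(len(centroids)))

    def find(i):
        # Follow parent links to the representative of the group.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(centroids)):
        for j in range(i + 1, len(centroids)):
            if dist(centroids[i], centroids[j]) <= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(centroids)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Two close regions and one isolated region.
toy_centroids = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0)]
frames = [groups_at_threshold(toy_centroids, t) for t in (0.5, 2.0, 20.0)]
```

The isolated third region stays unchanged over a wider range of thresholds than the other two, which is the behavior the text describes as higher spatial stability.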

Finally, we quantify the saliency of a region by combining the stability measures of color and space. In the next section we present a region detector based on this measure, which we have called CoReSt (Color Region Stability).

2.1 The Region Detector Implementation

From an implementation viewpoint, we propose to detect regions of interest using the mean shift algorithm (Comaniciu and Meer, 1999). A pixel is understood as a point in a 5D space whose first three dimensions are the color values in the Luv space and whose other two are the (x,y) coordinates, normalized by the image size. We use an implementation of mean shift clustering that depends on two thresholds, hc and hs, which control the similarity constraints on color and space respectively (Christoudias et al., 2002). We analyze the stability of the segmentation according to the variation of these two thresholds. To illustrate the process, we can construct a two-dimensional grid filled with the clustered images (see Figure 2). Let MSS be the mean shift function, and let HC and HS be the two sets of thresholds that we evaluate for color and space respectively. Furthermore, let NC and NS be the number of thresholds that each set contains.

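$$\mathrm{HC}(x) = \{\mathit{hc}\}, \quad x = 1..\mathrm{NC}$$

$$\mathrm{HS}(y) = \{\mathit{hs}\}, \quad y = 1..\mathrm{NS}$$

Then, the grid that contains the segmented regions is denoted G and has dimension [NS x NC].

$$G^{(x,y)} = \mathrm{MSS}(I, \mathrm{HC}(x), \mathrm{HS}(y))$$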
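The construction of the grid can be sketched as follows. The segmenter here is a deliberately simple stand-in that works on a 1D row of gray values; it is not mean shift, and `toy_segment` and `build_grid` are hypothetical names. Any function with the same `(image, hc, hs)` signature could be plugged in instead.

```python
def toy_segment(pixels, hc, hs):
    """Toy stand-in for MSS on a 1D row of gray values.

    Adjacent pixels whose values differ by at most hc form a run; runs with
    similar mean value (within hc) separated by a gap of at most hs pixels
    are grouped into one, possibly disconnected, region.
    """
    runs = [[0]]
    for i in range(1, len(pixels)):
        if abs(pixels[i] - pixels[i - 1]) <= hc:
            runs[-1].append(i)
        else:
            runs.append([i])

    regions = []
    for run in runs:
        run_mean = sum(pixels[i] for i in run) / len(run)
        for region in regions:
            region_mean = sum(pixels[i] for i in region) / len(region)
            gap = run[0] - max(region) - 1
            if abs(run_mean - region_mean) <= hc and gap <= hs:
                region.extend(run)
                break
        else:
            regions.append(list(run))
    return [frozenset(region) for region in regions]


def build_grid(image, HC, HS, segment):
    """Grid cell (x, y) holds the regions of segment(image, HC[x-1], HS[y-1])."""
    return {
        (x, y): segment(image, hc, hs)
        for x, hc in enumerate(HC, start=1)
        for y, hs in enumerate(HS, start=1)
    }


row = [0, 0, 9, 9, 0, 0]
grid = build_grid(row, HC=[1, 10], HS=[0, 2], segment=toy_segment)
```

With the loosest spatial threshold, the two dark runs on either side of the bright one are grouped into a single disconnected region, which mimics the perceptual grouping that the space dimension is meant to capture.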

For every region of an image in the grid, we can find the analogous regions along the two dimensions. Using the movie example, the analogous regions are those that maximize the overlapping area when we increase and decrease the HC and HS thresholds. Let OAR be the function that computes the rate of overlapping area between two regions $R_a$ and $R_b$:

$$\mathrm{OAR}(R_a, R_b) = \frac{\mathrm{Area}(R_a \cap R_b)}{\max(\mathrm{Area}(R_a), \mathrm{Area}(R_b))}$$
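As a small illustration of the overlapping area rate, the sketch below represents each region as a set of (row, column) pixel coordinates, so that the area is simply the number of pixels. This representation and the handling of empty regions are assumptions made to keep the example short.

```python
def oar(region_a, region_b):
    """Overlapping area rate: shared pixels over the larger of the two areas."""
    if not region_a or not region_b:
        return 0.0  # assumption: an empty region overlaps with nothing
    shared = len(region_a & region_b)
    return shared / max(len(region_a), len(region_b))


square = {(0, 0), (0, 1), (1, 0), (1, 1)}
strip = {(1, 1), (1, 2)}
rate = oar(square, strip)  # one shared pixel, larger area is 4
```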

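We denote by $R_i^{(x,y)}$ the $i$-th region of the grid cell $(x,y)$ and by $\mathrm{AR}_{(x,y)}^{(x,y)'}$ its analogous region in another cell $(x,y)'$:

$$\mathrm{AR}_{(x,y)}^{(x,y)'} = R_{k^*}^{(x,y)'}, \qquad k^* = \arg\max_{k = 1..\#R^{(x,y)'}} \mathrm{OAR}\big(R_i^{(x,y)}, R_k^{(x,y)'}\big)$$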
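The search for the analogous region can be sketched as a maximization of the overlapping area rate over the regions of the other cell. Regions are again represented as sets of pixel identifiers, and `analogous_region` is a hypothetical helper name.

```python
def oar(region_a, region_b):
    """Overlapping area rate between two pixel sets."""
    if not region_a or not region_b:
        return 0.0
    return len(region_a & region_b) / max(len(region_a), len(region_b))


def analogous_region(region, other_cell_regions):
    """Region of another grid cell that maximizes the OAR with `region`."""
    return max(other_cell_regions, key=lambda candidate: oar(region, candidate))


region = frozenset({1, 2, 3, 4})
other_cell = [frozenset({1, 2}), frozenset({1, 2, 3, 4, 5}), frozenset({9})]
best = analogous_region(region, other_cell)
```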

Once we have the evolution of the segmentation along the color and space dimensions, we need a function to evaluate the stability of the regions. Matas (Matas et al., 2002) used the relative area between the regions, but note that we also work in the space domain, where strong changes occur when the regions form disconnected components. We therefore add the values of the second central moments to provide a more robust measure of stability. These moments can be understood as the lengths of the axes of the ellipse that encloses a region. The similarity rate between two length values $L_a$ and $L_b$ is computed using the function LR.

$$\mathrm{LR}(L_a, L_b) = \frac{\min(L_a, L_b)}{\max(L_a, L_b)}$$

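We denote by mAL and MAL the lengths of the minor axis and the major axis of an ellipse. Given two regions $R_a$ and $R_b$, we define the ratio of the axis lengths as:

$$\mathrm{ALR}(R_a, R_b) = \max\big(\mathrm{LR}(\mathrm{mAL}_a, \mathrm{mAL}_b),\ \mathrm{LR}(\mathrm{MAL}_a, \mathrm{MAL}_b)\big)$$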
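One possible way to obtain the axis lengths is from the covariance of the pixel coordinates, as sketched below. The factor 4 and the 1/12 term (each pixel treated as a unit square) follow a common region-properties convention and are assumptions here, since the text does not specify how the ellipse is fitted.

```python
from math import sqrt


def axis_lengths(region):
    """Minor and major axis lengths of the ellipse with the same second
    central moments as the region (a set of (row, col) pixels)."""
    n = len(region)
    mean_r = sum(r for r, _ in region) / n
    mean_c = sum(c for _, c in region) / n
    var_r = sum((r - mean_r) ** 2 for r, _ in region) / n + 1 / 12
    var_c = sum((c - mean_c) ** 2 for _, c in region) / n + 1 / 12
    cov = sum((r - mean_r) * (c - mean_c) for r, c in region) / n
    # Eigenvalues of the 2x2 covariance matrix in closed form.
    half_trace = (var_r + var_c) / 2
    spread = sqrt(((var_r - var_c) / 2) ** 2 + cov ** 2)
    return 4 * sqrt(half_trace - spread), 4 * sqrt(half_trace + spread)


def lr(length_a, length_b):
    """Similarity rate between two lengths, in (0, 1]."""
    return min(length_a, length_b) / max(length_a, length_b)


def alr(region_a, region_b):
    """Axis length ratio: the larger of the minor-axis and major-axis rates."""
    minor_a, major_a = axis_lengths(region_a)
    minor_b, major_b = axis_lengths(region_b)
    return max(lr(minor_a, minor_b), lr(major_a, major_b))


small = {(r, c) for r in range(2) for c in range(2)}
large = {(r, c) for r in range(4) for c in range(4)}
ratio = alr(small, large)  # a 2x2 square against a 4x4 square
```

Under these conventions the fitted axes of a 4x4 square are twice as long as those of a 2x2 square, so the ratio comes out close to one half.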

Then, given a region $R_i^{(x,y)}$ and an analogous one $\mathrm{AR}_{(x,y)}^{(x,y)'}$, we compute the stability S as the mean of the area rate and the axis rate.

$$S\big(R_i^{(x,y)}, \mathrm{AR}_{(x,y)}^{(x,y)'}\big) = 0.5 \cdot \mathrm{OAR}\big(R_i^{(x,y)}, \mathrm{AR}_{(x,y)}^{(x,y)'}\big) + 0.5 \cdot \mathrm{ALR}\big(R_i^{(x,y)}, \mathrm{AR}_{(x,y)}^{(x,y)'}\big)$$
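As a worked illustration with made-up values (not taken from the experiments), suppose that a region and its analogous region share 80% of the larger area, so that OAR = 0.8, and that their best axis length rate is ALR = 0.6. The stability is then:

$$S = 0.5 \cdot 0.8 + 0.5 \cdot 0.6 = 0.7$$

A region that keeps its area but changes its shape strongly, for instance by absorbing a distant disconnected component, is therefore penalized even when the overlap is high.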

Treating the analogous regions of the two dimensions separately, we denote by SC the function that measures the stability along color, and by SS its equivalent in space.

$$\mathrm{SC}\big(R_i^{(x,y)}\big) = \sum_{X=1}^{\mathrm{NC}} S\big(R_i^{(x,y)}, \mathrm{AR}_{(x,y)}^{(X,y)}\big)$$

$$\mathrm{SS}\big(R_i^{(x,y)}\big) = \sum_{Y=1}^{\mathrm{NS}} S\big(R_i^{(x,y)}, \mathrm{AR}_{(x,y)}^{(x,Y)}\big)$$
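The accumulation along one axis of the grid can be sketched as below. Two simplifications are assumptions of this sketch: the pairwise stability is replaced by the overlapping area rate alone, and the accumulated value is divided by the number of cells so that the result stays in [0,1]. The helper name `axis_stability` is hypothetical.

```python
def oar(region_a, region_b):
    """Overlapping area rate, used here as a stand-in for the stability S."""
    if not region_a or not region_b:
        return 0.0
    return len(region_a & region_b) / max(len(region_a), len(region_b))


def axis_stability(region, analogues, pair_stability=oar):
    """Average pairwise stability between a region and its analogous regions
    along one axis of the grid (color for SC, space for SS)."""
    scores = [pair_stability(region, other) for other in analogues]
    return sum(scores) / len(scores)


region = frozenset(range(4))
# Analogous regions found in three cells along the color axis.
along_color = [frozenset(range(2)), frozenset(range(4)), frozenset(range(8))]
sc_value = axis_stability(region, along_color)
```

The same function gives the space counterpart when it is fed with the analogous regions found along the space axis.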

At this point we have two measures of stability in the range [0,1] for all the regions of the images in the grid. Next, we select those that best describe an image.

2.1.1 Maximum Selection Response

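Among the whole set of segmented regions, we select those that present a maximum response of the stability measure. Let PC and PS be the functions that evaluate the presence of a peak in the response of the functions SC and SS. The selected regions are called ROIs.

$$\mathrm{ROI}_i^{(x,y)} = R_i^{(x,y)} \;\big|\; \mathrm{PC}\big(R_i^{(x,y)}\big) \ \text{or}\ \mathrm{PS}\big(R_i^{(x,y)}\big)$$

$$\mathrm{PC}\big(R_i^{(x,y)}\big) = \mathrm{SC}\big(\mathrm{AR}_{(x,y)}^{(x-1,y)}\big) \le \mathrm{SC}\big(R_i^{(x,y)}\big) > \mathrm{SC}\big(\mathrm{AR}_{(x,y)}^{(x+1,y)}\big)$$

$$\mathrm{PS}\big(R_i^{(x,y)}\big) = \mathrm{SS}\big(\mathrm{AR}_{(x,y)}^{(x,y-1)}\big) \le \mathrm{SS}\big(R_i^{(x,y)}\big) > \mathrm{SS}\big(\mathrm{AR}_{(x,y)}^{(x,y+1)}\big)$$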
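The peak test can be sketched on a list of stability values collected along one axis for a region and its analogous regions. Indices are 0-based here, and positions at the border of the grid are never reported as peaks; the border rule is an assumption, since the text does not say how the first and last thresholds are treated.

```python
def is_peak(profile, index):
    """True when profile[index] is not below its left neighbour and is
    strictly above its right neighbour."""
    if index <= 0 or index >= len(profile) - 1:
        return False  # assumption: border cells have no peak
    return profile[index - 1] <= profile[index] > profile[index + 1]


def is_roi(sc_profile, x, ss_profile, y):
    """A region is kept when it peaks along color or along space."""
    return is_peak(sc_profile, x) or is_peak(ss_profile, y)


sc_profile = [0.2, 0.5, 0.9, 0.4]  # stability along the color thresholds
ss_profile = [0.3, 0.3, 0.3]       # flat response along the space thresholds
selected = is_roi(sc_profile, 2, ss_profile, 1)
```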

A global measure of saliency GS can also be computed for each selected ROI.

$$\mathrm{GS}\big(\mathrm{ROI}_i^{(x,y)}\big) = \mathrm{SS}\big(\mathrm{ROI}_i^{(x,y)}\big) + \mathrm{SC}\big(\mathrm{ROI}_i^{(x,y)}\big)$$

This value combines the stability of color and space, which allows the regions to be ranked by their meaningfulness. This can be useful to weight the matching of the ROIs in a retrieval process.
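Ranking by global saliency can be sketched in a few lines. The ROI labels and stability values below are invented for illustration only.

```python
def global_saliency(ss_value, sc_value):
    """Global saliency as the sum of the space and color stabilities."""
    return ss_value + sc_value


def rank_rois(rois):
    """Sort (label, SS, SC) triples from most to least salient."""
    return sorted(
        rois,
        key=lambda roi: global_saliency(roi[1], roi[2]),
        reverse=True,
    )


toy_rois = [("sky", 0.4, 0.3), ("boat", 0.9, 0.8), ("sand", 0.5, 0.6)]
ranking = [label for label, _, _ in rank_rois(toy_rois)]
```

In a retrieval setting, the resulting order could be used to give more weight to matches that involve the most salient regions.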

VISAPP 2009 - International Conference on Computer Vision Theory and Applications


Figure 2: Grid of images segmented with the MSS according to the color parameters HC and the space parameters HS. We show an example of a region and its analogous regions, from which we compute its stability measure. Observe how it grows through the scale, fusing with regions of the same color, and how it grows through color, fusing with similarly colored pixels.

3 EXPERIMENTS AND RESULTS

We have carried out a set of experiments to study how well the stability measure agrees with human perception. As ground truth, we have used 100 images of the Berkeley database with their corresponding manually labelled regions (Martin et al., 2001). We have compared the performance of the proposed region detector (COREST) with its two most closely related algorithms: the Mean Shift Segmentation (MSS) and the Maximal Color Stable Regions (MCSR).

There exist many strategies to quantify the agreement of a segmented region with a ground truth reference (Unnikrishnan et al., 2007). Nevertheless, these strategies do not suit the output of our region detector, since one pixel of an image can be assigned to more than one region of interest. Therefore, to provide a numerical evaluation we have used the repeatability measure proposed by Mikolajczyk (Mikolajczyk et al., 2005). The repeatability is the percentage of manual regions that can be matched with any of the regions that the automatic detector provides. To simplify the matching step, we have approximated the regions with ellipses according to their moments. Two regions are then considered to match if their overlap error is below a given percentage. Figure 3, extracted from (Mikolajczyk et al., 2005), shows some examples of the overlap error.
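The repeatability computation can be sketched as follows. The overlap error is taken as one minus the ratio of intersection to union, as in Mikolajczyk et al., but it is computed directly on pixel sets rather than on the fitted ellipses; that shortcut, together with the function names, is an assumption of this sketch.

```python
def overlap_error(region_a, region_b):
    """One minus intersection over union of two pixel sets."""
    union = len(region_a | region_b)
    if union == 0:
        return 1.0
    return 1.0 - len(region_a & region_b) / union


def repeatability(manual_regions, detected_regions, max_error):
    """Fraction of manual regions matched by at least one detected region
    whose overlap error does not exceed max_error."""
    if not manual_regions:
        return 0.0
    matched = sum(
        1
        for manual in manual_regions
        if any(overlap_error(manual, found) <= max_error
               for found in detected_regions)
    )
    return matched / len(manual_regions)


manual = [frozenset({1, 2, 3, 4}), frozenset({10, 11})]
detected = [frozenset({1, 2, 3}), frozenset({20})]
score = repeatability(manual, detected, max_error=0.3)
```

Only the first manual region has a detected counterpart within the error bound, so half of the ground truth is matched in this toy case; tightening the bound to 0.2 leaves no match.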

Figure 3: Examples of the overlapping error rate between

two ellipses. From top left to bottom right: 10 to 60%.

We have tested the repeatability of the three detectors (COREST, MSS and MCSR) with the overlap error threshold varying from 10% to 60%. The graph in Figure 4 presents the mean repeatability over the whole set of test images. We have tested the algorithms with the default parameters of the available source code (Christoudias et al., 2002) (Forssén, 2007). Thus, each detector can provide a different number of ROIs for the same image. The mean number of manually segmented regions per image is 26, and the mean numbers of ROIs per image detected with COREST, MSS and MCSR are 275, 388 and 990 respectively. We observe that the repeatability of the COREST method is higher than that of the other two for all the overlap thresholds. We can deduce that, even though it detects fewer regions per image, the stability criterion allows it to match a higher percentage of the ground truth regions. When the overlap parameter is relaxed to 60%, it reaches a repeatability of 68.6%, despite the complexity of the natural scenes and the subjectivity of the human-based ground truth.

Figure 4: Mean repeatability for the test set.

Figure 5 presents some examples of the results of the COREST detector. For every example we show the ROIs of the real images that best fit the manual ones. In other words, we show the detected ROIs that maximize the overlapping area with the ground truth regions.

The COREST detector selects the image regions according to the stability measures of the color and space features. Thus, contrasted regions and isolated regions are detected as meaningful according to human perception. As introduced in Section 2, the analysis of the spatial properties plays an important role in identifying objects in real images. It makes it possible to group a set of disjoint regions that belong to a partially occluded object. This phenomenon is illustrated by image a) of Figure 5. If we look at the palm picture, we see that the sand area is occluded by a shadow that the user has omitted. Since the shadow contrasts so strongly with the color of the sand, a classical segmentation could never join the two parts. The same occurs in the detection of the background region in the fireman scene f): the region belonging to the wall is occluded by a picture and is broken into a set of parts. The scale-space analysis also has an effect in the presence of textures, where the system is able to group regions of similar color. Natural scenes are full of textures such as trees a), clouds h), waves e), or the leopard skin d). Moreover, images that contain man-made objects can also present repetitive patterns that a human perceives as belonging to the same object. This is the case of the white and red striped bars of image c) or the garments of image b).

4 CONCLUSIONS

We have presented a novel region detector for color images that combines a classical color segmentation approach with a scale-space analysis. We have used the mean shift algorithm to measure the stability of the regions in the color and space domains. The detector allows a very high degree of freedom in the shape of the output regions, which makes it suitable for describing any image content. Moreover, the multiscale approach allows the system to detect ROIs composed of disjoint regions that can come from partially overlapped elements or textured areas. We have carried out experiments to compare the regions of interest of manually labelled images with the regions of interest detected in the real scenes. Using a human-based benchmark, we have shown that there is enough correlation to use this region detector in applications where the information has to be matched according to the human representation. One potential application can be found in content-based retrieval systems that allow sketch-based queries.

ACKNOWLEDGEMENTS

This work has been partially supported by the project

TIC2003-09291 and the grant 2002FI-00724.

REFERENCES

Cheng, H.-D., Jiang, X.-H., Sun, Y., and Wang, J. (2001). Color image segmentation: advances and prospects. Pattern Recognition, 34(12):2259–2281.

Christoudias, C., Georgescu, B., and Meer, P. (2002). Synergism in low level vision. pages IV: 150–155.

Comaniciu, D. and Meer, P. (1999). Mean Shift Analysis and Applications. In Proceedings of the IEEE ICCV, pages 1197–1203, Kerkyra, Greece.

Forssén, P.-E. (2007). Maximally stable colour regions for recognition and matching. In IEEE Conference on Computer Vision and Pattern Recognition, Minneapolis, USA. IEEE Computer Society.

Lindeberg, T. (1993). Scale-Space Theory in Computer Vision (The International Series in Engineering and Computer Science). Springer.

Martin, D., Fowlkes, C., Tal, D., and Malik, J. (2001). A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. Technical report, EECS Department, University of California, Berkeley.

Matas, J., Chum, O., Urban, M., and Pajdla, T. (2002). Robust wide baseline stereo from maximally stable extremal regions. In Proceedings of the BMVC, volume 1, pages 384–393, London.

Mikolajczyk, K., Tuytelaars, T., Schmid, C., Zisserman, A., Matas, J., Schaffalitzky, F., Kadir, T., and Gool, L. V. (2005). A comparison of affine region detectors. IJCV, 65(1/2):43–72.

Unnikrishnan, R., Pantofaru, C., and Hebert, M. (2007). Toward objective evaluation of image segmentation algorithms. 29(6):929–944.

Veltkamp, R. and Tanase, M. (2000). Content-based image retrieval systems: A survey. Technical Report UU-CS-2000-34, Department of Information and Computing Sciences, Utrecht University.


Figure 5: Some examples of the Berkeley data set (panels a to h). We show the ROIs of the manual segmentations and the ROIs of the real images that maximize their overlapping rate. We highlight the obtained ROIs with the ellipses that approximate their area.
