DETECTING RECTANGULAR OBJECTS IN URBAN IMAGERY

A Re-Segmentation Approach

Thales Sehn Korting, Luciano Vieira Dutra and Leila Maria Garcia Fonseca

National Institute for Space Research (INPE) – Image Processing Division

Av. dos Astronautas, 1758 – São José dos Campos, Brazil

Keywords:

Re-Segmentation, Graph-Based Segmentation, Remote Sensing, Urban Imagery.

Abstract:

Image segmentation is a broad area that covers strategies for splitting an input image into its
components. This paper presents a re-segmentation approach applied to urban imagery, where the
elements of interest (house roofs) are assumed to have a rectangular shape. Our technique finds and
generates rectangular objects, leaving the remaining objects as background. Starting from an
over-segmented image, we connect adjacent objects in a graph structure known as a Region Adjacency
Graph (RAG). We then traverse the graph, searching for the best cuts that may result in more
rectangular segments, in a relaxation-like approach. The graph search considers information about
object class, obtained in a pre-classification stage using the Self-Organizing Maps algorithm.
Results show that the method was able to find rectangular elements according to user-defined
parameters, such as the maximum depth of the graph search and the minimum degree of rectangularity
for objects of interest.

1 INTRODUCTION

Image segmentation remains a great challenge in digital image processing. Many other interpretation
tasks build on segmentation, which places considerable responsibility on the segmentation
algorithms. Several approaches have already been proposed in the literature, each one covering one
specific area of interest. A simple definition was given by (Haralick and Shapiro, 1985): “a good
segmentation of an image should separate the image into simple regions with homogeneous behavior”.

Segmentation is a broad area, covering strategies for splitting an input image into its components
with respect to a specific context. This context also includes aspects of scale: the image
components start as single pixels, but they can be merged to generate meaningful objects. The main
tasks of any segmentation are to extract the image objects and to produce good results according to
a set of parameters, while remaining computationally efficient.

In personal photographs, the algorithm can segment each face present in the picture, or extract the
background and emphasize the objects of interest, such as cars (Roller et al., 1993; Leibe et al.,
2004), constructions, people (Li et al., 2005; Feris et al., 2004), etc. In remote sensing, which is
the main application of the presented approach, segmentation should generate objects according to
the targets of a satellite image, such as roofs (Chesnel et al., 2007), streets (He et al., 2004)
and trees in an urban image. In other remote sensing areas, such as agriculture (Pérez et al.,
2000), the algorithm should extract targets such as different crops or deforested areas (Silva et
al., 2005), so that land uses can be differentiated by classification processes.

This paper considers segmentation applied to urban imagery, where the elements of interest (house
roofs) are considered to have a rectangular shape in most cases. The implemented algorithm aims to
find and generate rectangular objects as foreground, leaving the remaining objects as background.
For this, we first create an over-segmented image and connect adjacent objects in a graph structure
known as a Region Adjacency Graph, or RAG (Schettini, 1993). We then traverse the graph, searching
for the best cuts that may result in segments better suited to our context. The RAG also stores
information about object class, obtained in a pre-classification stage that is explained in
Section 3.

In the next section we discuss general image segmentation and graph-based approaches. In Section 3
we present the re-segmentation technique, followed by Results and Discussion in Section 4.
Section 5 concludes the paper.


Sehn Korting T., Vieira Dutra L. and Garcia Fonseca L. (2009).

DETECTING RECTANGULAR OBJECTS IN URBAN IMAGERY - A Re-Segmentation Approach.

In Proceedings of the Fourth International Conference on Computer Vision Theory and Applications, pages 231-236

DOI: 10.5220/0001806702310236

 SciTePress

2 GRAPH-BASED SEGMENTATION

Graph-based image segmentation can be split into two main classes, namely pixel oriented and object
oriented. The first considers each pixel of the image as one graph node, whereas in the second the
nodes are over-segmented objects, with edges to their neighbors, i.e. objects that satisfy the
topological relation “touch” (Egenhofer and Franzosa, 1991). The notation G = (V, E) stands for a
graph G with a set of nodes v ∈ V and a set of connections E (Felzenszwalb and Huttenlocher, 2004).
Depending on the segmentation class, the nodes are pixels or objects. According to (Borenstein et
al., 2004), image segmentation with a top-down approach is guided by a stored representation of the
shape of objects within a general class. In contrast, the so-called bottom-up approach uses
image-based criteria to define coherent groups of pixels that are likely to belong together (either
foreground or background objects).

(Zahn, 1971) first proposed applying graph cuts to the Minimum Spanning Tree (MST) generated from
the pixel-based graph, where edge weights are based on the differences between pixel intensities.
Cuts are applied to the edges with the largest weights, and how large an edge must be to be cut is a
user-defined threshold. However, depending on the threshold, simply breaking these edges may split a
region of high variability into multiple regions.
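The MST-and-cut idea described above can be illustrated with a short Python sketch. It is not Zahn's original formulation, only a minimal version under stated assumptions: the graph is given as a list of weighted edges, the MST is built with Kruskal's algorithm, and every MST edge heavier than a fixed threshold is cut. The function name `mst_segments` and the toy data are hypothetical.

```python
def mst_segments(num_nodes, edges, threshold):
    """Label nodes by cutting MST edges heavier than `threshold`.

    edges: list of (weight, node_a, node_b) tuples.
    """
    parent = list(range(num_nodes))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Kruskal: scan edges by increasing weight, keep those joining two trees.
    mst = []
    for weight, a, b in sorted(edges):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b
            mst.append((weight, a, b))

    # Cut step: rebuild the components using only the light MST edges.
    parent[:] = range(num_nodes)
    for weight, a, b in mst:
        if weight <= threshold:
            parent[find(a)] = find(b)
    return [find(i) for i in range(num_nodes)]


# Toy 1-D "image": neighboring pixels are linked, weight = intensity difference.
pixels = [10, 12, 11, 50, 52]
edges = [(abs(pixels[i] - pixels[i + 1]), i, i + 1) for i in range(len(pixels) - 1)]
labels = mst_segments(len(pixels), edges, threshold=5)
```

With this toy input the first three pixels share one label and the last two share another. Lowering the threshold below 2 would break the same data into single pixels, which is the sensitivity to the threshold mentioned in the text.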

Regarding urban segmentation, the work of (Benediktsson et al., 2003) presents a hybrid approach, in
which morphological operations are applied to panchromatic images with high spectral and spatial
resolutions. After the morphological step, a neural network is applied to classify features
extracted from the resultant elements.

According to (Donnay et al., 2001), the urbanist and the remote sensing specialist have much to gain
through collaboration on spatial pattern analysis, using texture indices and measures of local
heterogeneity, as well as morphological transformations and fractal analysis. However, urban areas
are by their very nature complex. Although a human operator can extract information from images of
urban areas relatively easily, computer-based automated interpretation is a challenging task.
(Cinque et al., 2004) used a re-segmentation approach for image retrieval, in which a user-defined
rectangle delimits the region of interest. This region is then over-segmented, and the resulting
objects are compared to “coarse” descriptions of reference images.

Our approach is also graph-based, but it presents novel methods for finding the rectangular objects
present in urban imagery, mainly house roofs. Guided by a pre-classification step, the method
searches over graph nodes for the best merging operations among the objects of interest, and also
with background neighbors that may improve the resultant shape. The next section describes the full
process in detail.

3 RE-SEGMENTATION

Our approach is called re-segmentation because it takes as input a previously over-segmented image,
generally produced by traditional methods such as watershed or region growing (Duarte et al., 2006;
Felzenszwalb and Huttenlocher, 2004; Tremeau and Colantoni, 2000). The input is composed of the
image pixels and a set of regions, each one connected to its neighbors. These connections are stored
in the graph structure called RAG, and the distances between nodes, also called weights, are defined
as some difference between their attributes. The way nodes are joined, or not, is the main
characteristic of every re-segmentation approach.

3.1 Region Adjacency Graph – RAG

A Region Adjacency Graph is a data structure that provides a spatial view of an image. One way to
understand the RAG structure is to associate a vertex with each region and an edge with each pair of
adjacent regions (Tremeau and Colantoni, 2000). Figure 1 depicts an example image and Table 1 shows
the graph weights, in this case the difference between the spectral means of each pair of regions.
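As an illustration of how such a graph could be derived from an over-segmentation, the following Python sketch builds a RAG from a label image and a single-band image. It is only a sketch under stated assumptions: regions are taken as adjacent when they share a pixel edge (4-connectivity), the weight is the absolute difference between region means, and the name `build_rag` and the toy data are hypothetical rather than part of TerraLib.

```python
from collections import defaultdict


def build_rag(labels, image):
    """Return (means, weights) for a label image and a single-band image.

    labels, image: 2-D lists with the same shape.
    weights maps a pair (a, b) with a < b to |mean(a) - mean(b)|;
    pairs of regions that do not touch are simply absent.
    """
    rows, cols = len(labels), len(labels[0])
    sums, counts = defaultdict(float), defaultdict(int)
    for r in range(rows):
        for c in range(cols):
            sums[labels[r][c]] += image[r][c]
            counts[labels[r][c]] += 1
    means = {region: sums[region] / counts[region] for region in sums}

    weights = {}
    for r in range(rows):
        for c in range(cols):
            a = labels[r][c]
            # Looking only right and down visits every 4-connected pixel pair once.
            for dr, dc in ((0, 1), (1, 0)):
                rr, cc = r + dr, c + dc
                if rr < rows and cc < cols and labels[rr][cc] != a:
                    b = labels[rr][cc]
                    weights[(min(a, b), max(a, b))] = abs(means[a] - means[b])
    return means, weights


labels = [[1, 1, 2],
          [1, 3, 2],
          [3, 3, 2]]
image = [[10, 10, 50],
         [10, 30, 50],
         [30, 30, 50]]
means, weights = build_rag(labels, image)
```

A dictionary keyed by region pairs is used here instead of a full matrix, so non-adjacent pairs need no placeholder value.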

Figure 1: RAG example – Image with 7 regions.

Table 1: Graph generated from Figure 1.

        1      2      3      4      5      6      7
1      -1  265.8   89.4  265.8   89.4     -1     -1
2   265.8     -1  176.4     -1  176.4  265.8     -1
3    89.3  176.4     -1  176.4     -1   89.4  351.7
4   265.8     -1  176.4     -1  176.4     -1  175.3
5    89.4  176.4     -1  176.4     -1   89.4  351.7
6      -1  265.8   89.4     -1   89.4     -1  441.1
7      -1     -1  351.7  175.3  351.7  441.1     -1

We propose a novel merging strategy in the RAG structure. Regions are merged if they are similar
with respect to their spectral attributes (mean, variance or texture, for instance) and if the
resultant shape (after the merging operation) is rectangular. To carry out this task, the regions
are divided according to their contextual classes. In an urban environment, the classes may be
buildings, streets, trees, and so on. Regions are therefore classified and a RAG is built,
connecting adjacent regions and storing the information about their class. Afterwards, the algorithm
performs graph search and merge operations for the class of interest, classified as foreground, in
our case the urban roofs. Knowledge of the region classes improves the segmentation accuracy because
each class has a specific shape rectangularity measure. As stated above, the main purpose of this
work is the segmentation of rectangular objects, such as roofs or buildings in urban imagery. All
other kinds of objects are treated as background.

3.2 Graph Pre-Processing

The pre-processing stage has three main steps. The first performs a classification of every
over-segmented element. This classification aims to distinguish the elements to be processed as the
class of interest (foreground) from the remaining regions, which belong to other classes such as
trees and water bodies (background). The elements classified as background are used to fit a
rectangle to the over-segmented regions classified as foreground. This is needed because, for
example, a tree may hide part of the rectangular shape of a roof when it stands above it, as shown
in Figure 2. The classification step uses the unsupervised Self-Organizing Maps (SOM) algorithm
(Kohonen, 2001), which generates clusters of regions as output. The resultant classes are then
compared to a reference set of roofs, and the most similar class is taken as the roof class.

After the classification, redundant information is removed to reduce processing time in the graph
search. This is the second pre-processing step. It joins regions surrounded only by elements of
their own class, since in the graph search they would certainly be merged. In other words, if a
region has the same class as all its first-order neighbors, it is merged with one of them. Figure 3
shows the result of this step.
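The redundancy-removal step can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the RAG is assumed to be a dictionary mapping each region to the set of its first-order neighbors, `classes` maps each region to its class label, and the region is merged into its smallest-numbered neighbor (the paper picks a neighbor at random; a deterministic choice is used here only to keep the example reproducible). The name `absorb_interior_regions` is hypothetical.

```python
def absorb_interior_regions(adjacency, classes):
    """Merge every region whose first-order neighbors all share its class.

    Returns (merged_into, adj): the surviving region for each original
    region, and the reduced adjacency dictionary.
    """
    adj = {region: set(neigh) for region, neigh in adjacency.items()}
    merged_into = {region: region for region in adj}
    changed = True
    while changed:
        changed = False
        for region in sorted(adj):
            neighbors = adj[region]
            if neighbors and all(classes[n] == classes[region] for n in neighbors):
                target = min(neighbors)  # assumption: deterministic instead of random
                for n in neighbors:
                    adj[n].discard(region)
                    if n != target:
                        # The target inherits the neighbors of the absorbed region.
                        adj[n].add(target)
                        adj[target].add(n)
                del adj[region]
                for original, current in merged_into.items():
                    if current == region:
                        merged_into[original] = target
                changed = True
                break  # adjacency changed, restart the scan
    return merged_into, adj


adjacency = {1: {2, 3}, 2: {1, 3, 4}, 3: {1, 2}, 4: {2}}
classes = {1: "roof", 2: "roof", 3: "roof", 4: "tree"}
merged_into, reduced = absorb_interior_regions(adjacency, classes)
```

In this toy graph, regions 1 and 3 are surrounded only by roof regions and are absorbed into region 2, which survives because it also touches the tree region 4.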

Figure 2: A tree on top of a roof: a) original image and b) highlighted objects.

Figure 3: First step of graph pre-processing: a) original image, b) highlighted regions and c) resultant regions.

Finally, the algorithm removes possible misclassification results. If a region belongs to a
different class from its neighbors, its class is changed to that of its neighbors and it is merged
with one of them, as shown in Figure 4. In this case, as in the previous one, the region is merged
with a randomly chosen neighbor.

Figure 4: Second step of graph pre-processing: a) original image and b) resultant regions.

3.3 Graph Search

At this stage we have a topological description of the over-segmentation. The algorithm now chooses
one region of interest (derived from the pre-classification) and performs a graph search up to a
predefined level of neighbors. This level is a user-defined parameter, since the user is expected to
know the degree of over-segmentation, i.e. the number of regions that must be visited to recover the
rectangular shapes. The level corresponds to the order of connection in graph theory: first-order
neighbors are the regions that touch the considered region, second-order neighbors are the neighbors
of these first-order elements, and so on. We also refer to each level as a graph depth. Figure 5
shows an example of a segment and its multiple-level neighbors. Figure 6 depicts the corresponding
graph for easier understanding.


Figure 5: Multiple level neighbors from element #1.

Figure 6: Graph structure generated with Figure 5.
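Gathering neighbors by order amounts to a breadth-first search limited to a maximum depth. The Python sketch below shows one way to do it, assuming the RAG is stored as a dictionary from each region to the set of regions it touches. The function name `neighbors_by_order` and the small graph are hypothetical and do not reproduce Figure 5.

```python
from collections import deque


def neighbors_by_order(adjacency, start, max_depth):
    """Group the regions reachable from `start` by their order (graph depth)."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        region = queue.popleft()
        if depth[region] == max_depth:
            continue  # do not expand beyond the user-defined level
        for neighbor in adjacency[region]:
            if neighbor not in depth:
                depth[neighbor] = depth[region] + 1
                queue.append(neighbor)
    levels = {}
    for region, d in depth.items():
        if d > 0:
            levels.setdefault(d, set()).add(region)
    return levels


adjacency = {1: {2, 5}, 2: {1, 3}, 3: {2, 4}, 4: {3}, 5: {1}}
levels = neighbors_by_order(adjacency, start=1, max_depth=2)
```

Here regions 2 and 5 are first-order neighbors of region 1 and region 3 is a second-order neighbor, while region 4 lies at depth 3 and is not visited.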

After gathering the multiple-level neighbors of one region of interest, the algorithm tries to
perform merging operations with a subset of this group, in order to find rectangular objects, which
are then classified as foreground. Our approach first merges regions of the same class and then
tries to merge background regions that may help to fit a rectangular shape.

An important consideration applies to background regions. A background region is a candidate for
merging only if its area is smaller than a certain percentage (20% or 30%, for example) of the area
of the considered region, because background regions are mostly used to complete the rectangular
edges of our objects of interest, as in the example of Figure 2a. This percentage is another
user-defined parameter, intended to avoid unpromising attempts, which take time and decrease the
accuracy of the result.

3.4 Rectangle Fitting

Our approach to rectangle fitting is based on (Korting et al., 2008), where the authors propose a
shape attribute Q ∈ [0, 1] called Rectangularity, obtained as the ratio between the area of an
object and the area of its bounding box. However, because of rotation this measure cannot correctly
represent the object rectangularity unless a pre-processing step is performed to make Rectangularity
invariant to rotation.

Given an object and the coordinates of its internal points, the eigenvectors are calculated. The
first eigenvector gives the main angle of the object. A new object is then created by rotating the
original according to this main angle. Afterwards, the unbiased Q is obtained by dividing the object
area by the area of its rotated bounding box. This value is used to evaluate each alternative for
merging regions. The closer Q is to 1, the more rectangular the object, i.e. an object with Q ≈ 1 is
a strong candidate for re-segmentation.
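The rotation-compensated measure can be sketched in a few lines of Python. The sketch makes several assumptions that the text does not spell out: the eigenvectors are those of the covariance matrix of the pixel coordinates, the main angle is obtained with the closed-form expression for a 2x2 symmetric matrix, and one pixel of width is added to each side of the bounding box because coordinates refer to pixel centers. The function name `rectangularity` is hypothetical.

```python
import math


def rectangularity(pixels):
    """Rotation-compensated ratio between object area and bounding-box area."""
    points = list(pixels)
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Entries of the 2x2 covariance matrix of the pixel coordinates.
    cxx = sum((x - mean_x) ** 2 for x, _ in points) / n
    cyy = sum((y - mean_y) ** 2 for _, y in points) / n
    cxy = sum((x - mean_x) * (y - mean_y) for x, y in points) / n
    # Orientation of the first eigenvector (closed form for a 2x2 symmetric matrix).
    theta = 0.5 * math.atan2(2.0 * cxy, cxx - cyy)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    # Express every pixel in the frame aligned with the main axis.
    u = [(x - mean_x) * cos_t + (y - mean_y) * sin_t for x, y in points]
    v = [-(x - mean_x) * sin_t + (y - mean_y) * cos_t for x, y in points]
    # "+ 1" because coordinates are pixel centers and each pixel has unit size.
    box_area = (max(u) - min(u) + 1.0) * (max(v) - min(v) + 1.0)
    return min(1.0, n / box_area)


block = [(x, y) for x in range(4) for y in range(3)]   # axis-aligned 4x3 rectangle
diagonal = [(i, i) for i in range(10)]                 # thin line rotated by 45 degrees
q_block = rectangularity(block)
q_diagonal = rectangularity(diagonal)
```

For the thin diagonal line, an axis-aligned bounding box would cover 100 pixels and give Q = 0.1, whereas the rotated box yields a much higher value. This is the bias that the rotation step is meant to remove.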

3.5 Re-Segmentation Summary

To summarize our approach, Figure 7 shows a diagram of the main steps of re-segmentation. It starts
from a single over-segmented image, which goes through classification using SOM. The classification
divides the regions into two main groups, namely foreground (interest) regions and background
regions, the latter being used to fit a rectangle for the class of interest. After classification,
the RAG is created and filtered, so that redundant regions are merged beforehand. The last step
merges all connected regions of the class of interest and calculates their rectangularity, inserting
background regions only if they increase the overall rectangularity. After this last step, the final
region is compared to a user-provided threshold of minimum rectangularity to decide whether it is
accepted as a re-segmentation result.
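The last step of the summary, inserting background regions only when they raise the rectangularity and then testing the result against the minimum threshold, might look like the Python sketch below. It is a simplified illustration under stated assumptions: regions are sets of pixel coordinates, candidates are tried once in the given order, and an axis-aligned measure stands in for the rotation-invariant Q of Subsection 3.4 to keep the example short. The names `fit_with_background`, `bbox_rectangularity`, `max_area_ratio` and `min_q` are hypothetical.

```python
def bbox_rectangularity(pixels):
    """Axis-aligned stand-in for Q: object area over bounding-box area."""
    xs = [x for x, _ in pixels]
    ys = [y for _, y in pixels]
    return len(pixels) / ((max(xs) - min(xs) + 1) * (max(ys) - min(ys) + 1))


def fit_with_background(foreground, candidates, max_area_ratio=0.3, min_q=0.9,
                        measure=bbox_rectangularity):
    """Greedily add small background regions that increase rectangularity."""
    region = set(foreground)
    q = measure(region)
    for candidate in candidates:
        candidate = set(candidate)
        # Skip background regions that are too large relative to the current region.
        if len(candidate) > max_area_ratio * len(region):
            continue
        trial = region | candidate
        trial_q = measure(trial)
        if trial_q > q:
            region, q = trial, trial_q
    # The final region is accepted only if it is rectangular enough.
    return region, q, q >= min_q


# A 4x4 roof with two pixels hidden by a tree, plus two other background regions.
roof = {(x, y) for x in range(4) for y in range(4)} - {(3, 2), (3, 3)}
tree = {(3, 2), (3, 3)}
stray = {(4, 0)}
street = {(x, 5) for x in range(10)}
region, q, accepted = fit_with_background(roof, [tree, stray, street])
```

In this toy case only the tree region is inserted: the stray pixel would lower the rectangularity and the street is rejected by the area filter.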

4 RESULTS AND DISCUSSION

In this section we discuss some results of our algorithm. In the first experiment, we use different
depth levels for the graph search on a synthetic image containing a rotated rectangle composed of
several sub-regions, as shown in Figure 8. Figure 8a shows the input over-segmentation, and Figures
8b and 8c display the resultant re-segmentation with 1 and 2 levels, respectively. With one level
the algorithm is not able to reach all regions, although the resultant segments keep a rectangular
shape. With 2 levels, the algorithm is capable of gathering all regions and produces the correct
region.

Figure 7: The re-segmentation algorithm summary. Inputs/Outputs are represented by dashed arrows.

Figure 8: Re-Segmentation of a synthetic image: a) over-segmentation, b) re-segmentation with 1 level of graph search and c) re-segmentation with 2 levels.

Figure 9: Urban Re-Segmentation: a) over-segmentation, (b, c, d) re-segmentation with 1, 2 and 3 levels respectively.

The second result was obtained on a real remote sensing urban image, in which two roofs have a
rectangular shape and one has an irregular shape because of the image crop. Figure 9 shows the
re-segmentation results with different levels of graph search. Because of the large number of
segments, only the result with 3 levels could find rectangular shapes that match well. Even so, the
result using 3 levels still presents some mistakes, which could be fixed by a post-processing stage.
This stage, which is not currently implemented, would apply morphological operations to the
resultant region: one erosion to remove small edges incorrectly re-segmented, followed by one
dilation to better fit the rectangle.
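Since the post-processing stage is described but not implemented in the paper, the following Python sketch is only one possible reading of it: a binary erosion followed by a binary dilation (a morphological opening) with a 3x3 square structuring element, where pixels outside the image are treated as background. The function names and the toy mask are hypothetical.

```python
def _neighborhood(mask, r, c):
    """Yield the 3x3 neighborhood of (r, c); cells outside the image count as 0."""
    rows, cols = len(mask), len(mask[0])
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            rr, cc = r + dr, c + dc
            yield mask[rr][cc] if 0 <= rr < rows and 0 <= cc < cols else 0


def erode(mask):
    """Keep a pixel only if its whole 3x3 neighborhood is foreground."""
    return [[int(all(_neighborhood(mask, r, c))) for c in range(len(mask[0]))]
            for r in range(len(mask))]


def dilate(mask):
    """Set a pixel if any pixel of its 3x3 neighborhood is foreground."""
    return [[int(any(_neighborhood(mask, r, c))) for c in range(len(mask[0]))]
            for r in range(len(mask))]


def opening(mask):
    """Erosion followed by dilation, as suggested for the post-processing stage."""
    return dilate(erode(mask))


# A 4x4 roof with a one-pixel spur on its right side.
mask = [[0, 0, 0, 0, 0, 0, 0],
        [0, 1, 1, 1, 1, 0, 0],
        [0, 1, 1, 1, 1, 1, 0],
        [0, 1, 1, 1, 1, 0, 0],
        [0, 1, 1, 1, 1, 0, 0],
        [0, 0, 0, 0, 0, 0, 0]]
cleaned = opening(mask)
```

On this mask the opening removes the spur and returns the 4x4 block unchanged. Larger structuring elements would remove wider artifacts, at the risk of also removing small roofs.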

5 CONCLUSIONS

An approach for re-segmentation of rectangular shapes was presented. In this case, it was applied to
urban imagery to detect roofs, which have a rectangular aspect. It is important to point out that
this re-segmentation approach can go beyond rectangular shapes simply by replacing the Rectangle
Fitting step (Subsection 3.4) with another shape detector, for circular shapes for example. The
results obtained indicate that segmentation combined with a classification step can increase
accuracy, since, because of the simple parameters of traditional segmentation approaches, their
output regions sometimes do not present good visual results. Some mistakes produced by
re-segmentation can be fixed, as already mentioned, by post-processing techniques such as
morphological operators applied to the resultant foreground regions. Through erosion and dilation
the results should acquire a smoother appearance, with small edges merged by mistake being removed.

The algorithm was developed using the TerraLib library, available for free download at
http://www.terralib.org/. Although approaches to reduce processing time, such as the graph
pre-processing stages, are already included, future work includes optimizing the whole strategy with
further techniques for fast data processing.

REFERENCES

Benediktsson, J., Pesaresi, M., and Arnason, K. (2003). Classification and feature extraction for
remote sensing images from urban areas based on morphological transformations. Geoscience and
Remote Sensing, IEEE Transactions on, 41(9 Part 1):1940–1949.

Borenstein, E., Sharon, E., and Ullman, S. (2004). Com-

bining Top-Down and Bottom-Up Segmentation. In

Computer Vision and Pattern Recognition Workshop,

2004 Conference on, pages 46–46.

Chesnel, A.-L., Binet, R., and Wald, L. (2007). Object ori-

ented assessment of damage due to natural disaster us-

ing very high resolution images. Geoscience and Re-

mote Sensing Symposium, 2007. IGARSS 2007. IEEE

International, pages 3736–3739.

Cinque, L., De Rosa, F., Lecca, F., and Levialdi, S. (2004).

Image retrieval using resegmentation driven by query

rectangles. Image and Vision Computing, 22(1):15–

22.

Donnay, J., Barnsley, M., Longley, P., European Science Foundation (ESF), and GISDATA (2001).
Remote Sensing and Urban Analysis. Taylor & Francis.

Duarte, A., Sánchez, Á., Fernández, F., and Montemayor,

A. (2006). Improving image segmentation quality

through effective region merging using a hierarchi-

cal social metaheuristic. Pattern Recognition Letters,

27(11):1239–1251.

Egenhofer, M. and Franzosa, R. (1991). Point-set topolog-

ical spatial relations. International Journal of Geo-

graphical Information Science, 5(2):161–174.

Felzenszwalb, P. and Huttenlocher, D. (2004). Efﬁcient

Graph-Based Image Segmentation. International

Journal of Computer Vision, 59(2):167–181.

Feris, R., Krueger, V., and Cesar, R. (2004). A wavelet

subspace method for real-time face tracking. Real-

Time Imaging, 10(6):339–350.

Haralick, R. and Shapiro, L. (1985). Image segmentation

techniques. Computer vision, graphics, and image

processing, 29(1):100–132.

He, Y., Wang, H., and Zhang, B. (2004). Color-based road

detection in urban trafﬁc scenes. Intelligent Trans-

portation Systems, IEEE Transactions on, 5(4):309–

318.

Kohonen, T. (2001). Self-Organizing Maps. Springer.

Korting, T. S., Fonseca, L. M. G., Dutra, L. V., and Silva,

F. C. (2008). Image Re-Segmentation – A New Ap-

proach Applied to Urban Imagery. pages 467–472.

Leibe, B., Leonardis, A., and Schiele, B. (2004). Combined

object categorization and segmentation with an im-

plicit shape model. In Workshop on Statistical Learn-

ing in Computer Vision, ECCV, pages 17–32.

Li, S., Jain, A., and SpringerLink (Online service) (2005). Handbook of Face Recognition. Springer.

Pérez, A., López, F., Benlloch, J., and Christensen, S.

(2000). Colour and shape analysis techniques for

weed detection in cereal ﬁelds. Computers and Elec-

tronics in Agriculture, 25(3):197–212.

Roller, D., Daniilidis, K., and Nagel, H. (1993). Model-

based object tracking in monocular image sequences

of road trafﬁc scenes. International Journal of Com-

puter Vision, 10(3):257–281.

Schettini, R. (1993). A segmentation algorithm for color

images. Pattern Recognition Letters, 14(6):499–506.

Silva, M., Câmara, G., Souza, R., Valeriano, D., and Es-

cada, M. (2005). Mining Patterns of Change in Re-

mote Sensing Image Databases. In The Fifth IEEE

International Conference on Data Mining, New Or-

leans, Louisiana, USA.

Tremeau, A. and Colantoni, P. (2000). Regions adjacency

graph applied to color image segmentation. Image

Processing, IEEE Transactions on, 9(4):735–744.

Zahn, C. (1971). Graph-Theoretical Methods for Detecting and Describing Gestalt Clusters. IEEE
Transactions on Computers, C-20(1):68–86.
