Geometry-based Superpixel Segmentation
Introduction of Planar Hypothesis for Superpixel Construction
M.-A. Bauda¹,², S. Chambon¹, P. Gurdjos¹ and V. Charvillat¹
¹VORTEX, University of Toulouse, IRIT-ENSEEIHT, Toulouse, France
²imajing sas, Ramonville St Agne, France
Keywords:
Image Segmentation, Superpixel, Planar Hypothesis.
Abstract:
Superpixel segmentation is widely used as a preprocessing step in many applications. Most existing methods are based on a photometric criterion combined with the position of the pixels. Following the same scheme as the Simple Linear Iterative Clustering (SLIC) method, based on k-means segmentation, a new algorithm is introduced. The main contribution lies in the definition of a new distance for the construction of the superpixels. This distance takes into account both the surface normals and a similarity measure between pixels that are located on the same planar surface. We show that our approach improves over over-segmentation methods such as SLIC, i.e. the proposed method is able to properly segment planar surfaces.
1 INTRODUCTION
The image segmentation problem consists in par-
titioning an image into homogeneous regions sup-
ported by groups of pixels. This approach is com-
monly used for image scene understanding (Mori,
2005; Gould et al., 2009). Obtaining a meaningful semantic segmentation of a complex scene containing many objects (rigid or deformable, static or moving, bright or in shadow) is a challenging problem for many computer vision applications such as autonomous driving, traffic safety or mobile mapping systems.
Superpixels were first introduced by (Ren and Malik, 2003). They correspond to an over-segmentation of the image where each region contains a part of the same object and respects the edges of this object (Felzenszwalb and Huttenlocher, 2004), so a superpixel carries more information than a single pixel. Superpixel decomposition also reduces problem complexity (Arbelaez et al., 2009). Consequently, it is a useful tool for scene understanding and interpretation, and it has been widely used over the last decade. Existing superpixel approaches take into account a photometric criterion (color differences between pixels of the same superpixel have to be minimal) and a shape constraint that is based on the spatial distance between
pixels. Approaches based only on these two criteria can provide superpixels that cover two surfaces with different orientations. In Figure 1, there is such a superpixel on the edge of the cube, which corresponds to a non-planar area. If a superpixel is not semantically consistent with the scene geometry, it will be difficult to label because it represents two different 3D entities.

Figure 1: Superpixel comparison between the k-means approach (left) and the proposed approach (right).
To take these difficulties into account, single-view segmentation methods introduce geometric criteria such as the horizon line or vanishing points (Hoiem et al., 2005; Saxena et al., 2008; Gould et al., 2009). Even if some geometric information is introduced, these existing approaches do not integrate it in any over-segmentation process: they only use a post-processing step to classify superpixels. This means that errors on superpixels, i.e. superpixels that contain multiple surfaces with different orientations, may be propagated and never corrected.
In the case of calibrated multi-view images, redundant information is available. Consequently, the geometry of the scene can be exploited to strengthen the scene understanding. For example, in man-made environments, it is common to make a piecewise planar assumption to guide the 3D reconstruction (Bartoli, 2007; Gallup et al., 2010). This kind of information is combined with superpixels in (Mičušík and Košecká, 2010) but, again, the geometric information is not integrated in the construction of the intermediate entities (superpixels or mesh faces) and the errors of this over-segmentation are also propagated. To our knowledge, only the works of (Weikersdorfer et al., 2012; Yang et al., 2013) introduce geometric information in the superpixel construction, in the form of a dense depth map. The quality of their results is encouraging and, in this paper, we propose a solution for the case of sparse geometric information.

Bauda M., Chambon S., Gurdjos P. and Charvillat V.
Geometry-based Superpixel Segmentation - Introduction of Planar Hypothesis for Superpixel Construction.
DOI: 10.5220/0005354902270232
In Proceedings of the 10th International Conference on Computer Vision Theory and Applications (VISAPP-2015), pages 227-232
ISBN: 978-989-758-089-5
Copyright © 2015 SCITEPRESS (Science and Technology Publications, Lda.)
In this article, we focus on the multi-view image context. In order to obtain superpixels that are coherent with the scene geometry, we propose to integrate a geometric criterion in the superpixel construction. The proposed algorithm follows the same steps as the well-known SLIC (Simple Linear Iterative Clustering) approach (Achanta et al., 2012), but the aggregation step takes into account the surface orientations and the similarity between two consecutive images. In §2, we present a brief state of the art on superpixel construction. Then, an overview of the proposed framework is presented, followed by details about the extraction of geometric information and its integration in a k-means superpixel constructor. Finally, experiments on synthetic data are presented.
2 SUPERPIXELS
In the context of superpixel construction, we propose to distinguish three kinds of methods: graph-based approaches (Felzenszwalb and Huttenlocher, 2004; Moore et al., 2008), seed-growing methods (Levinshtein et al., 2009) and methods based on k-means (Comaniciu and Meer, 2002; Achanta et al., 2012). We focus on the last family, and in particular on (Achanta et al., 2012), because this method provides, in the three simple steps presented below, superpixels of uniform size and compact shape that are widely used in the literature (Wang et al., 2011). After briefly describing this method, we analyze its advantages and drawbacks. This allows us to highlight the significance of the compactness criterion put forward in (Schick et al., 2012).
K-means Superpixel. SLIC (Achanta et al., 2012) is based on a 5-dimensional k-means clustering: 3 dimensions for the color in the CIELab color space and 2 for the spatial features x, y corresponding to the pixel coordinates. The algorithm follows these three steps:

1. Seed initialization on a regular grid of step S, each seed being moved within a 3 × 3 pixel neighborhood to the lowest local gradient;

2. Iterative computation of superpixels on a local window until convergence:

(a) Aggregate pixels to a seed by minimizing the distance D_SLIC of equation (1) over a search window of size 2S × 2S;

(b) Update the positions of the cluster centers by computing the mean over each superpixel; the new centroids become the refined seeds;

3. Enforce connectivity by attaching small entities using a connected-component method. A superpixel is connected if all its pixels belong to a unique connected entity.

Two parameters need to be set for SLIC: the approximate desired number of superpixels K, as in most over-segmentation methods, and the weight m of the relative importance between spatial proximity and color similarity, which is directly linked to the compactness criterion, as shown in equation (1).
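As an illustration, the three steps above can be sketched in a toy Python implementation. This is a minimal sketch written for this text, not the reference SLIC code: the gradient-based seed perturbation and the final connectivity enforcement are omitted.

```python
import numpy as np

def slic_like(image, K=4, m=10.0, n_iters=5):
    """Toy SLIC-style clustering on an (H, W, 3) float image.

    Step 1: seeds on a regular grid of step S.
    Step 2: iterate assignment (distance of equation (1), restricted to a
    2S x 2S window around each seed) and centroid update.
    Step 3 (connectivity enforcement) is omitted for brevity.
    """
    H, W, _ = image.shape
    N = H * W
    S = int(np.sqrt(N / K))  # grid step and half-size of the search window

    # Step 1: seed initialisation on a regular grid (rows: y, x, c1, c2, c3).
    ys = np.arange(S // 2, H, S)
    xs = np.arange(S // 2, W, S)
    seeds = np.array([(y, x, *image[y, x]) for y in ys for x in xs], dtype=float)

    labels = np.zeros((H, W), dtype=int)
    dist = np.full((H, W), np.inf)
    yy, xx = np.mgrid[0:H, 0:W]

    for _ in range(n_iters):
        dist.fill(np.inf)
        # Step 2a: assign pixels within a 2S x 2S window around each seed.
        for k, (sy, sx, *sc) in enumerate(seeds):
            y0, y1 = max(0, int(sy) - S), min(H, int(sy) + S + 1)
            x0, x1 = max(0, int(sx) - S), min(W, int(sx) + S + 1)
            patch = image[y0:y1, x0:x1]
            dc2 = np.sum((patch - np.array(sc)) ** 2, axis=2)
            ds2 = (yy[y0:y1, x0:x1] - sy) ** 2 + (xx[y0:y1, x0:x1] - sx) ** 2
            d = np.sqrt(dc2 + (m ** 2 / S ** 2) * ds2)  # equation (1)
            mask = d < dist[y0:y1, x0:x1]
            dist[y0:y1, x0:x1][mask] = d[mask]
            labels[y0:y1, x0:x1][mask] = k
        # Step 2b: move each seed to the mean of its cluster.
        for k in range(len(seeds)):
            sel = labels == k
            if sel.any():
                seeds[k, 0] = yy[sel].mean()
                seeds[k, 1] = xx[sel].mean()
                seeds[k, 2:] = image[sel].mean(axis=0)
    return labels
```

On a synthetic two-color image, the returned label map splits the image into roughly K compact regions whose borders follow the color edge.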
Energy Minimisation. The energy-distance to minimize between a seed and a pixel belonging to the window centered on the seed is defined by:

$$D_{SLIC} = \sqrt{d_c^2 + \frac{m^2}{S^2}\, d_s^2} \qquad (1)$$

where
d_c and d_s are the color and spatial distances,
m is the compactness weight,
S = \sqrt{N/K} is the size of the local search window,
N is the number of pixels in the image,
K is the expected number of superpixels.
In the case of a color image, the distances are defined as follows:

$$d_c(p_j, p_i) = \sqrt{(l_j - l_i)^2 + (a_j - a_i)^2 + (b_j - b_i)^2}$$
$$d_s(p_j, p_i) = \sqrt{(x_j - x_i)^2 + (y_j - y_i)^2}. \qquad (2)$$
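As a minimal sketch, equations (1) and (2) translate directly into code. The representation of a pixel as a dictionary with `lab` and `xy` entries is our own illustrative choice, not part of SLIC.

```python
import numpy as np

def d_c(p_j, p_i):
    """Color distance of equation (2): Euclidean distance in CIELab."""
    (l1, a1, b1), (l2, a2, b2) = p_j["lab"], p_i["lab"]
    return np.sqrt((l1 - l2) ** 2 + (a1 - a2) ** 2 + (b1 - b2) ** 2)

def d_s(p_j, p_i):
    """Spatial distance of equation (2)."""
    (x1, y1), (x2, y2) = p_j["xy"], p_i["xy"]
    return np.sqrt((x1 - x2) ** 2 + (y1 - y2) ** 2)

def d_slic(p_j, p_i, m, S):
    """Combined distance of equation (1)."""
    return np.sqrt(d_c(p_j, p_i) ** 2 + (m ** 2 / S ** 2) * d_s(p_j, p_i) ** 2)

# Example: two pixels of a 400-pixel image with K = 4 superpixels (S = 10).
p = {"lab": (50.0, 0.0, 0.0), "xy": (0.0, 0.0)}
q = {"lab": (50.0, 3.0, 4.0), "xy": (6.0, 8.0)}
print(d_slic(p, q, m=10.0, S=10.0))  # sqrt(5^2 + (100/100) * 10^2)
```

Increasing m makes the spatial term dominate, which is exactly the compactness trade-off discussed below.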
Analysis. The compactness (Levinshtein et al., 2009; Moore et al., 2008; Achanta et al., 2012; Schick et al., 2012) of a superpixel can be defined by the isoperimetric quotient, which compares the area of the superpixel to the area of the circle with the same perimeter: a superpixel is compact if its shape is close to a circle. Figure 2 shows the influence of the weight on the spatial distance d_s in the k-means algorithm and how it impacts the compactness. Moreover, the k-means superpixel algorithm restricts each cluster to pixels in a local window, which bounds the compactness by the size of the search window.
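A common closed form of the isoperimetric quotient is 4πA/P², which equals 1 for a disc and decreases for elongated shapes; a minimal sketch:

```python
import math

def isoperimetric_quotient(area, perimeter):
    """Compactness as the ratio of the superpixel area to the area of the
    circle having the same perimeter: Q = 4 * pi * A / P^2.
    Q = 1 for a perfect circle; Q < 1 for any other shape.
    """
    return 4.0 * math.pi * area / perimeter ** 2

# A square of side s has Q = 4*pi*s^2 / (4s)^2 = pi/4, about 0.785.
print(isoperimetric_quotient(100.0, 40.0))
```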
Figure 2: K-means superpixel compactness comparison with a small desired number of superpixels (50): bottom-left, hard compactness at m = 40; top-right, soft compactness at m = 5.
Since existing superpixel methods are usually based on a photometric criterion with some topology properties in the image space, in the next section we propose a variant of the k-means superpixel constructor working on two images. This is done by integrating geometric information in order to obtain superpixels that are coherent with the scene geometry and compact, even with a small number of representative entities.
3 GEOMETRY-BASED
SUPERPIXEL CONSTRUCTION
In this work, we deal with two images of an urban scene, i.e., a scene that is basically piecewise planar. Similarly to (Bartoli, 2007), we assume that we have at our disposal a sparse 3D reconstruction of the scene, provided by some structure-from-motion algorithm (Wu, 2011). We aim at segmenting the images into superpixels using a method relying on a k-means approach. Our idea is to integrate the available geometric information in the proposed superpixel constructor, as shown in figure 3.

In this section, we first present the available input data and describe which information can be extracted in order to be exploited in the superpixel constructor. More precisely, we propose to use two maps of the same size as the input images. For the first map, the similarity map, the value at each pixel p indicates whether the corresponding 3D point and its neighbourhood belong to the same plane. The second map, called the normal map, stores the estimated normal of this surface at each p. We also explain how these two maps are used as quantitative values to modify the SLIC distance.
Figure 3: Framework of our proposed over-segmentation method using scene geometry. At the top, the two images I and I′. In the second row: the Delaunay triangulation from 2D interest points matched with the other view; the normal map estimated on the faces of the mesh; and the similarity map between both views. The over-segmentation result is coherent with the scene geometry.
3.1 Input Data

We use two calibrated color images I and I′. We denote P_I = K[I|0] the projection matrix of the reference image I, where K is the matrix of the intrinsic parameters, and P_{I′} = K[R|t] the projection matrix associated with the image I′, where R is the rotation matrix and t the translation vector that determine the relative pose of the cameras. More details about the geometric aspects are provided by (Hartley and Zisserman, 2004). A sparse 3D point cloud can be projected in each image through the projection matrices to obtain a set of 2D matched points. We denote z a part of the reference image and z′ the corresponding part in the adjacent image. Assuming that z and z′ correspond to a planar region, we denote z̃ the warped part of the adjacent image estimated by the homography induced by the supporting plane of the triangle defined by the three 2D point correspondences.
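As a sketch of this projection step, a 3D point X is mapped to the image by P = K[R|t] followed by perspective division. The intrinsic values below are made up for illustration.

```python
import numpy as np

def project(X, K, R, t):
    """Project 3D points X (n x 3) with the camera P = K [R | t].

    Returns the 2D pixel coordinates (n x 2) after perspective division.
    """
    X = np.asarray(X, dtype=float)
    x_cam = X @ R.T + t   # world frame -> camera frame
    x_img = x_cam @ K.T   # camera frame -> homogeneous image coordinates
    return x_img[:, :2] / x_img[:, 2:3]

# Reference view: R = I and t = 0, so P_I = K [I | 0].
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
pts = project([[0.0, 0.0, 4.0], [1.0, 0.0, 4.0]], K, np.eye(3), np.zeros(3))
print(pts)  # [[320. 240.] [520. 240.]]
```

Projecting the sparse cloud with P_I and P_{I′} in this way yields the set of 2D matched points used below.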
3.2 Geometry Extraction

Having presented the available input data, we now describe how we extract geometric information from the multi-view images in order to exploit it in a k-means superpixel constructor.

A 2D Delaunay triangulation on the set of 2D interest points in the reference image can be derived from the corresponding 3D points. Doing so enables the estimation of a 3D plane on each face of the mesh, determined by three 3D points.
Normal Map. The normal map associated with the reference image stores, for each pixel p, the normal orientation ~n of the plane represented by the corresponding face of the mesh in the image. It is a 3D matrix containing the normalised normal coordinates along the three 3D axes, each in [-1, 1]. Pixels without a normal value are marked as undefined.
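Assuming each mesh face is defined by three 3D points, its supporting-plane normal can be sketched with a cross product (a hypothetical helper written for this text, not the paper's code):

```python
import numpy as np

def face_normal(A, B, C):
    """Unit normal of the plane supported by a mesh face (three 3D points).

    The components of the returned vector lie in [-1, 1], as stored in
    the normal map.
    """
    A, B, C = map(np.asarray, (A, B, C))
    n = np.cross(B - A, C - A)  # orthogonal to both edge vectors
    return n / np.linalg.norm(n)

# Three points in the plane z = 2 give a normal along the z axis.
print(face_normal([0, 0, 2], [1, 0, 2], [0, 1, 2]))  # [0. 0. 1.]
```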
Planarity Map. For each triangle, knowing the plane parameters and the epipolar geometry, we can estimate the homography induced by the supporting plane. This homography enables the computation of the warped image z̃, aligned to the corresponding part of the reference image. Then, the two images z and z̃ can be compared using a full-reference Image Quality Assessment (IQA) measure, also called a photo-consistency criterion, that quantifies the similarity or dissimilarity between two images. Two kinds of measures dominate in evaluation processes: those based on the Euclidean distance, such as the well-known Mean Square Error (MSE), and those based on the cosine angle distance, such as the Structural SIMilarity measure (SSIM) (Wang et al., 2004). Since dissimilar pixels are rejected, we can use a hard threshold, here zero, to remove noise and meaningless values.
Our previous work (Bauda et al., 2015) shows that measures based on cosine angle differences are more efficient than Euclidean-based distances for planar/non-planar classification. In particular, the Universal Quality Index (UQI) (Z. Wang and Bovik, 2002), a specific instance of SSIM, shows the best results and is used in this paper, as illustrated in figure 4. Briefly, our previous work shows that when a triangle corresponds to a planar surface, the similarity between the reference image triangle and its warped counterpart in the adjacent image (estimated by the homography induced by the supporting plane) is high, whereas when a triangle corresponds to a non-planar surface, the similarity is low. Consequently, we can simply classify, by thresholding, pixels that belong to a planar surface and those that do not. As for the normal map, pixels that do not belong to the mesh are marked as undefined.
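The UQI of (Z. Wang and Bovik, 2002) can be sketched as follows; this is a direct transcription of its standard closed form, with a small `eps` guard of our own to avoid division by zero on constant patches:

```python
import numpy as np

def uqi(x, y, eps=1e-12):
    """Universal Quality Index between two patches (Z. Wang and Bovik, 2002).

    UQI = 4 * cov(x, y) * mean(x) * mean(y)
          / ((var(x) + var(y)) * (mean(x)^2 + mean(y)^2)),
    equal to 1 for identical patches and lower for dissimilar ones.
    """
    x = np.asarray(x, dtype=float).ravel()
    y = np.asarray(y, dtype=float).ravel()
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return 4.0 * cov * mx * my / ((vx + vy) * (mx ** 2 + my ** 2) + eps)

z = np.array([[0.2, 0.4], [0.6, 0.8]])
print(uqi(z, z))        # close to 1: the planar hypothesis holds
print(uqi(z, 1.0 - z))  # negative: the structure is inverted
```

In the planarity map, values below the hard threshold (zero) are rejected as non-planar.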
We have presented the two maps containing the 3D geometric information that we extract: the normal map gives information on the surface orientation, while the similarity map indicates whether a pixel belongs to a planar surface.
3.3 Geometry-based Superpixels

We now propose a new energy to be minimized, defined as follows:

$$D_{SP} = \sqrt{d'_c + \alpha\, (d'_s)^{\beta} + d_g} \qquad (3)$$

Figure 4: Photo-consistency criterion behaviour in a non-planar case. First row: the reference image triangle z to which the warped triangle z̃ is compared. A point q_λ slides from q_1 to q_2 in order to correctly separate the area into two planes, allowing the plane parameters to be estimated on each part. Top-right: curve of the photo-consistency criterion obtained for each λ. Second row: similarity map for two cases. Left: λ = 0.02, the two planes are incorrectly separated, so the parameters used to compute the warped image are erroneous and a low similarity value is obtained. Right: λ = 0.46, the maximum similarity value is reached and q_{λ=0.46} belongs to the intersection of the two planes.
A new term d_g is added to the distance used to aggregate pixels into a superpixel. This term takes into account the scene geometry by merging the surface normal orientation map and the similarity map:

$$d_g(p_j, p_i) = 1 - d_{\vec{n}}(p_j, p_i)\, d_{UQI}(p_j). \qquad (4)$$
We also define d'_s and d'_c, the normalized versions of the distances d_s and d_c. Let d_{\vec{n}} be the normal distance, measuring the cosine of the angle between the normals at two points, and d_{UQI} the value of the similarity map:

$$d'_s(p_j, p_i) = \frac{d_s}{\max(d_s)}, \qquad d'_c(p_j, p_i) = \frac{d_c}{\max(d_c)},$$
$$d_{\vec{n}}(p_j, p_i) = \frac{1 + \cos(\vec{n}_j, \vec{n}_i)}{2}, \qquad d_{UQI}(p_j) = UQI(p_j)\, \mathbf{1}_{UQI > 0}. \qquad (5)$$
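Equations (3)-(5) can be sketched as follows. These are illustrative helpers, assuming d'_c and d'_s are already normalized to [0, 1] and that the normal map and similarity map have been sampled at the pixels of interest:

```python
import numpy as np

def d_normal(n_j, n_i):
    """Normal distance of equation (5): (1 + cos(n_j, n_i)) / 2, in [0, 1]."""
    c = np.dot(n_j, n_i) / (np.linalg.norm(n_j) * np.linalg.norm(n_i))
    return (1.0 + c) / 2.0

def d_uqi(uqi_j):
    """Thresholded similarity-map value of equation (5)."""
    return uqi_j if uqi_j > 0 else 0.0

def d_g(n_j, n_i, uqi_j):
    """Geometric term of equation (4)."""
    return 1.0 - d_normal(n_j, n_i) * d_uqi(uqi_j)

def d_sp(dc_n, ds_n, n_j, n_i, uqi_j, alpha, beta):
    """Proposed distance of equation (3); dc_n and ds_n are the
    normalized color and spatial distances d'_c and d'_s."""
    return np.sqrt(dc_n + alpha * ds_n ** beta + d_g(n_j, n_i, uqi_j))

# Same normal and a confident planarity score: the geometric term vanishes.
print(d_g([0, 0, 1], [0, 0, 1], 1.0))   # 0.0
# Opposite normals: the geometric term is maximal.
print(d_g([0, 0, 1], [0, 0, -1], 1.0))  # 1.0
```

Pixels on the same plane therefore add nothing to the energy, while pixels across a geometric edge are pushed into different superpixels.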
The behaviour of the three terms d'_s, d'_c and d_g of the proposed distance D_SP, presented in equation (3), is illustrated in figure 5. The normalisation of d_s and d_c makes it easier to assess the impact of the weights α and β on the d'_s term, which is related to the compactness. The curves illustrated in figure 6 show the influence of these two parameters. The α parameter balances the compactness against the two other terms: the bigger α is, the more compact the superpixels are. The β parameter gives a relative importance to the neighbourhood of a given seed: the closer a pixel is to the center, the more it is taken into account.

Figure 5: Obtained values for d'_s, d'_c and d_g in two particular cases where a photometric criterion alone cannot distinguish the two planes. First row: the seed lies on a surface with an unknown geometric distance. Second row: the seed belongs to a surface of known orientation, i.e. a planar patch, and it aggregates pixels that lie on a surface with the same normal orientation.

Figure 6: Influence of the α and β parameters on the d'_s term related to the compactness.
4 EXPERIMENTATION

For our experiments, we use SP_5D, a state-of-the-art method corresponding to the k-means superpixel approach, where the seed initialisation is made on an octagonal structure instead of a regular grid as in SLIC, because this shape minimizes the distance between seeds in a neighbourhood.

Preliminary results on synthetic data with controlled lighting and shape are presented in figure 7. We quantify the quality of the results with two commonly used measures: the boundary recall and the under-segmentation error. The boundary recall measures how well the boundaries of the over-segmentation match the ground-truth boundaries. The under-segmentation error measures how well the set of superpixels describes the ground-truth segments.

Figure 7: Boundary recall and under-segmentation error for SP_5D based on SLIC and the proposed approach SP_geom.

We have observed that our approach SP_geom provides compact and geometrically consistent superpixels. For a low number of superpixels, when the input parameter K is set to 50 and 100 superpixels, SP_geom performs with a higher recall and a lower under-segmentation error than the SP_5D method. Thanks to the geometric information, our method exhibits promising segmentation results (Achanta et al., 2012).
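The two evaluation measures can be sketched as follows. These are common variants written for illustration; the boundary tolerance and the min-based leakage definition are our own assumptions, not necessarily the exact ones used in the paper:

```python
import numpy as np

def boundary_mask(labels):
    """Pixels whose 4-neighbourhood contains a different label."""
    b = np.zeros(labels.shape, dtype=bool)
    b[:-1, :] |= labels[:-1, :] != labels[1:, :]
    b[1:, :] |= labels[1:, :] != labels[:-1, :]
    b[:, :-1] |= labels[:, :-1] != labels[:, 1:]
    b[:, 1:] |= labels[:, 1:] != labels[:, :-1]
    return b

def boundary_recall(seg, gt, tol=2):
    """Fraction of ground-truth boundary pixels lying within `tol` pixels
    (Chebyshev distance) of a segmentation boundary."""
    gt_b = np.argwhere(boundary_mask(gt))
    seg_b = np.argwhere(boundary_mask(seg))
    if len(seg_b) == 0:
        return 0.0
    hits = sum(1 for p in gt_b if np.min(np.abs(seg_b - p).max(axis=1)) <= tol)
    return hits / max(len(gt_b), 1)

def under_segmentation_error(seg, gt):
    """Area of the superpixels leaking across ground-truth borders,
    normalized by the image size (smaller is better)."""
    leak = 0
    for g in np.unique(gt):
        region = gt == g
        for s in np.unique(seg[region]):
            sp = seg == s
            leak += min(np.sum(sp & region), np.sum(sp & ~region))
    return leak / gt.size
```

A segmentation identical to the ground truth gives a recall of 1.0 and an under-segmentation error of 0.0.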
5 CONCLUSION
In this paper, we have presented a new approach to generate superpixels on calibrated multi-view images by introducing a geometric term in the distance involved in the energy minimization step. This geometric information is a combination of a normal map and a similarity map. Our approach enables us to obtain geometrically consistent superpixels, i.e. the edges of the superpixels are coherent with the edges of the planar patches, even when the planes have similar textures. The quantitative tests show that the proposed method obtains a better recall and a lower under-segmentation error than the k-means approach.
As future work, we plan to generalize this approach to real images, with meshes that do not respect the edges of the planar surfaces. To go one step further, our next work will include a process for cutting the non-planar triangles that compose the mesh. We will also study the influence of the quality of the 3D point cloud on the segmentation result.
REFERENCES
Achanta, R., Shaji, A., Smith, K., Lucchi, A., Fua, P., and
Susstrunk, S. (2012). SLIC superpixels compared to
state-of-the-art superpixel methods.
Arbelaez, P., Maire, M., Fowlkes, C., and Malik, J. (2009).
From contours to regions: An empirical evaluation. In
IEEE Computer Vision and Pattern Recognition.
Bartoli, A. (2007). A random sampling strategy for piece-
wise planar scene segmentation. In Computer Vision
and Image Understanding.
Bauda, M.-A., Chambon, S., Gurdjos, P., and Charvillat, V. (2015). Image quality assessment for photo-consistency evaluation on planar classification in urban scenes. In International Conference on Pattern Recognition Applications and Methods.
Comaniciu, D. and Meer, P. (2002). Mean shift: a robust
approach toward feature space analysis. IEEE Trans-
actions on Pattern Analysis and Machine Intelligence,
24(5).
Felzenszwalb, P. and Huttenlocher, D. (2004). Efficient
graph-based image segmentation. In International
Journal of Computer Vision.
Gallup, D., Frahm, J.-M., and Pollefeys, M. (2010). Piece-
wise planar and non-planar stereo for urban scene re-
construction. In IEEE Computer Vision and Pattern
Recognition.
Gould, S., Fulton, R., and Koller, D. (2009). Decomposing
a scene into geometric and semantically consistent re-
gions. In IEEE International Conference on Computer
Vision.
Hartley, R. I. and Zisserman, A. (2004). Multiple View Ge-
ometry in Computer Vision. Cambridge University
Press.
Hoiem, D., Efros, A., and Herbert, M. (2005). Geometric
context from a single image. In IEEE International
Conference on Computer Vision.
Levinshtein, A., Stere, A., Kutulakos, K., Fleet, D., Dick-
inson, S., and Siddiqi, K. (2009). Turbopixels: Fast
superpixels using geometric flows. IEEE Transac-
tions on Pattern Analysis and Machine Intelligence,
31(12):2290–2297.
Mičušík, B. and Košecká, J. (2010). Multi-view superpixel stereo in urban environments. In International Journal of Computer Vision.
Moore, A., Prince, S., Warrell, J., Mohammed, U., and
Jones, G. (2008). Superpixel lattices. In IEEE Com-
puter Vision and Pattern Recognition.
Mori, G. (2005). Guiding model search using segmenta-
tion. In IEEE International Conference on Computer
Vision.
Ren, X. and Malik, J. (2003). Learning a classification
model for segmentation. In IEEE International Con-
ference on Computer Vision, volume 1, pages 10–17.
Saxena, A., Sun, M., and Ng, A. (2008). Make3d: Depth
perception from a single still image. In IEEE Trans-
actions on Pattern Analysis and Machine Intelligence.
Schick, A., Fischer, M., and Stiefelhagen, R. (2012). Mea-
suring and evaluating the compactness of superpixels.
In International Conference on Pattern Recognition.
Wang, S., Lu, H., Yang, F., and Yang, M. (2011). Super-
pixel tracking. In IEEE International Conference on
Computer Vision.
Wang, Z., Bovik, A., Sheikh, H., and Simoncelli, E. (2004).
Image quality assessment: From error visibility to
structural similarity. In IEEE Transaction on Image
Processing.
Weikersdorfer, D., Gossow, D., and Beetz, M. (2012).
Depth-adaptive superpixels. In 21st International
Conference on Pattern Recognition.
Wu, C. (2011). Visualsfm: A visual structure from motion
system.
Yang, J., Gan, Z., Gui, X., Li, K., and Hou, C. (2013). 3-
D geometry enhanced superpixels for RGB-D data.
In Advances in Multimedia Information Processing-
PCM.
Wang, Z. and Bovik, A. (2002). A universal image quality index. In IEEE Signal Processing Letters.
VISAPP2015-InternationalConferenceonComputerVisionTheoryandApplications
232