Elements of a Gestalt Algebra: Steps Towards
Understanding Images and Scenes
Eckart Michaelsen, Michael Arens and Leo Doktorski
FGAN-FOM, Gutleuthausstrasse 1, 76275 Ettlingen, Germany
Abstract. A mathematical structure is sketched that is meant to capture the
regularities and hierarchies in the structure of images. The approach is motivated
by difficulties arising in aerial image analysis of urban terrain: it is not
feasible to list and model all possible forms of things such as buildings that occur
in such data. Starting from the Gestalt theory of perception, an abstract algebra
of operations on image objects is defined and its formal properties are discussed.
The intention is to build a future software system on such formalisms, one that
instantiates only those gestalt models that are evident from the data and that can
build and recognize previously unseen and unexpected structures.
1 Introduction
The underlying principles of human perception have been discussed scientifically at
least since the days of Helmholtz. Major contributions are now a hundred years old.
The German word Gestalt is the usual term for groups of entities arranged in a
salient manner. Salience is meant here as the property of inevitably attracting human
perceptual attention.
Figure 1 shows an example of the situations that this contribution addresses. In an
aerial image of an urban area, symmetric gabled roofs are present. A building is
formed of two wings of the same shape rotated by π/2. Such buildings are arranged
pairwise in mirror symmetry. Several building groups of this type are arranged in a
row. The row again has a symmetric partner, etc.
What might a machine-vision system look like that can recognize such a pattern and
generate an appropriate description of it? It will be of practical use only if it does
not rigidly encode the particular realizations in this picture. It must be flexible
enough to discover new arrangements and hierarchies in every example. What kinds of
data structures are suitable for such a task? These are the topics of this rather
preliminary contribution of ideas.
Michaelsen E., Arens M. and Doktorski L. (2008).
Elements of a Gestalt Algebra: Steps Towards Understanding Images and Scenes.
In Image Mining Theory and Applications, pages 65-73
DOI: 10.5220/0002339200650073
Copyright © SciTePress
2 Related Work
Progress in this field is slow, so quite old literature has to be considered. Some of
it was originally published in other languages and is only partially available
in English translation.
2.1 Gestaltism
The classics of Gestaltist literature are [20], [14] and [6]. A common practice in this
branch of psychology is to argue by drawing dot patterns and demonstrating the
gestalt phenomena through the reader's own perceptual mechanisms. Much of the work
consists of identifying or inventing illusions that reveal properties of human
perception.
Steps towards incorporating such findings into automatic machine vision systems were
taken e.g. by Lowe [7]. Often this involves a more or less general theory of
perception covering machines, animals and humans [3]. Probably the most elaborate
work on the mathematical foundation and automation of Gestaltist ideas is [1].
Fig. 1. Example of a rather typical Gestalt hierarchy as they occur in urban remote sensing data.
2.2 Practical Attempts on Remote Sensing Data
The main interest in automatic understanding of previously unseen repetitive or sym-
metric gestalts comes from remote sensing – in particular from aerial image analysis
of urban scenery. Well known sources are the proceedings of the ETH workshops on
building and road recognition in Ascona [4] (1995, 1997 and 2001). Elements of a
syntactic formulation of generic building models for recognition in aerial imagery can
be found e.g. in [2].
Very interesting early work on arrangements and hierarchies of arrangements of
objects in aerial images was presented as early as 1980 by Nagao and Matsuyama [15].
The best-known contribution concerning a corresponding production system is the
SIGMA system [10]. It provides sophisticated control structures that help to handle
the inevitable problems of computational effort.
2.3 Algebra
There is an algebraic theory of pattern recognition, image analysis and estimation by
Zhuravlev and Gurevich [21,5]. Gurevich [5] identifies searching images for
regularities of arbitrary form as one of the objectives of image analysis that the
descriptive algebraic approach is meant for. However, this is only one of its many
purposes, whereas this contribution focuses on this particular and important task.
The symmetry groups and grid structures used here may well be understood as par-
ticular allowable transforms in the image formation models used in the descriptive
image algebra theory. But, for the purpose of our gestalt operations, such group trans-
forms are always understood locally with respect to a specific location (and orienta-
tion). They only act on few gestalts involved in the construction – not globally. What
is found about reductions to recognizable canonical representatives may well be
transferred and used here.
There is less in common with the image algebra as defined by Ritter and Wilson [16].
Much of that work is tied to the pixel grid structure of images, e.g. how convolution
filters and morphological filters can be captured algebraically. Our gestalt algebra
leaves the level of pixels as soon as possible. In the extreme case a single pixel is
a primitive gestalt object of our structure; the grid is neglected and the image as a
whole is not treated at all.
2.4 Picture Grammars
There is a long history of syntactic pattern recognition methods. Many kinds of
grammars for image analysis were already studied by Rosenfeld [17]. Viewing the
objects as a set and the interrelations between them as constraints and hierarchical
constructions is captured by the constrained multi-set grammars of Marriott and
Meyer [9]. This gains algebraic structure with Wang’s thesis [19], where e.g.
mathematical order structures are considered on the constraints between the objects.
Most of these syntactic works concentrate on certain diagram understanding tasks,
such as electronic circuits.
2.5 Own Previous Work
Building on earlier foundations, the blackboard image understanding system from
which many of the ideas of this contribution stem was first published in 1986 [8].
There has been a syntactic foundation as well [12]. The system was used for complex
3D-scene understanding tasks [18]. Later work with the same production system
structure addressed building recognition from high resolution SAR images [11] and
estimation of geometric entities by good sample consensus [13]. Up to now there has
been no algebraic foundation for this work, and the symbols and object classes have
been fixed and pre-defined. The purpose of this contribution is to initiate new work
in these directions.
3 Definitions
This technical section defines the gestalt algebra in four steps. First, Section 3.1
introduces the primitive elements that form the foundation of the proposed structure.
These primitives are located in a metric space. Then Section 3.2 introduces symmetry
groups acting on the associated space, such that the objects can be mapped onto each
other. These mappings define a matching assessment for groups of objects. This gives
the foundation for the gestalt operations in Section 3.3 and the algebra in
Section 3.4.
3.1 Primitive Gestalts
We call a metric space D – such as the 2d-pixel coordinates of an image, gradients of
image edges, 3d-world coordinates of laser measured scene surface points, 2d+t co-
ordinates of a video etc. – a primitive domain. On this we will build our algebraic
structure.
In order to distinguish more than one type of object we introduce a finite set of
primitive symbols $V_p = \{\sigma_1, \ldots, \sigma_m\}$. Each of these has a sub-space
$D_j \subseteq D$ assigned to it. Pixels will only have their coordinates and their
brightness; edgels will have a gradient instead of the brightness, etc.
Each object has an assessment value $0 < \alpha \le 1$ assigned to it. For a contour
primitive this may be a monotone function of the local gradient magnitude, etc.
An instance $g = (\sigma_j, d, \alpha) \in V_p \times D_j \times (0,1]$ will be called a
primitive gestalt henceforth.
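As a minimal sketch of this definition, a primitive gestalt can be held in a small immutable record. The class name `PrimitiveGestalt`, the field names and the example symbols `"pixel"` and `"edgel"` are assumptions made for illustration; the text only fixes the triple (σ_j, d, α) and the constraint 0 < α ≤ 1.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class PrimitiveGestalt:
    """A triple (symbol, d, alpha) as in Section 3.1 (names are illustrative)."""
    symbol: str                # primitive symbol sigma_j from V_p
    d: Tuple[float, ...]       # point in the sub-space D_j of the domain D
    alpha: float               # assessment value, required to lie in (0, 1]

    def __post_init__(self) -> None:
        # Enforce the half-open interval (0, 1] stated in the text.
        if not (0.0 < self.alpha <= 1.0):
            raise ValueError("assessment alpha must satisfy 0 < alpha <= 1")


# Hypothetical examples: a pixel carries coordinates and brightness,
# an edgel carries coordinates and a gradient direction.
pixel = PrimitiveGestalt("pixel", (12.0, 40.0, 0.8), alpha=1.0)
edgel = PrimitiveGestalt("edgel", (12.5, 41.0, 1.57), alpha=0.6)
```

Rejecting α = 0 at construction time keeps every stored instance inside V_p × D_j × (0,1].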
3.2 Symmetries on the Domain
A symmetry group is a finite group G (of order m) of mappings f such that

$$ f : D \to D \qquad (1) $$

is bijective and preserves the metric. G contains the identity as neutral element and
an inverse $f^{-1}$ for every element. Examples are mirror mappings or rotations.
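One concrete instance of such a group is the set of rotations by multiples of 2π/m about a fixed centre. The sketch below represents points of the plane as complex numbers, which is an assumption of this illustration, as are the function name `rotation_group` and the chosen test points. It checks numerically that the mappings preserve distances and that each has an inverse in the group.

```python
import cmath
import math
from typing import Callable, List

Mapping = Callable[[complex], complex]


def rotation_group(m: int, centre: complex = 0j) -> List[Mapping]:
    """The m rotations by multiples of 2*pi/m about `centre` (points as complex numbers)."""
    def make(i: int) -> Mapping:
        w = cmath.exp(2j * math.pi * i / m)
        return lambda z: centre + w * (z - centre)
    return [make(i) for i in range(m)]


group = rotation_group(4, centre=1 + 1j)
p, q = 3 + 1j, 1 + 2j

# Every mapping preserves the metric |p - q|.
isometric = all(abs(abs(f(p) - f(q)) - abs(p - q)) < 1e-12 for f in group)
# Element i composed with element m - i gives the identity, so inverses exist.
has_inverses = all(abs(group[(4 - i) % 4](group[i](p)) - p) < 1e-12 for i in range(4))
```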
Such mappings have a reference frame associated with them, i.e. a position $\gamma_p$
and an orientation $\gamma_o$. Let $d_0, \ldots, d_k$ be a set of points in D with
$k < m$. Then the minimization problem

$$ err = \min_{\gamma} \sum_{i=1}^{k} \left| d_i - f_i(d_0) \right| \qquad (2) $$

is usually straightforwardly solvable. From such a solution we can obtain a new
assessment using

$$ 0 < \alpha = e^{-\zeta \cdot err^2} \le 1 \qquad (3) $$

with a suitable domain-dependent parameter $\zeta$.
We also calculate the distance between the reference position $\gamma_p$ and the
position $d_0$ and call it $\gamma_d$. Thus we get for such a set of points and such a
group a unique result $(\alpha, \gamma_p, \gamma_o, \gamma_d)$. We cannot recover the
original points from this description, but we can draw an “ideal representative” in
the correct position, orientation and size, and we have assessed its quality.
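To make the fitting step concrete, the following sketch fits a rotation group of order m to points in the plane and derives (α, γ_p, γ_o, γ_d). It is an illustration under several assumptions and not necessarily the authors' procedure: points are represented as complex numbers, the sum of distances in (2) is replaced by a least-squares surrogate that has a closed-form solution for the centre, the orientation is taken as the direction from the centre to d_0, and the function name `fit_rotation` and the default ζ = 1 are hypothetical.

```python
import cmath
import math
from typing import List, Tuple


def fit_rotation(points: List[complex], m: int, zeta: float = 1.0
                 ) -> Tuple[float, complex, float, float]:
    """Fit d_i ~ c + w**i * (d_0 - c) with w = exp(2*pi*j/m), i = 1..k, k < m.

    Returns (alpha, gamma_p, gamma_o, gamma_d).
    """
    d0, rest = points[0], points[1:]
    if not 1 <= len(rest) < m:
        raise ValueError("need k further points with 1 <= k < m")
    w = cmath.exp(2j * math.pi / m)
    # Rearranged model: (1 - w**i) * c = d_i - w**i * d_0, linear in c.
    a = [1 - w ** i for i in range(1, len(rest) + 1)]
    b = [d - w ** i * d0 for i, d in enumerate(rest, start=1)]
    # Closed-form complex least squares for the centre c (surrogate for (2)).
    c = sum(ai.conjugate() * bi for ai, bi in zip(a, b)) / sum(abs(ai) ** 2 for ai in a)
    # Residual in the form of (2): distances between d_i and the mapped d_0.
    err = sum(abs(bi - ai * c) for ai, bi in zip(a, b))
    alpha = math.exp(-zeta * err ** 2)          # assessment as in (3)
    gamma_o = cmath.phase(d0 - c)               # assumed orientation of the frame
    gamma_d = abs(d0 - c)                       # distance from centre to d_0
    return alpha, c, gamma_o, gamma_d


# Four corners of a square centred at 2+3j: a perfect order-4 rotation.
square = [3 + 3j, 2 + 4j, 1 + 3j, 2 + 2j]
alpha, gamma_p, gamma_o, gamma_d = fit_rotation(square, m=4)
```

For the perfect square the residual vanishes up to rounding, so the assessment is close to 1. Displacing one corner makes the three linear constraints inconsistent and lowers the assessment.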
3.3 Gestalt Operations
As operations on gestalts $g_i$ we allow symmetry operations, grid operations and
cluster operations.
A symmetry operation has the following form:

$$ \bigotimes_{G} (g_i)_{i=1}^{k} = h = \big( (\sigma, G, k),\ (\gamma_p, \gamma_o, \gamma_d),\ \alpha \big) \qquad (4) $$

where G is a finite symmetry group operating on D and $\gamma \in G$ minimizes the
metric distances as outlined in Section 3.2. As an example we take mirror symmetries,
where k=2 and we can write

$$ g_1 \otimes_{\mathrm{Mir}} g_2 = (\sigma_1, d_1) \otimes_{\mathrm{Mir}} (\sigma_2, d_2) = h = \big( (\sigma, \mathrm{Mir}, 2),\ (\gamma_p, \gamma_o, \gamma_d),\ \alpha \big) \qquad (5) $$

such that $\gamma$ gives the optimal symmetry axis minimizing the metric distance
between gestalt $g_1$ and its mirrored partner gestalt $g_2$. It also gives the central
position on the axis and puts it in the first attribute position, where all objects
have their reference location. Other possibilities are rotation groups etc.
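The mirror operation can be sketched for primitives that carry a position and an undirected orientation, such as line segments. Several choices below are assumptions of this illustration rather than part of the definition: the axis is simply the perpendicular bisector of the two positions, the error is the orientation mismatch after mirroring, the input assessments are ignored because the text does not say how to combine them, and the names `mirror` and `Gestalt` as well as the combined symbol for unequal inputs are hypothetical.

```python
import math
from typing import Tuple

Gestalt = Tuple[str, Tuple[float, float, float], float]  # (symbol, (x, y, theta), alpha)


def mirror(g1: Gestalt, g2: Gestalt, zeta: float = 1.0):
    """Sketch of the mirror operation: ((symbol, 'Mir', 2), (gamma_p, gamma_o, gamma_d), alpha)."""
    (s1, (x1, y1, t1), _), (s2, (x2, y2, t2), _) = g1, g2
    if (x1, y1) == (x2, y2):
        raise ValueError("coincident positions define no mirror axis")
    # Axis = perpendicular bisector of the two positions (exact for positions).
    gamma_p = ((x1 + x2) / 2.0, (y1 + y2) / 2.0)
    gamma_o = (math.atan2(y2 - y1, x2 - x1) + math.pi / 2.0) % math.pi
    gamma_d = math.hypot(x2 - x1, y2 - y1) / 2.0
    # Mirroring an undirected orientation t about an axis of direction phi gives 2*phi - t.
    mirrored = (2.0 * gamma_o - t1) % math.pi
    diff = abs(mirrored - (t2 % math.pi))
    err = min(diff, math.pi - diff)  # orientations are compared modulo pi
    alpha = math.exp(-zeta * err ** 2)  # same form as (3)
    symbol = s1 if s1 == s2 else f"{s1}+{s2}"  # assumed naming for mixed pairs
    return (symbol, "Mir", 2), (gamma_p, gamma_o, gamma_d), alpha


# Two line segments forming a roof shape, mirror symmetric about the line x = 0.
left = ("line", (-1.0, 0.0, math.pi / 4), 1.0)
right = ("line", (1.0, 0.0, 3 * math.pi / 4), 1.0)
head, frame, alpha = mirror(left, right)
```

The result keeps the reference location in the first attribute position, as the text requires, so the new gestalt can itself take part in further operations.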
A grid operation has the following two forms. The first,

$$ g_1 \otimes_{\mathrm{Grid}} g_2 = (\sigma_1, d_1) \otimes_{\mathrm{Grid}} (\sigma_1, d_2) = g_3 = \Big( (\sigma, \mathrm{Grid}, 2),\ \Big( \frac{d_1 + d_2}{2},\ d_1 - d_2 \Big),\ \alpha \Big) \qquad (6) $$

initially groups a pair of similar gestalts and attributes it with the corresponding
centre of gravity and translation vector, respectively. This vector can be coded as an
orientation and a length. The assessment is based on the similarity of the pair of
gestalts. The second form is

$$ \big( (\sigma, \mathrm{Grid}, k), d_1, t \big) \otimes_{\mathrm{Grid}_{k+1}} (\sigma, d_2) = \Big( (\sigma, \mathrm{Grid}, k+1),\ \Big( \frac{k d_1 + d_2}{k+1},\ \frac{k t + \delta_2}{k+1} \Big),\ \alpha \Big) \qquad (7) $$

for recursively appending further elements to a grid with k similar gestalts so that
they form a grid with k+1 members. The centre of gravity is then shifted accordingly,
and $\delta_2$ is the estimate of the translation vector from the last gestalt of the
row to the appended gestalt. The assessment is based on the similarity of the
translations, $t - \delta_2$. Note that in operation (7) the nature of the attribute
domain of the first element and of the resulting element is the same.
We see the operations indicated in (6) and (7) only as the simplest (1-dimensional)
case of more elaborate (n-dimensional) grids known in algebraic geometry.
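The two grid forms can be sketched for 2D positions as follows. This is an illustration under assumptions: the position of the last row member is reconstructed from the centre and the translation, the sign of δ_2 is chosen to match the pair translation d_1 - d_2 of (6), the append assessment reuses the exponential form of (3), and the names `grid_pair` and `grid_append` are hypothetical.

```python
import math
from typing import Tuple

Vec = Tuple[float, float]
Grid = Tuple[int, Vec, Vec, float]  # (k, centre, translation t, alpha)


def grid_pair(d1: Vec, d2: Vec, alpha: float = 1.0) -> Grid:
    """First form (6): centre of gravity and translation d1 - d2 of a pair."""
    centre = ((d1[0] + d2[0]) / 2.0, (d1[1] + d2[1]) / 2.0)
    t = (d1[0] - d2[0], d1[1] - d2[1])
    # alpha is passed in: the pairwise similarity measure is left open in the text.
    return 2, centre, t, alpha


def grid_append(grid: Grid, d_new: Vec, zeta: float = 1.0) -> Grid:
    """Second form (7): append one gestalt to a row of k members."""
    k, c, t, _ = grid
    # Assumed position of the last row member: half the row length behind the centre.
    last = (c[0] - (k - 1) / 2.0 * t[0], c[1] - (k - 1) / 2.0 * t[1])
    # delta_2, signed like t in (6): last member minus appended member.
    delta = (last[0] - d_new[0], last[1] - d_new[1])
    centre = ((k * c[0] + d_new[0]) / (k + 1), (k * c[1] + d_new[1]) / (k + 1))
    t_new = ((k * t[0] + delta[0]) / (k + 1), (k * t[1] + delta[1]) / (k + 1))
    # Assessment from the mismatch t - delta_2, using the form of (3) as an assumption.
    alpha = math.exp(-zeta * math.hypot(t[0] - delta[0], t[1] - delta[1]) ** 2)
    return k + 1, centre, t_new, alpha


# Three equally spaced points on a horizontal line, added right to left.
row = grid_pair((2.0, 0.0), (1.0, 0.0))
row = grid_append(row, (0.0, 0.0))
```

Because the input and output of `grid_append` have the same shape, the call can be repeated to grow a row member by member, which mirrors the remark that the attribute domain does not change in (7).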
A cluster operation has the following form:

$$ \bigotimes_{C} (g_i)_{i=1}^{k} = h = \Big( (\sigma, C, k),\ \Big( \frac{d_1 + \ldots + d_k}{k} \Big),\ \alpha \Big) \qquad (8) $$

where all the gestalts $g_j$ are just merged into one. This is the simplest form. There
are no further dimensions added to the attribute domain. The assessment $\alpha$ is
based on the similarity and closeness of the gestalts that participate.
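A cluster operation on 2D positions can be sketched in a few lines. The centroid follows (8) directly; the assessment is an assumption of this illustration, namely an exponential of the mean squared distance to the centroid, since the text only says that it depends on similarity and closeness. The function name `cluster` and the symbol `"spot"` are hypothetical.

```python
import math
from typing import List, Tuple

Vec = Tuple[float, float]


def cluster(symbol: str, points: List[Vec], zeta: float = 1.0):
    """Sketch of (8): merge k gestalts into ((symbol, 'C', k), centroid, alpha)."""
    k = len(points)
    if k == 0:
        raise ValueError("a cluster needs at least one gestalt")
    centroid = (sum(p[0] for p in points) / k, sum(p[1] for p in points) / k)
    # Assumed closeness measure: mean squared distance to the centroid.
    spread = sum((p[0] - centroid[0]) ** 2 + (p[1] - centroid[1]) ** 2
                 for p in points) / k
    alpha = math.exp(-zeta * spread)
    return (symbol, "C", k), centroid, alpha


tight = cluster("spot", [(0.0, 0.0), (0.2, 0.0), (0.0, 0.2), (0.2, 0.2)])
loose = cluster("spot", [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)])
```

With this choice a compact group of spots receives a higher assessment than the same arrangement spread over a larger area.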
3.4 Closure
The closure of the operations indicated in Section 3.3, given the set of primitive
gestalts as sketched in Section 3.1, will be called a gestalt algebra. An element of
such an algebra encodes a set of primitive objects and a chain of operations on them
that explains their arrangement in the domain.
There is a problem with this: to close the algebra, the operations must be defined
for combinations of any of its elements.
The centre-of-gravity assignment and the metric must be defined for different
primitive gestalts. One solution may be to take into account only the most
primitive common part, e.g. for a primitive line segment and a primitive
spot element to take the distance of their centres. However, this would not meet
the intention. A better solution in this case is to take this distance and add
the maximal possible distance between the orientations of line segments. Then
a line and a spot can never have zero distance, which is intended, but their
locations still matter.
We have seen that, e.g., with every symmetry operation the attribute domain
gains further dimensions. A metric is required that is also defined between
elements of spaces of different dimension and topology as they occur in the
definitions of Section 3.3. Such a metric should properly incorporate the
dissimilarity between gestalts of different type and level of abstraction.
Another possibility would be to include a special symbol “clutter”, a kind of
null. If objects are too dissimilar to give a proper metric distance, their
combination in an operation results in such an object, which has infinite
distance from anything else and can produce nothing but clutter again.
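The two proposals discussed here, the orientation penalty between lines and spots and the clutter symbol, can be sketched together. Everything beyond those two ideas is an assumption of this illustration: the tuple layout `Prim`, the weight `ori_weight`, the threshold `max_dist`, the midpoint merge in `combine`, and the fact that clutter is infinitely far even from itself.

```python
import math
from typing import Optional, Tuple

# (symbol, x, y, theta): theta is None for spots; the symbol "clutter" is the null.
Prim = Tuple[str, float, float, Optional[float]]

CLUTTER: Prim = ("clutter", 0.0, 0.0, None)
MAX_ORI = math.pi / 2.0  # largest possible difference between undirected orientations


def distance(a: Prim, b: Prim, ori_weight: float = 1.0) -> float:
    """Mixed-type distance: position plus an orientation term (weighting is assumed)."""
    if a[0] == "clutter" or b[0] == "clutter":
        return math.inf  # clutter is infinitely far from everything
    pos = math.hypot(a[1] - b[1], a[2] - b[2])
    if a[3] is None and b[3] is None:
        return pos                         # spot vs spot: centres only
    if a[3] is None or b[3] is None:
        return pos + ori_weight * MAX_ORI  # line vs spot: never zero
    diff = abs(a[3] - b[3]) % math.pi
    return pos + ori_weight * min(diff, math.pi - diff)


def combine(a: Prim, b: Prim, max_dist: float) -> Prim:
    """Hypothetical guard: dissimilar operands yield clutter, and clutter propagates."""
    d = distance(a, b)
    if math.isinf(d) or d > max_dist:
        return CLUTTER
    return (a[0], (a[1] + b[1]) / 2.0, (a[2] + b[2]) / 2.0, a[3])


line = ("line", 0.0, 0.0, 0.0)
spot = ("spot", 0.0, 0.0, None)
```

A line and a spot at the same location are then separated by the maximal orientation distance, and any operation that receives clutter returns clutter.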
Because we are aware of this technical imperfection we do not call this
contribution “The Gestalt algebra” but rather present it as elements of a definition
of such a structure. It is not yet complete.
It is, however, already quite clear that many desirable properties such as
associativity do not hold here. Commutativity holds for certain symmetry operations.
It remains to be investigated what kind of distributivity laws hold when the
different operations are mixed in compound structures.
4 Discussion
Why do we use algebraic terms and notations for the indicated purpose of describing
hierarchically constructed pattern structures? Alternatively one could also use syntac-
tic definitions such as graph-grammars or constrained multi-set grammars. In fact this
has also been done as indicated in Section 2.
Our hope is that with the help of the well developed algebraic apparatus – such as
the hierarchies of group theory – in particular ambiguous gestalts such as the one
presented in Figure 2 can be handled more appropriately. This gestalt can be com-
posed at least in the following different ways:
1) As a mirror-symmetric pair of complete rotations of order four of square gestalts.
2) As a mirror symmetry of two rows of four squares each, where also the spacing is
similar.
3) As a mirror symmetry of mirror symmetry gestalts.
Fig. 2. Example of an ambiguous gestalt.
If so, which of these should really be taken as different gestalts, and which
should be identified as the same element of our algebra? If such an identification
makes sense, what is the canonical representative of such an equivalence class?
We intend to construct our formalisms such that structure, geometric attributes and
hierarchy of previously unseen objects become explicit in the automatically extracted
instances. It is most important that the system can automatically identify different
but equivalent descriptions of the same object. We think that algebra is a good
candidate for working on these practically very important issues.
5 Conclusions
We have not shown any results on particular remote sensing applications in this
contribution. Some details and proofs are also missing, e.g. for the uniqueness of
the solutions of the minimization associated with each step. Both issues need to be
addressed in future work: 1) lay the theoretical foundation for the Gestalt algebra
by means of definitions and possibly theorems; 2) code elements of this structure and
test them on relevant recognition scenarios.
References
1. Desolneux, A.: Evénements significatifs et applications à l'analyse d'images. PhD thesis,
http://www.math-info.univ-paris5.fr/~desolneux/papers/these2.pdf (2000)
2. Fuchs, F.: Building Reconstruction in Urban Environment: A Graph-based Approach. In:
Baltsavias, E. P., Gruen, A., Van Gool, L. (eds.): Automatic Extraction of Man-Made
Objects from Aerial and Space Images III. Birkhäuser Verlag, Basel (2001) 205-215
3. Guo, C.-E., Zhu, S.C., Wu, Y. N.: Modelling Visual Patterns by Integrating Descriptive
and Generative Methods, IJCV, 53 (1), (2003) 5-29
4. Gruen, A., Kuebler, O., Agouris, P. (eds.): Automatic Extraction of Man-Made Objects
from Aerial and Space Images. Birkhäuser Verlag, Basel (1995)
5. Gurevich, I. B.: Image Mining via Descriptive Approach. I. General Methodology and
Basic Instruments. OGRW-7-2007. To appear in Pattern Recognition and Image Analysis
(2008)
6. Kanizsa, G.: Grammatica del Vedere. Il Mulino, Bologna (1980)
7. Lowe, D.: Perceptual Organization and Visual Recognition, Kluwer Academic Publishers,
Boston (1985)
8. Lütjen, K.: Ein Blackboard-basiertes Produktionssystem für die automatische
Bildauswertung. In: Hartmann, G. (ed.): Mustererkennung 1986, 8. DAGM-Symposium,
Informatik-Fachberichte 125, Springer, Berlin (1986) 164-168
9. Marriott, K., Meyer, B. (eds.): Visual Language Theory. Springer-Verlag, Berlin (1998)
10. Matsuyama, T., Hwang, V. S.-S.: Sigma a Knowledge-based Image Understanding System.
Plenum Press, New York (1990)
11. Michaelsen, E., Soergel, U., Thoennessen, U.: Perceptual Grouping in Automatic
Detection of Man-Made Structure in high resolution SAR data. Pattern Recognition
Letters 27 (4), (2006) 218-225
12. Michaelsen, E.: Über Koordinaten Grammatiken zur Bildverarbeitung und Szenenanalyse.
PhD thesis, University of Erlangen-Nürnberg, Chair of Pattern Recognition,
http://www.exemichaelsen.de/Michaelsen_Diss.pdf (1998)
13. Michaelsen, E., von Hansen, W., Kirchhof, M., Meidow, J., Stilla, U.: Estimating the
Essential Matrix: GOODSAC versus RANSAC. ISPRS Symposium on Photogrammetric
Computer Vision (PCV 2006), proceedings on CD, (2006)
14. Metzger, W.: Gesetze des Sehens. Waldemar Kramer, Frankfurt (1975)
15. Nagao, M., Matsuyama T.: A Structural Analysis of Complex Aerial Photographs, Plenum
Press. New York (1980)
16. Ritter, G. X., Wilson, J. N.: Handbook of Computer Vision Algorithms in Image Algebra.
CRC Press, New York (1996)
17. Rosenfeld, A.: Picture Languages. Academic Press, New York (1979)
18. Stilla, U., Michaelsen, E.: Semantic modelling of man-made objects by production
nets. In: Gruen, A., Baltsavias, E. P., Henricsson, O. (eds.): Automatic Extraction of
Man-Made Objects from Aerial and Space Images (II). Birkhäuser Verlag, Basel (1997) 43-52
19. Wang, D.: Studies on the Formal Semantics of Pictures. PhD thesis, University of
Amsterdam, ILLC Dissertation Series, Amsterdam (1995)
20. Wertheimer, M.: Untersuchungen zur Lehre der Gestalt, II. Psychologische Forschung, 4
(1923) 301-350
21. Zhuravlev, Yu. I.: An Algebraic Approach to Recognition or Classification Problems.
Pattern Recognition and Image Analysis, 8(1) (1998) 59–100