language, and/or potentially into more compact spatial features which can be used in larger recognition tasks, for example when a scene is composed of multiple pairs of objects.
Two main options exist for translating relative position descriptors into spatial relations in natural language: relying on machine learning to learn the transformation automatically, as in (Wang and Keller, 1999) for the histogram of angles, or using predefined evaluation rules derived from a theoretical analysis, as in (Matsakis et al., 2001) for the force histogram. We propose to use machine learning for our descriptor, learning the transformation from a dataset annotated with object pairs and their spatial relations.
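To make this learning-based option concrete, the sketch below trains an off-the-shelf classifier to map descriptor vectors to directional relation labels. The descriptor dimension, the scikit-learn model and the toy data are illustrative assumptions, not the exact pipeline used in this work.

```python
# Minimal sketch: learn a descriptor -> spatial relation mapping.
# Assumptions: descriptors are fixed-size vectors (here 64-D) and labels are
# one of the four main directions; the actual model and features may differ.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

RELATIONS = ["North", "West", "South", "East"]

def train_relation_classifier(descriptors: np.ndarray, labels: np.ndarray):
    """descriptors: (n_pairs, d) array of relative position descriptors,
    labels: (n_pairs,) array of relation names annotated by experts."""
    X_train, X_test, y_train, y_test = train_test_split(
        descriptors, labels, test_size=0.2, stratify=labels, random_state=0)
    clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=1000, random_state=0)
    clf.fit(X_train, y_train)
    print("held-out accuracy:", clf.score(X_test, y_test))
    return clf

if __name__ == "__main__":
    # Toy data standing in for real annotated object pairs.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 64))
    y = rng.choice(RELATIONS, size=200)
    train_relation_classifier(X, y)
```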
4 EXPERIMENTAL STUDY
We conducted three experiments to showcase the interest of our method. The first one aims to show the ability of our descriptors to capture enough spatial information to predict spatial relations between object pairs by training a model on a given dataset. The second one aims to learn a spatial model from synthetic images and predict spatial relations in satellite images, while the third one deals with the denoising of a given ambiguous dataset.
4.1 Datasets
Different datasets of (synthetic or natural) images were considered in this study. Each image depicts a scene containing a specific spatial configuration between a pair of crisp objects, together with corresponding annotations. The SimpleShapes dataset contains 2280 synthetic images, divided into two distinct sub-datasets named SimpleShape1 (S1) and SimpleShape2 (S2). S1 comprises masks of complex objects such as boats and cars (see Fig. 2), while S2 is composed of convex and concave geometric objects such as triangles and ellipses. Images have been synthesised randomly, with no background, by generating a random orientation, scale and position for each object. The GIS dataset is composed of 211 images representing spatial configurations of geographical objects (e.g., houses, rivers) sensed from aerial images.
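As an illustration of this generation process, the sketch below creates an image containing two shapes with random orientation, scale and position. The specific shapes, size ranges and the use of PIL are assumptions made for the example; notably, it does not enforce that the two objects are disjoint.

```python
# Minimal sketch of generating a synthetic object-pair image, in the spirit of
# SimpleShapes: two binary masks placed with random orientation, scale and
# position on an empty background. Shapes and parameter ranges are illustrative.
import random
from PIL import Image, ImageDraw

def random_shape_mask(size=96):
    """Draw a random ellipse or triangle as a binary mask."""
    mask = Image.new("L", (size, size), 0)
    draw = ImageDraw.Draw(mask)
    if random.random() < 0.5:
        draw.ellipse([8, 24, size - 8, size - 24], fill=255)
    else:
        draw.polygon([(size // 2, 8), (8, size - 8), (size - 8, size - 8)], fill=255)
    return mask

def random_pair_image(canvas=256):
    img = Image.new("L", (canvas, canvas), 0)
    for _ in range(2):
        shape = random_shape_mask()
        scale = random.uniform(0.5, 1.5)
        shape = shape.resize((int(shape.width * scale), int(shape.height * scale)))
        shape = shape.rotate(random.uniform(0, 360), expand=True)
        # Random placement; overlap between the two objects is not checked here.
        x = random.randint(0, canvas - shape.width)
        y = random.randint(0, canvas - shape.height)
        img.paste(shape, (x, y), mask=shape)
    return img

if __name__ == "__main__":
    random_pair_image().save("synthetic_pair.png")
```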
Each image of these three datasets contains annotations in which the spatial relation has been assessed by three different experts using the four main directions (North, West, South, East) (Deléarde et al., 2022). Images were also ranked from N1 to N4 according to the difficulty of determining the spatial relation between the two objects (N1 corresponds to the easiest cases, N4 to ambiguous and/or undecidable ones). For this experimental study, N4 images were rejected from the datasets, lowering the total number of images used to 1993 for SimpleShapes and to 190 for GIS.

Table 1: Comparison of different methods to compute non-overlapping reference points (R_p) in the 2280 images of the SimpleShapes dataset. An overlapping R_p does not prevent the computation of the RLM and forces, but decreases the quality of the obtained descriptors.

R_p computation      % of non-overlapping R_p
Straight MBR         71.1%
Oriented MBR         73.7%
Mean of centroids    74.4%
Convex hulls         95.8%
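The sketch below illustrates how such a comparison can be set up: it computes several candidate reference points from the two object masks and checks whether each candidate falls inside an object. The exact definitions used here (centre of the straight MBR, mean of the centroids, midpoint between the closest points of the two convex hulls) and the use of shapely are assumptions for illustration; the oriented MBR variant is omitted.

```python
# Minimal sketch: test whether a candidate reference point R_p falls inside
# one of the two object masks, for different R_p strategies (as in Table 1).
# The exact R_p definitions below are assumptions made for illustration.
import numpy as np
from shapely.geometry import MultiPoint, Point
from shapely.ops import nearest_points

def mask_points(mask: np.ndarray) -> MultiPoint:
    # Using all object pixels; contour pixels would suffice and be faster.
    ys, xs = np.nonzero(mask)
    return MultiPoint(list(zip(xs.tolist(), ys.tolist())))

def candidate_rp(mask_a: np.ndarray, mask_b: np.ndarray) -> dict:
    pts_a, pts_b = mask_points(mask_a), mask_points(mask_b)
    both = MultiPoint(list(pts_a.geoms) + list(pts_b.geoms))
    minx, miny, maxx, maxy = both.bounds
    # Midpoint between the closest points of the two convex hulls (assumed).
    pa, pb = nearest_points(pts_a.convex_hull, pts_b.convex_hull)
    return {
        "straight MBR": Point((minx + maxx) / 2, (miny + maxy) / 2),
        "mean of centroids": Point((pts_a.centroid.x + pts_b.centroid.x) / 2,
                                   (pts_a.centroid.y + pts_b.centroid.y) / 2),
        "convex hulls": Point((pa.x + pb.x) / 2, (pa.y + pb.y) / 2),
    }

def overlaps_object(rp: Point, mask_a: np.ndarray, mask_b: np.ndarray) -> bool:
    x, y = int(round(rp.x)), int(round(rp.y))
    return bool(mask_a[y, x] or mask_b[y, x])
```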
For some experiments, we also considered a subset of the SpatialSense dataset (Yang et al., 2019), composed of 11570 natural images representing everyday life scenes (S3), see e.g. Fig. 4. For each image, SpatialSense provides different spatial annotations (bounding boxes of objects and spatial configurations between them), with spatial relations between object pairs. We restrict the dataset to images presenting the to the left of, above, to the right of, or below spatial relations, thus reducing its size to 2290 images. However, some spatial relations are given in a 3-dimensional space: the orientations of subjects and objects are taken into account when a spatial relation is assigned, which means the relation may vary a lot depending on the point of view (2D or 3D). Our experiments also require segmented objects. We therefore pre-processed this dataset to obtain regions corresponding to the objects of interest, via a segmentation performed within the bounding boxes provided in the annotations (Deléarde et al., 2021).
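A possible way to build this subset is sketched below, keeping only annotations whose predicate is one of the four directional relations. The field names (annotations, predicate, url) are assumptions about the SpatialSense annotation format, and the segmentation step of (Deléarde et al., 2021) is not reproduced here.

```python
# Minimal sketch: keep only SpatialSense relations expressed with one of the
# four directional predicates. Field names are assumptions about the format.
import json

DIRECTIONAL = {"to the left of", "above", "to the right of", "below"}

def filter_directional(annotation_file: str):
    with open(annotation_file) as f:
        images = json.load(f)
    kept = []
    for img in images:
        relations = [r for r in img.get("annotations", [])
                     if r.get("predicate", "").lower() in DIRECTIONAL]
        if relations:
            kept.append({"url": img.get("url"), "relations": relations})
    return kept

if __name__ == "__main__":
    subset = filter_directional("annotations.json")
    print(len(subset), "images with directional relations")
```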
4.2 Directional Relation Classification
We aim in this preliminary experiment to showcase the ability of the proposed method to predict directional spatial relations from the images characterized by our descriptors, and the importance of using convex hulls as the basis to obtain the reference point R_p.
4.2.1 Experimental Protocol
As mentioned in Sec. 2, different methods can be employed to determine the reference point R_p, such as the straight MBR, the oriented MBR and the mean distance to the barycenters of both objects. In cases where R_p overlaps with an existing object, the results from the RLM may vary a lot and can create classification errors. We aim to minimize the number of images where the reference point R_p overlaps with an object, while still retaining a position that provides the necessary information to predict the spatial relation between the two objects. A comparative study of the