language, and/or potentially into more compact spatial features which can be used in larger recognition tasks, for example when a scene is composed of multiple pairs of objects.
Two main options exist for translating relative position descriptors into spatial relations in natural language: relying on machine learning to learn the transformation automatically, as in (Wang and Keller, 1999) for the histogram of angles, or using predefined evaluation rules derived from a theoretical analysis, as in (Matsakis et al., 2001) for the force histogram. We propose to use machine learning for our descriptor, learning the transformation from a dataset annotated with object pairs and their spatial relations.
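To make this learning-based option concrete, the sketch below trains an off-the-shelf classifier to map descriptor vectors to directional relation labels. The descriptor dimension, the scikit-learn model and the toy data are illustrative assumptions, not the exact pipeline used in this work.

```python
# Minimal sketch: learn a descriptor -> spatial relation mapping.
# Assumptions: descriptors are fixed-size vectors (here 64-D) and labels are
# one of the four main directions; the actual model and features may differ.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

RELATIONS = ["North", "West", "South", "East"]

def train_relation_classifier(descriptors: np.ndarray, labels: np.ndarray):
    """descriptors: (n_pairs, d) array of relative position descriptors,
    labels: (n_pairs,) array of relation names annotated by experts."""
    X_train, X_test, y_train, y_test = train_test_split(
        descriptors, labels, test_size=0.2, stratify=labels, random_state=0)
    clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=1000, random_state=0)
    clf.fit(X_train, y_train)
    print("held-out accuracy:", clf.score(X_test, y_test))
    return clf

if __name__ == "__main__":
    # Toy data standing in for real annotated object pairs.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 64))
    y = rng.choice(RELATIONS, size=200)
    train_relation_classifier(X, y)
```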
4 EXPERIMENTAL STUDY
We conducted three experiments to showcase the interest of our method. The first one aims to show the ability of our descriptors to capture enough spatial information to predict spatial relations between object pairs by training a model on a given dataset. The second one aims to learn a spatial model from synthetic images and predict spatial relations in satellite images, while the third one deals with the denoising of a given ambiguous dataset.
4.1 Datasets
Different datasets of (synthetic or natural) images were considered in this study. Each image depicts a scene containing a specific spatial configuration between a pair of crisp objects, together with corresponding annotations. The SimpleShapes dataset contains 2280 synthetic images, divided into two distinct sub-datasets named SimpleShape1 (S1) and SimpleShape2 (S2). S1 comprises masks of complex objects such as boats and cars (see Fig. 2), while S2 is composed of convex and concave geometric objects such as triangles and ellipses. Images have been synthesised randomly, with no background, by generating a random orientation, scale and position for each object. The GIS dataset is composed of 211 images representing spatial configurations of geographical objects (e.g., houses, rivers) sensed from aerial images.
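As an illustration of this generation process, the sketch below creates an image containing two shapes with random orientation, scale and position. The specific shapes, size ranges and the use of PIL are assumptions made for the example; notably, it does not enforce that the two objects are disjoint.

```python
# Minimal sketch of generating a synthetic object-pair image, in the spirit of
# SimpleShapes: two binary masks placed with random orientation, scale and
# position on an empty background. Shapes and parameter ranges are illustrative.
import random
from PIL import Image, ImageDraw

def random_shape_mask(size=96):
    """Draw a random ellipse or triangle as a binary mask."""
    mask = Image.new("L", (size, size), 0)
    draw = ImageDraw.Draw(mask)
    if random.random() < 0.5:
        draw.ellipse([8, 24, size - 8, size - 24], fill=255)
    else:
        draw.polygon([(size // 2, 8), (8, size - 8), (size - 8, size - 8)], fill=255)
    return mask

def random_pair_image(canvas=256):
    img = Image.new("L", (canvas, canvas), 0)
    for _ in range(2):
        shape = random_shape_mask()
        scale = random.uniform(0.5, 1.5)
        shape = shape.resize((int(shape.width * scale), int(shape.height * scale)))
        shape = shape.rotate(random.uniform(0, 360), expand=True)
        # Random placement; overlap between the two objects is not checked here.
        x = random.randint(0, canvas - shape.width)
        y = random.randint(0, canvas - shape.height)
        img.paste(shape, (x, y), mask=shape)
    return img

if __name__ == "__main__":
    random_pair_image().save("synthetic_pair.png")
```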
Each image of these three datasets contains annotations in which the spatial relation has been assessed by three different experts using the four main directions (North, West, South, East) (Deléarde et al., 2022). Images were also ranked from N1 to N4 according to the difficulty of determining the spatial relation between the two objects (N1 corresponds to the easiest cases, N4 to ambiguous and/or undecidable ones). For this experimental study, N4 images were rejected from the datasets, lowering the total number of images used to 1993 for SimpleShapes and to 190 for GIS.

Table 1: Comparison of different methods to compute non-overlapping reference points (R_p) in the 2280 images of the SimpleShapes dataset. An overlapping R_p does not prevent the computation of the RLM and forces, but decreases the quality of the obtained descriptors.

R_p computation      % of non-overlapping R_p
Straight MBR         71.1%
Oriented MBR         73.7%
Mean of centroids    74.4%
Convex hulls         95.8%
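The sketch below illustrates how such a comparison can be set up: it computes several candidate reference points from the two object masks and checks whether each candidate falls inside an object. The exact definitions used here (centre of the straight MBR, mean of the centroids, midpoint between the closest points of the two convex hulls) and the use of shapely are assumptions for illustration; the oriented MBR variant is omitted.

```python
# Minimal sketch: test whether a candidate reference point R_p falls inside
# one of the two object masks, for different R_p strategies (as in Table 1).
# The exact R_p definitions below are assumptions made for illustration.
import numpy as np
from shapely.geometry import MultiPoint, Point
from shapely.ops import nearest_points

def mask_points(mask: np.ndarray) -> MultiPoint:
    # Using all object pixels; contour pixels would suffice and be faster.
    ys, xs = np.nonzero(mask)
    return MultiPoint(list(zip(xs.tolist(), ys.tolist())))

def candidate_rp(mask_a: np.ndarray, mask_b: np.ndarray) -> dict:
    pts_a, pts_b = mask_points(mask_a), mask_points(mask_b)
    both = MultiPoint(list(pts_a.geoms) + list(pts_b.geoms))
    minx, miny, maxx, maxy = both.bounds
    # Midpoint between the closest points of the two convex hulls (assumed).
    pa, pb = nearest_points(pts_a.convex_hull, pts_b.convex_hull)
    return {
        "straight MBR": Point((minx + maxx) / 2, (miny + maxy) / 2),
        "mean of centroids": Point((pts_a.centroid.x + pts_b.centroid.x) / 2,
                                   (pts_a.centroid.y + pts_b.centroid.y) / 2),
        "convex hulls": Point((pa.x + pb.x) / 2, (pa.y + pb.y) / 2),
    }

def overlaps_object(rp: Point, mask_a: np.ndarray, mask_b: np.ndarray) -> bool:
    x, y = int(round(rp.x)), int(round(rp.y))
    return bool(mask_a[y, x] or mask_b[y, x])
```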
For some experiments, we also considered a subset of the SpatialSense dataset (Yang et al., 2019), composed of 11570 natural images representing everyday life scenes (S3), see e.g. Fig. 4. For each image, SpatialSense provides different spatial annotations (bounding boxes of objects and spatial configurations between them), with spatial relations between object pairs. We restrict the dataset to images presenting the to the left of, above, to the right of, or below spatial relations, thus reducing its size to 2290 images. However, some spatial relations are given in a 3-dimensional space: the orientations of subjects and objects are taken into account when a spatial relation is assigned, which means the relation may vary a lot depending on the point of view (2D or 3D). Our experiments also require segmented objects. We therefore pre-processed this dataset to obtain regions corresponding to the objects of interest, via a segmentation performed within the bounding boxes provided in the annotations (Deléarde et al., 2021).
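A possible way to build this subset is sketched below, keeping only annotations whose predicate is one of the four directional relations. The field names (annotations, predicate, url) are assumptions about the SpatialSense annotation format, and the segmentation step of (Deléarde et al., 2021) is not reproduced here.

```python
# Minimal sketch: keep only SpatialSense relations expressed with one of the
# four directional predicates. Field names are assumptions about the format.
import json

DIRECTIONAL = {"to the left of", "above", "to the right of", "below"}

def filter_directional(annotation_file: str):
    with open(annotation_file) as f:
        images = json.load(f)
    kept = []
    for img in images:
        relations = [r for r in img.get("annotations", [])
                     if r.get("predicate", "").lower() in DIRECTIONAL]
        if relations:
            kept.append({"url": img.get("url"), "relations": relations})
    return kept

if __name__ == "__main__":
    subset = filter_directional("annotations.json")
    print(len(subset), "images with directional relations")
```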
4.2 Directional Relation Classification
We aim in this preliminary experiment to showcase the ability of the proposed method to predict directional spatial relations from the images characterized by our descriptors, and the importance of using convex hulls as the basis to obtain the reference point R_p.
4.2.1 Experimental Protocol
As mentioned in Sec. 2, different methods can be employed to determine the reference point R_p, such as the straight MBR, the oriented MBR and the mean distance to the barycenters of both objects. In cases where R_p overlaps with an existing object, the results from the RLM may vary a lot and can create classification errors. We aim to minimize the number of images where the reference point R_p overlaps with an object, while still retaining a position that provides the necessary information to predict the spatial relation between the two objects. A comparative study of the