ments that could point to the presence of walls, doors,
or other such things. (Macé et al., 2010) performs
line detection using image vectorization and a Hough
transform. If a particular graphical arrangement is
met, the lines are merged to form walls. The authors
suggest a similar technique to extract arcs and find
door hypotheses. (de las Heras et al., 2011) proposes
an alternative approach that does not require image
vectorization and employs patch-based segmentation
with visual words. The method in (Ahmed et al.,
2011) is built to distinguish between thick, medium,
and thin lines in order to identify walls and eliminate
any components that are outside the convex hull of
the outer walls. In (Daniel Westberg, 2019), a thick
and a thin representation of the walls are extracted us-
ing noise removal techniques, erosions, and dilations.
Then, a hollow representation of walls with a constant
thickness is created by subtracting the thin represen-
tation from the thick one, which may then be used as
a reference for 3D modeling.
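For illustration, the following is a minimal OpenCV sketch of the two classical ideas above: Hough-based wall-line detection in the spirit of (Macé et al., 2010), and the thick/thin subtraction of (Daniel Westberg, 2019). The file name, kernel size, and thresholds are placeholder assumptions, not the cited authors' parameters.

```python
import cv2
import numpy as np

# Load the floor plan as grayscale and binarize it so that dark
# strokes become white foreground ("floor_plan.png" is a placeholder).
img = cv2.imread("floor_plan.png", cv2.IMREAD_GRAYSCALE)
_, binary = cv2.threshold(img, 127, 255, cv2.THRESH_BINARY_INV)

# Hough-style line detection: long straight segments are wall candidates.
lines = cv2.HoughLinesP(binary, rho=1, theta=np.pi / 180, threshold=100,
                        minLineLength=50, maxLineGap=5)
print(0 if lines is None else len(lines), "wall-line candidates")

# Thick/thin wall representations: an opening (erode, then dilate)
# removes thin strokes so that only thick walls survive; subtracting a
# shrunken copy yields hollow wall outlines of constant thickness.
kernel = np.ones((5, 5), np.uint8)
thick = cv2.dilate(cv2.erode(binary, kernel, iterations=2), kernel,
                   iterations=2)
thin = cv2.erode(thick, kernel, iterations=1)
hollow_walls = cv2.subtract(thick, thin)
```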
2.1.2 Deep Learning-Based Methods
(Zeng et al., 2019) describes a multi-task neural network that learns to predict room-boundary elements (e.g., walls, windows, and doors). A shared encoder extracts features from the floor plan image, and one decoder is used for each task; the architectures of both the encoder and the decoders are based on VGG (Simonyan and Zisserman, 2015). A Faster-RCNN-based (Ren et al., 2015) object detector is enhanced in (Ziran and Marinai, 2018) to learn to predict annotations in diverse floor plan datasets. The same strategy is suggested in (Singh, 2019), this time using a modified version of the YOLO object detector (Redmon and Farhadi, 2017). These neural networks are trained or evaluated on the datasets of (Mathieu Delalandre, 2019) and (Chiranjoy Chattopadhyay, 2019), which we also use in this study.
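For concreteness, here is a minimal PyTorch sketch of the shared-encoder, one-decoder-per-task pattern described in (Zeng et al., 2019); the layer sizes and output channels are illustrative assumptions, not the published VGG-based architecture.

```python
import torch
import torch.nn as nn

class MultiTaskFloorPlanNet(nn.Module):
    """Shared encoder feeding one decoder per prediction task."""
    def __init__(self):
        super().__init__()
        # Shared VGG-like encoder: conv + pooling blocks.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        # One decoder per task, both consuming the same shared features.
        def make_decoder(out_channels):
            return nn.Sequential(
                nn.ConvTranspose2d(128, 64, 2, stride=2), nn.ReLU(),
                nn.ConvTranspose2d(64, out_channels, 2, stride=2),
            )
        self.boundary_head = make_decoder(3)  # e.g. wall/door/window maps
        self.room_head = make_decoder(8)      # e.g. per-room-type maps

    def forward(self, x):
        features = self.encoder(x)  # computed once, shared by both heads
        return self.boundary_head(features), self.room_head(features)

net = MultiTaskFloorPlanNet()
boundaries, rooms = net(torch.randn(1, 3, 256, 256))  # dummy floor plan
```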
2.2 3D Environment Generation
The game industry has been a major driver of ad-
vancement in the field of 3D environment generation
during the past few years. 3D engines like Unity
(Haas, 2014) and Unreal Engine (Epic Games, 2019a)
are becoming easier to use, more adaptable, and more
powerful. Environments can be created using 3D props
that have been manually or programmatically created
(and, if necessary, animated). The market has a huge
selection of 3D models, which facilitates the creation
of new surroundings. Blender (Foundation, 2002) is
also a good choice. It is open-source and has an active
community and plugins to execute tasks programmat-
ically from Python scripts. Since Blender does not
offer as many interaction features or VR capabilities
as game engines do, we opt to develop our algorithm
with a game engine. More specifically, we choose Un-
real Engine 4 because of its Virtual Reality integration
plugin that works perfectly with Steam VR (Valve,
2003), ensuring the versatility of our model on var-
ious VR platforms such as Oculus (Meta, 2012) or
Varjo (Varjo, 2016). Furthermore, access to the Unreal Marketplace (Epic Games, 2019b) eases our implementation, as plugins and assets can be obtained from the community.
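As an example of the scripted asset creation mentioned above, here is a minimal Blender Python (bpy) sketch that places a wall-like slab programmatically; the dimensions are arbitrary assumptions.

```python
import bpy

# Place a unit cube and scale it into a 4 m x 0.2 m x 2.5 m wall slab.
# This runs inside Blender's embedded Python interpreter.
bpy.ops.mesh.primitive_cube_add(size=1.0, location=(0.0, 0.0, 1.25))
wall = bpy.context.active_object
wall.name = "Wall"
wall.scale = (4.0, 0.2, 2.5)
```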
2.3 Floor Plan to VR Environment
Frameworks
Many frameworks achieve the tasks of floor plan parsing and 3D environment generation. 3DPlanNet uses heuristic rules to retrieve walls and the TensorFlow Object Detection API for windows and doors (Park and Kim, 2021). The study in (Dodge et al., 2017) uses a fully connected neural network and OCR (Optical Character Recognition) to estimate the size of the rooms. The framework proposed
in (Fréville et al., 2021) combines traditional com-
puter vision and deep-learning techniques to detect
room-boundary features (walls, doors, and windows)
and interior objects. They implement ad-hoc map gener-
ation scripts for Unreal Engine to turn floor plans into
VR-ready environments. In our case, we use a neural network-based solution for object detection with a YOLO instance (Redmon and Farhadi, 2017). To ensure sufficient training data, we design a floor plan generator with random image perturbations. Doing so lets us feed YOLOv5 with virtually unlimited training data, making our implementation more robust.
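To give a flavor of such perturbations, the following is a minimal sketch of random degradations that can be applied to a synthetic floor plan image; the specific operations and parameter ranges are illustrative assumptions, not necessarily those of our generator.

```python
import cv2
import numpy as np

def perturb(img, rng=None):
    """Apply random perturbations to a grayscale floor plan image."""
    if rng is None:
        rng = np.random.default_rng()
    h, w = img.shape[:2]
    # Small random rotation around the image center.
    angle = rng.uniform(-3.0, 3.0)
    M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    img = cv2.warpAffine(img, M, (w, h), borderValue=255)
    # Additive Gaussian noise, simulating scanning artifacts.
    noise = rng.normal(0.0, 8.0, img.shape)
    img = np.clip(img.astype(np.float64) + noise, 0, 255).astype(np.uint8)
    # Occasional blur, simulating a low-quality scan.
    if rng.random() < 0.5:
        img = cv2.GaussianBlur(img, (3, 3), 0)
    return img
```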
3 METHOD
In this section, we present our floor plan parser and our custom dataset generator. The architecture of the parser system, depicted in Figure 3, has two main parts (a minimal pipeline sketch follows the list):
1. Furniture recognition, which uses the YOLOv5 framework (a deep learning method);
2. Wall recognition, which uses OpenCV libraries (a traditional image processing tool).
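To make the division of labor concrete, here is a schematic sketch of the two-stage pipeline. The torch.hub call is Ultralytics' standard YOLOv5 entry point, while extract_walls is a hypothetical placeholder for the OpenCV wall-recognition part; the weights and image paths are assumptions.

```python
import cv2
import torch

# Stage 1: furniture recognition with YOLOv5 ("weights.pt" is a
# placeholder for custom-trained weights).
model = torch.hub.load("ultralytics/yolov5", "custom", path="weights.pt")
img = cv2.imread("floor_plan.png")
results = model(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
furniture = results.xyxy[0]  # one (x1, y1, x2, y2, conf, class) row per object

# Stage 2: wall recognition with classical OpenCV processing.
# extract_walls is a hypothetical stand-in for the wall-recognition part.
def extract_walls(image):
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    _, binary = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY_INV)
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    return contours

walls = extract_walls(img)
```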