on each other, but have the same input as received
from the region proposal.
We minimize the overall loss function in order to
detect the facade elements. Thanks to our subdivided
structure, we can do so by minimizing each loss part
separately. Minimizing the classification loss helps
the network assign the desired category, which is the
facade element. Minimizing the RPN loss is crucial for
obtaining better region proposals that indicate the
potential facade element areas. These region proposals
are not based on color value gradients, which may yield
many irrelevant features; instead, they focus on
capturing the actual window complex. By minimizing
the bounding box and mask losses, we can detect the
location and shape of a facade element. We also use the
mask to check and verify the bounding box results with
respect to double detection: in some cases, one facade
element is detected but split into two bounding boxes
that either touch or overlap each other.
In contrast to common object detection networks,
we do not use the average precision score as the
criterion for improving the network, but the total number
of correctly detected windows: we focus especially on
improving the recall for the bounding boxes. In other
words, we prefer finding the exact number of facade
elements, without detecting too many or too few, to
optimizing the mask for pixel-wise correctness. We also
adapt the detection threshold such that most of the
facade elements are detected and the number of false
detections is minimized.
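The count-based criterion described above can be illustrated with a minimal sketch. It assumes that each detection carries a confidence score and has already been matched against the annotated facade elements; the function name, the input lists, and the example thresholds are hypothetical and not part of our pipeline.

```python
def recall_and_false_detections(scores, is_correct, n_ground_truth, threshold):
    """Count-based evaluation of detections at one score threshold.

    scores         -- confidence score of each detection
    is_correct     -- True if the detection matches an annotated element
    n_ground_truth -- number of annotated facade elements
    threshold      -- detections with a lower score are discarded
    Returns (recall, number of false detections).
    """
    kept = [ok for s, ok in zip(scores, is_correct) if s >= threshold]
    true_pos = sum(kept)
    false_pos = len(kept) - true_pos
    recall = true_pos / n_ground_truth if n_ground_truth else 0.0
    return recall, false_pos


# Hypothetical detections on one wall image with four annotated windows.
scores = [0.95, 0.90, 0.80, 0.60, 0.40]
is_correct = [True, True, True, False, True]
for t in (0.3, 0.5, 0.7):
    print(t, recall_and_false_detections(scores, is_correct, 4, t))
```

Sweeping the threshold in this way makes the trade-off visible: a lower threshold recovers more facade elements but admits more false detections.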
2.3 2D Facade Element Alignment
There are two forms of output for our 2D facade el-
ement detection step: 2D mask and bounding box of
the facade elements. We use the mask results to check
and verify the bounding box results. The detected 2D
mask is also intended for generating detailed facade
element contours; currently, however, it is too noisy
because of the low quality of the acquired raw
textures. Therefore, we choose to use the
detected 2D bounding boxes to represent the facade
element contours. In the future, we plan to improve
our detection method to obtain detailed facade ele-
ment contours.
As shown on the left side of Figure 4, the detected
facade element bounding boxes on one wall image
usually differ in size, and they are
also not well aligned. However, in reality, the win-
dows on one wall normally (1) have the same size and
(2) are well aligned in horizontal and vertical direc-
tions. To prepare a better input for the next step of
adding 3D facade elements as well as for generating
a natural-looking wall, we first delete the overlapping
bounding boxes and then regularize the detection re-
sults based on these two heuristics.
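One possible way to delete overlapping bounding boxes is sketched below. It is an assumption for illustration only: the sketch uses a greedy intersection-over-union (IoU) test that keeps the higher-scoring box of each overlapping pair, and the box format `(x_min, y_min, x_max, y_max)`, the function names, and the threshold value of 0.3 are all hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def remove_overlapping(boxes, scores, iou_threshold=0.3):
    """Return the indices of the boxes kept after greedy overlap removal.

    Boxes are visited from highest to lowest score; a box is kept only if
    its IoU with every box kept so far does not exceed iou_threshold.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in kept):
            kept.append(i)
    return sorted(kept)


# Hypothetical example: the first two boxes cover the same window.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 0, 30, 10)]
scores = [0.9, 0.8, 0.7]
print(remove_overlapping(boxes, scores))
```

Boxes that merely touch have an IoU of zero and would survive this test, so handling that case would need an additional rule, for example one based on the detected masks.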
Figure 4: The sub-figures on the left and right show an ex-
ample of the original facade element detection result and
the result after adjustment, respectively. In this example,
$N_{cluster} = 3$.
(1) To ensure that the detected facade elements
have bounding boxes of the same size, we first compute
the average width and height, $avg_w$ and $avg_h$, of
all the detected facade element bounding boxes. Then,
for each facade element bounding box, we subtract
$avg_w$ and $avg_h$ from its width and height. Here
we set thresholds for the width and height differences,
$dif_w$ and $dif_h$ ($dif_w, dif_h > 0$), based on the
dimensions of the wall image. If the absolute difference
between a facade element bounding box’s width (height)
and $avg_w$ ($avg_h$) exceeds the specified threshold,
we consider this element an outlier and exclude it from
the computation of $avg_w$ and $avg_h$ in the next
iteration. We repeat this procedure until no new outlier
is found; the final $avg_w$ and $avg_h$ are then taken
as the target regularized width and height for the
facade elements. For the example in Figure 4, we set
the threshold to 4% of the width and height of the wall
image, and the loop took two iterations to converge.
Generally, the loop converges within ten iterations
when the threshold is set between 2% and 5%, since
most of the detected bounding boxes are regularly
distributed.
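The iterative averaging in (1) can be sketched as follows. This is a minimal illustration under stated assumptions, not our implementation: boxes are given as `(width, height)` pairs, a single `ratio` is applied to both image dimensions to obtain the two thresholds, and the function name, the `max_iter` safeguard, and the handling of the case where every box becomes an outlier are hypothetical.

```python
def regularized_size(sizes, image_w, image_h, ratio=0.04, max_iter=100):
    """Estimate the target width and height by iterative outlier rejection.

    sizes   -- list of (width, height) of the detected bounding boxes
    ratio   -- threshold as a fraction of the wall image dimensions
    Returns (avg_w, avg_h, indices of the boxes that are not outliers).
    """
    if not sizes:
        raise ValueError("at least one bounding box is required")
    dif_w = ratio * image_w  # width threshold
    dif_h = ratio * image_h  # height threshold
    inliers = list(range(len(sizes)))
    for _ in range(max_iter):
        avg_w = sum(sizes[i][0] for i in inliers) / len(inliers)
        avg_h = sum(sizes[i][1] for i in inliers) / len(inliers)
        # A box is an outlier if its width OR its height deviates too much.
        remaining = [
            i for i in inliers
            if abs(sizes[i][0] - avg_w) <= dif_w
            and abs(sizes[i][1] - avg_h) <= dif_h
        ]
        # Stop when no new outlier is found (or nothing would remain).
        if len(remaining) == len(inliers) or not remaining:
            break
        inliers = remaining
    return avg_w, avg_h, inliers


# Hypothetical wall image of 1000 x 600 pixels with one oversized detection.
sizes = [(50, 80), (52, 78), (48, 82), (120, 80)]
print(regularized_size(sizes, 1000, 600))
```

In this hypothetical example the oversized box is rejected in the first pass, and the second pass finds no further outliers, so the loop stops after two iterations.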
(2) To fulfill the second heuristic, we first compute
the center of each facade element bounding box,
$c_{ix}$ and $c_{iy}$ ($i = 1, 2, \ldots, N$), in the
horizontal and vertical directions. Based on these
values, we adjust the facade elements in the horizontal
and vertical directions. Here we take the adjustment in
the x direction as an example to illustrate our method.
We first sort the facade elements by their $c_{ix}$
values in ascending order. After that, we apply a
rule-based clustering algorithm to segment them. Then,
we adjust the center positions of all the facade
elements in one cluster to the same x value ($c_x$).
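The sort, cluster, and adjust sequence in (2) can be sketched for the x direction as follows. The clustering rule used here is only a stand-in and an assumption: a new cluster is started whenever the gap between two consecutive sorted centers exceeds a hypothetical `max_gap`, and the shared value $c_x$ is taken to be the cluster mean. The function and parameter names are likewise hypothetical.

```python
def align_centers(centers, max_gap):
    """Snap 1D center coordinates to one shared value per cluster.

    centers -- center coordinate c_ix of each facade element
    max_gap -- stand-in rule (assumption): consecutive sorted centers that
               are farther apart than this start a new cluster
    Returns (aligned centers in the input order, number of clusters).
    """
    # Sort element indices by their center coordinate, ascending.
    order = sorted(range(len(centers)), key=lambda i: centers[i])
    clusters = []
    for i in order:
        if clusters and centers[i] - centers[clusters[-1][-1]] <= max_gap:
            clusters[-1].append(i)
        else:
            clusters.append([i])
    aligned = list(centers)
    for cluster in clusters:
        # Assumption: the shared value c_x is the mean of the cluster.
        c_x = sum(centers[i] for i in cluster) / len(cluster)
        for i in cluster:
            aligned[i] = c_x
    return aligned, len(clusters)


# Hypothetical x centers of six detected windows forming three columns.
centers_x = [102, 98, 100, 300, 305, 501]
print(align_centers(centers_x, max_gap=20))
```

The same function can be applied to the $c_{iy}$ values to align the elements in the vertical direction.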
The rule-based clustering is as follows: for the
A Data-driven Approach for Adding Facade Details to Textured LoD2 CityGML Models