’1’ are difficult to distinguish from the noise in the maps. As a consequence, the current classifier tends to exclude these characters, so they are often missing from the extracted text. We plan to fix this issue by adding more examples of such cases to our training data set. In addition, to complete the pipeline, we are still fine-tuning an OCR model to recognize the extracted text. This requires a second data set containing the extracted words and sentences, each manually labeled with its correct text content.
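A minimal sketch of how such a labeled data set might be stored, assuming each extracted word or sentence is saved as a separate image crop: a tab-separated manifest that pairs each crop with its manual transcription. The file layout, the class name and the function names are hypothetical and not taken from the authors' pipeline. The character count gives a quick check of whether hard cases such as '1' are sufficiently represented.

```python
import csv
import io
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List, TextIO


@dataclass(frozen=True)
class LabeledCrop:
    image_path: str  # crop of one extracted word or sentence (hypothetical layout)
    text: str        # manually entered ground-truth transcription


def write_manifest(samples: Iterable[LabeledCrop], stream: TextIO) -> None:
    """Write samples as a tab-separated manifest with a header row."""
    writer = csv.writer(stream, delimiter="\t", lineterminator="\n")
    writer.writerow(["image_path", "text"])
    for sample in samples:
        if not sample.text.strip():
            # Refuse unlabeled crops so they cannot silently enter training.
            raise ValueError(f"missing label for {sample.image_path}")
        writer.writerow([sample.image_path, sample.text])


def read_manifest(stream: TextIO) -> List[LabeledCrop]:
    """Parse a manifest produced by write_manifest."""
    reader = csv.DictReader(stream, delimiter="\t")
    return [LabeledCrop(row["image_path"], row["text"]) for row in reader]


def char_counts(samples: Iterable[LabeledCrop]) -> Counter:
    """Count non-whitespace characters, e.g. to see how often '1' occurs."""
    return Counter(ch for s in samples for ch in s.text if not ch.isspace())


if __name__ == "__main__":
    # Made-up labels, for illustration only.
    demo = [LabeledCrop("crops/0001.png", "BLK 243"),
            LabeledCrop("crops/0002.png", "BLK 1")]
    buf = io.StringIO()
    write_manifest(demo, buf)
    buf.seek(0)
    print(read_manifest(buf) == demo)
    print(char_counts(demo)["1"])
```

A plain-text manifest keeps the labels easy to review and correct by hand, which matters when the transcriptions are entered manually.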
Figure 12: Image retrieved with an annotation overlapping the boundary of the target building, Block No. 243 (NewOneMap, 2018).
Figure 13: Extraction of distorted target building shape.
For shape extraction, one of the major current problems is that some annotations may lie on top of the building boundary, in which case the extracted building shape is distorted. In Figure 12, the block number overlaps the boundary of Block No. 243, and Figure 13 shows the resulting failure to extract its shape correctly. We plan to add a post-processing step that refines the extracted shape in such situations. In addition, even though the images to be processed are generated at the maximum zoom level of the new OneMap Static Map API, their resolution is still somewhat too low, so artifacts may be introduced along the building boundary. We intend to add image super-resolution or vectorization steps to overcome this problem.
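One possible post-processing step is morphological closing (a dilation followed by an erosion), sketched minimally below. It assumes that the target building has already been extracted as a binary mask and that the overlapping annotation has only carved small holes or notches into it. This is not necessarily the refinement the authors plan; the function names and the default kernel size are hypothetical.

```python
import numpy as np


def _window_filter(mask: np.ndarray, k: int, op, pad_value: int) -> np.ndarray:
    """Combine every k x k neighbourhood of `mask` with `op` (np.maximum or np.minimum)."""
    if k < 1 or k % 2 == 0:
        raise ValueError("kernel size k must be a positive odd integer")
    r = k // 2
    h, w = mask.shape
    padded = np.pad(mask, r, mode="constant", constant_values=pad_value)
    out = mask.copy()
    # Visit every offset (dy - r, dx - r) of the square structuring element.
    for dy in range(k):
        for dx in range(k):
            out = op(out, padded[dy:dy + h, dx:dx + w])
    return out


def close_building_mask(mask: np.ndarray, k: int = 3) -> np.ndarray:
    """Morphological closing with a k x k square kernel.

    Fills holes and notches smaller than the kernel, e.g. where an annotation
    drawn over the building boundary removed building pixels.
    """
    binary = (np.asarray(mask) > 0).astype(np.uint8)
    dilated = _window_filter(binary, k, np.maximum, pad_value=0)
    # Pad with 1 so that the image border itself does not erode the shape.
    return _window_filter(dilated, k, np.minimum, pad_value=1)
```

With OpenCV (Bradski, 2000), the same operation is available as `cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)`, and `cv2.approxPolyDP` could then simplify the traced boundary into a polygon, one conceivable route to vectorization. The kernel size is a trade-off: closing also fills genuine concave corners smaller than the kernel, so an overly large kernel would alter real building outlines.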
ACKNOWLEDGEMENTS
This research is supported by the National Research Foundation, Prime Minister’s Office, Singapore under the Virtual Singapore Programme.
Furthermore, we are grateful to the Housing & Development Board (HDB) for providing us with the archived neighborhood maps.
We also express our appreciation to new OneMap for giving us access to their APIs for retrieving their map images.
REFERENCES
Ahmed, S., Liwicki, M., Weber, M., and Dengel, A. (2011). Improved automatic analysis of architectural floor plans. In 2011 International Conference on Document Analysis and Recognition, pages 864–869.
Ahmed, S., Liwicki, M., Weber, M., and Dengel, A. (2012). Automatic room detection and room labeling from architectural floor plans. In 2012 10th IAPR International Workshop on Document Analysis Systems, pages 339–343.
Bartz, C., Yang, H., and Meinel, C. (2017a). See: Towards semi-supervised end-to-end scene text recognition. arXiv preprint arXiv:1712.05404.
Bartz, C., Yang, H., and Meinel, C. (2017b). STN-OCR: A single neural network for text detection and text recognition. CoRR, abs/1707.08831.
Biljecki, F., Ledoux, H., and Stoter, J. (2016). An improved LOD specification for 3D building models. Computers, Environment and Urban Systems, 59:25–37.
Biljecki, F., Ledoux, H., and Stoter, J. (2017). Generating 3D city models without elevation data. Computers, Environment and Urban Systems, 64:1–18.
Bradski, G. (2000). The OpenCV Library. Dr. Dobb’s Journal of Software Tools.
Breuel, T. M. (2008). The OCRopus open source OCR system.
Epshtein, B., Ofek, E., and Wexler, Y. (2010). Detecting text in natural scenes with stroke width transform. In Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on, pages 2963–2970. IEEE.
Fritsch, D. and Klein, M. (2018). 3D preservation of buildings – reconstructing the past. Multimedia Tools and Applications, 77(7):9153–9170.
Gröger, G., Kolbe, T., Nagel, C., and Häfele, K. (2012). OGC City Geography Markup Language (CityGML) encoding standard, version 2.0, OGC Doc No. 12-019. Open Geospatial Consortium.
Jaderberg, M., Simonyan, K., Vedaldi, A., and Zisserman, A. (2016). Reading text in the wild with convolutional neural networks. International Journal of Computer Vision, 116(1):1–20.
Liu, X., Liang, D., Yan, S., Chen, D., Qiao, Y., and Yan, J. (2018). FOTS: Fast oriented text spotting with a unified network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5676–5685.
Matas, J., Chum, O., Urban, M., and Pajdla, T. (2002). Robust wide baseline stereo from maximally stable extremal regions. In Proceedings of the British Machine Vision Conference, pages 36.1–36.10. BMVA Press. doi:10.5244/C.16.36.
Neumann, L. and Matas, J. (2010). A method for text localization and recognition in real-world images. In Asian Conference on Computer Vision, pages 770–783. Springer.
Neumann, L. and Matas, J. (2012). Real-time scene text localization and recognition. In Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on, pages 3538–3545. IEEE.
Generation of 3D Building Models from City Area Maps
575