bounding box. This algorithm places the elements in an HTML file, using CSS properties that position each element at its detected coordinates.
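A minimal sketch of this placement step is shown below, assuming each detection has been reduced to a (tag, x, y, width, height) tuple in pixels; the function and variable names are hypothetical and do not reproduce the actual implementation.

```python
# Sketch of the placement step described above (hypothetical names;
# not the authors' actual implementation).
from typing import List, Tuple

# Each detection: (html_tag, x, y, width, height) in pixels,
# taken from the predicted bounding boxes.
Detection = Tuple[str, int, int, int, int]

def render_page(detections: List[Detection]) -> str:
    """Emit an HTML page that places every detected element at its
    bounding-box coordinates using absolute CSS positioning."""
    elements = []
    for tag, x, y, w, h in detections:
        style = (f"position:absolute; left:{x}px; top:{y}px; "
                 f"width:{w}px; height:{h}px;")
        elements.append(f'<{tag} style="{style}"></{tag}>')
    body = "\n".join(elements)
    return f"<!DOCTYPE html>\n<html>\n<body>\n{body}\n</body>\n</html>"

if __name__ == "__main__":
    # Two detections: a text input above a button.
    print(render_page([("input", 40, 60, 200, 30),
                       ("button", 40, 120, 100, 30)]))
```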
The metrics developed for both approaches apply the same weights: the number of correctly generated elements is weighted at 80%, and the remaining 20% is applied to the dimensions and positioning of the elements. The hybrid approach achieved a best accuracy of 71.30%, while the YOLO approach achieved 88.28% accuracy and 88.4% precision. The second approach generates HTML code containing objects with the correct size and position, which naturally results in Web pages much more similar to the provided mockups. The YOLO approach covered a wide variety of HTML elements and reached an accuracy that outperforms the related approaches.
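As a purely illustrative example of the 80/20 weighting described above, a per-page score could combine the two components as follows; the exact formula used in the evaluation is not reproduced here, and the placement-agreement term is an assumption of the sketch.

```python
# Illustrative sketch of the 80/20 weighted score described above.
# Not the paper's exact formula; it only shows how the two weighted
# components could be combined.
def weighted_score(correct_elements: int,
                   expected_elements: int,
                   placement_agreement: float) -> float:
    """correct_elements / expected_elements counts elements generated
    with the right type; placement_agreement in [0, 1] measures how
    close dimensions and positions are to the mockup."""
    element_term = correct_elements / expected_elements if expected_elements else 0.0
    return 0.8 * element_term + 0.2 * placement_agreement

# Example: 9 of 10 elements correct, 75% placement agreement -> 0.87
print(weighted_score(9, 10, 0.75))
```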
As future work, we propose implementing a layout algorithm based on row and column divisions, as in the Bootstrap framework; a rough sketch of such a mapping is given below. Since the YOLO approach provides the coordinates of the bounding regions, the algorithm to be developed must find the correct margins in order to position the elements as close as possible to the coordinate mapping in use, thus making the generated code responsive. To address the biggest problem found in this work, the lack of data, we propose increasing the size and diversity of the dataset. This measure aims to improve object detection accuracy and, more fundamentally, the accuracy of the bounding-box coordinates. We also plan to increase the variety of supported HTML elements and to define metrics for assessing precision and recall.
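The following is a rough, hypothetical sketch of the proposed row/column mapping, assuming detections are (tag, x, y, width, height) tuples and a 12-column Bootstrap grid; the actual layout algorithm is left as future work.

```python
# Hypothetical sketch of the proposed row/column mapping; the layout
# algorithm itself is left as future work in the paper.
from typing import List, Tuple

Detection = Tuple[str, int, int, int, int]  # (tag, x, y, width, height)

def to_bootstrap_rows(detections: List[Detection],
                      page_width: int,
                      row_tolerance: int = 20) -> str:
    """Group detections with similar vertical positions into rows,
    then map each element's width to a 12-column Bootstrap span."""
    rows: List[List[Detection]] = []
    for det in sorted(detections, key=lambda d: d[2]):  # sort by y
        if rows and abs(det[2] - rows[-1][0][2]) <= row_tolerance:
            rows[-1].append(det)
        else:
            rows.append([det])

    html = []
    for row in rows:
        cols = []
        for tag, x, y, w, h in sorted(row, key=lambda d: d[1]):  # sort by x
            span = max(1, min(12, round(12 * w / page_width)))
            cols.append(f'<div class="col-{span}"><{tag}></{tag}></div>')
        html.append('<div class="row">' + "".join(cols) + "</div>")
    return "\n".join(html)

# Two elements on roughly the same line -> one row with col-4 and col-8.
print(to_bootstrap_rows([("button", 0, 100, 300, 40),
                         ("input", 320, 105, 600, 40)], page_width=960))
```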
ACKNOWLEDGMENT
This work has been supported by FCT - Fundação para a Ciência e Tecnologia within the R&D Units Project Scope: UIDB/00319/2020.