
Research Agency (PID2023-146257OB-I00), Principado
de Asturias (SV-PA-21-AYUD/2021/50994),
the Council of Gijón, and Fundación Universidad de
Oviedo (FUO-23-008, FUO-22-450).
REFERENCES
Bielik, P., Fischer, M., and Vechev, M. (2018). Robust
relational layout synthesis from examples for android.
2(OOPSLA).
Bisong, E. (2019). Google colaboratory.
Building machine learning and deep learning models
on google cloud platform: a comprehensive guide for
beginners, pages 59–64.
Bunian, S., Li, K., Jemmali, C., Harteveld, C., Fu, Y., and
El-Nasr, M. S. (2021). Vins: Visual search for mobile
user interface design.
Chen, C., Su, T., Meng, G., Xing, Z., and Liu, Y. (2018).
From ui design image to gui skeleton: A neural machine
translator to bootstrap mobile gui implementation.
In 2018 IEEE/ACM 40th International Conference
on Software Engineering (ICSE), pages 665–676.
Deka, B., Huang, Z., Franzen, C., Hibschman, J., Afergan,
D., Li, Y., Nichols, J., and Kumar, R. (2017). Rico:
A mobile app dataset for building data-driven design
applications. UIST ’17, pages 845–854, New York,
NY, USA. Association for Computing Machinery.
Dicu, M., González, E. G., Chira, C., and Villar, J. R.
(2024a). The impact of data annotations on the performance
of object detection models in icon detection
for gui images. In International Conference on Hybrid
Artificial Intelligence Systems, pages 251–262.
Springer.
Dicu, M., Sterca, A., Chira, C., and Orghidan, R. (2024b).
Uicvd: A computer vision ui dataset for training rpa
agents. In ENASE, pages 414–421.
Jocher, G., Chaurasia, A., and Qiu, J. (2023). YOLO by
Ultralytics. https://github.com/ultralytics/ultralytics.
Accessed: June 20, 2024.
Leiva, L. A., Hota, A., and Oulasvirta, A. (2020). Enrico: A
high-quality dataset for topic modeling of mobile UI
designs. In Proc. MobileHCI Adjunct.
Miñón, R., Moreno, L., and Abascal, J. (2013). A graphical
tool to create user interface models for ubiquitous
interaction satisfying accessibility requirements.
Univers. Access Inf. Soc., 12(4):427–439.
Moran, K., Bernal-Cárdenas, C., Curcio, M., Bonett, R.,
and Poshyvanyk, D. (2020). Machine learning-based
prototyping of graphical user interfaces for mobile
apps. IEEE Transactions on Software Engineering,
46(2):196–221.
Mozilla Foundation (2024). Mozilla Firefox. Web browser,
Version 118.
Nguyen, T. A. and Csallner, C. (2015). Reverse engineering
mobile application user interfaces with remaui (t).
In 2015 30th IEEE/ACM International Conference on
Automated Software Engineering (ASE), pages 248–259.
OpenAI (2024). Chatgpt (october 2024 version).
https://openai.com/chatgpt. Large language model.
Qian, J., Shang, Z., Yan, S., Wang, Y., and Chen, L. (2020).
Roscript: a visual script driven truly non-intrusive
robotic testing system for touch screen applications.
In Proceedings of the ACM/IEEE 42nd International
Conference on Software Engineering, ICSE ’20, pages
297–308, New York, NY, USA. Association for Computing
Machinery.
Redmon, J., Divvala, S., Girshick, R., and Farhadi, A.
(2016). You only look once: Unified, real-time object
detection. In Proceedings of the IEEE conference on
computer vision and pattern recognition, pages 779–
788.
Reiss, S. P. (2014). Seeking the user interface. In Proceedings
of the 29th ACM/IEEE International Conference
on Automated Software Engineering, ASE ’14, pages
103–114, New York, NY, USA. Association for Computing
Machinery.
SeleniumHQ (2024). Selenium WebDriver. Testing
framework for web applications.
Tkachenko, M., Malyuk, M., Holmanyuk, A., and
Liubimov, N. (2020-2022). Label Studio: Data labeling
software. Open source software available from
https://github.com/heartexlabs/label-studio.
Wang, C.-Y., Yeh, I.-H., and Liao, H.-Y. M. (2024).
Yolov9: Learning what you want to learn using
programmable gradient information. arXiv preprint
arXiv:2402.13616.
White, T. D., Fraser, G., and Brown, G. J. (2019). Improving
random gui testing with image-based widget
detection. In Proceedings of the 28th ACM SIGSOFT
International Symposium on Software Testing
and Analysis, ISSTA 2019, pages 307–317, New York,
NY, USA. Association for Computing Machinery.
Xiao, S., Chen, Y., Song, Y., Chen, L., Sun, L., Zhen, Y.,
Chang, Y., and Zhou, T. (2024). UI semantic component
group detection: Grouping UI elements with
similar semantics in mobile graphical user interface.
Displays, 83(102679):102679.
Yeh, T., Chang, T.-H., and Miller, R. C. (2009). Sikuli:
using gui screenshots for search and automation. In
Proceedings of the 22nd Annual ACM Symposium on
User Interface Software and Technology, UIST ’09,
pages 183–192, New York, NY, USA. Association for
Computing Machinery.
Zhang, X., de Greef, L., Swearngin, A., White, S., Murray,
K., Yu, L., Shan, Q., Nichols, J., Wu, J., Fleizach, C.,
Everitt, A., and Bigham, J. P. (2021). Screen recognition:
Creating accessibility metadata for mobile applications
from pixels. In Proceedings of the 2021 CHI
Conference on Human Factors in Computing Systems,
CHI ’21, New York, NY, USA. Association for Computing
Machinery.
ICAART 2025 - 17th International Conference on Agents and Artificial Intelligence