
from Programming Screencasts. ACM Transactions
on Software Engineering and Methodology, 29 (3).
Bergh, A., Harnack, P., Atchison, A., Ott, J., Eiroa-Lledo,
E., & Linstead, E. (2020). A Curated Set of Labeled
Code Tutorial Images for Deep Learning. 2020 19th
IEEE International Conference on Machine Learning
and Applications (ICMLA).
Fang, S., Xie, H., Wang, Y., Mao, Z., & Zhang, Y. (2021).
Read Like Humans: Autonomous, Bidirectional and
Iterative Language Modeling for Scene Text Recognition.
2021 IEEE/CVF Conference on Computer Vision
and Pattern Recognition.
Gupta, A., Vedaldi, A., & Zisserman, A. (2016). Synthetic
Data for Text Localisation in Natural Images.
2016 IEEE Conference on Computer Vision and Pattern
Recognition (CVPR).
He, K., Gkioxari, G., Dollár, P., & Girshick, R. (2017).
Mask R-CNN. IEEE International Conference on
Computer Vision (ICCV).
He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep Residual
Learning for Image Recognition. IEEE Conference on
Computer Vision and Pattern Recognition (CVPR).
Huang, Z., Chen, K., He, J., Bai, X., Karatzas, D., Lu, S.,
& Jawahar, C. V. (2019). ICDAR2019 Competition
on Scanned Receipt OCR and Information Extraction.
2019 International Conference on Document Analysis
and Recognition (ICDAR).
Isola, P., Zhu, J.-Y., Zhou, T., & Efros, A. A. (2017).
Image-to-Image Translation with Conditional Adversarial
Networks. 2017 IEEE Conference on Computer
Vision and Pattern Recognition (CVPR).
Jaderberg, M., Simonyan, K., Vedaldi, A., & Zisserman, A.
(2016). Reading Text in the Wild with Convolutional
Neural Networks. International Journal of Computer
Vision, 116 (1).
Jaderberg, M., Vedaldi, A., & Zisserman, A. (2014). Deep Features for Text Spotting.
Computer Vision – ECCV 2014.
Li, H., Wang, P., Shen, C., & Zhang, G. (2019). Show,
Attend and Read: A Simple and Strong Baseline for
Irregular Text Recognition. Proceedings of the AAAI
Conference on Artificial Intelligence, 33 (01).
Li, M., Lv, T., Chen, J., Cui, L., Lu, Y., Florencio, D.,
Zhang, C., Li, Z., & Wei, F. (2023).
TrOCR: Transformer-Based Optical Character Recognition
with Pre-trained Models. Proceedings of the
AAAI Conference on Artificial Intelligence, 37 (11).
Lin, T.-Y., Dollár, P., Girshick, R., He, K., Hariharan, B.,
& Belongie, S. (2017). Feature Pyramid Networks
for Object Detection. IEEE Conference on Computer
Vision and Pattern Recognition (CVPR).
Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P.,
Ramanan, D., Dollár, P., & Zitnick, C. L. (2014). Microsoft
COCO: Common Objects in Context. Computer
Vision – ECCV 2014.
Loshchilov, I., & Hutter, F. (2017). SGDR: Stochastic gradient
descent with warm restarts. 5th International
Conference on Learning Representations (ICLR).
Lu, N., Yu, W., Qi, X., Chen, Y., Gong, P., Xiao, R., & Bai,
X. (2021). MASTER: Multiaspect non-local network
for scene text recognition. Pattern Recognition, 117.
Malkadi, A., Alahmadi, M., & Haiduc, S. (2020). A Study
on the Accuracy of OCR Engines for Source Code
Transcription from Programming Screencasts. Proceedings
of the 17th International Conference on Mining
Software Repositories.
Ott, J., Atchison, A., Harnack, P., Bergh, A., & Linstead,
E. (2018a). A deep learning approach to identifying
source code in images and video. Proceedings of
the 15th International Conference on Mining Software
Repositories.
Ott, J., Atchison, A., Harnack, P., Best, N., Anderson, H.,
Firmani, C., & Linstead, E. (2018b). Learning lexical
features of programming languages from imagery using
convolutional neural networks. Proceedings of the
26th Conference on Program Comprehension.
Ponzanelli, L., Bavota, G., Mocci, A., Di Penta, M.,
Oliveto, R., Hasan, M., Russo, B., Haiduc, S., &
Lanza, M. (2016a). Too Long; Didn’t Watch! Extracting
Relevant Fragments from Software Development
Video Tutorials. 2016 IEEE/ACM 38th International
Conference on Software Engineering (ICSE).
Ponzanelli, L., Bavota, G., Mocci, A., Di Penta, M.,
Oliveto, R., Russo, B., Haiduc, S., & Lanza, M.
(2016b). CodeTube: Extracting relevant fragments
from software development video tutorials. Proceedings
of the 38th International Conference on Software
Engineering Companion.
Ponzanelli, L., Bavota, G., Mocci, A., Oliveto, R., Di Penta,
M., Haiduc, S., Russo, B., & Lanza, M. (2019). Automatic
Identification and Classification of Software
Development Video Tutorial Fragments. IEEE Transactions
on Software Engineering, 45 (5).
Smith, R. (2007). An Overview of the Tesseract OCR Engine.
Ninth International Conference on Document
Analysis and Recognition (ICDAR 2007), 2.
Wang, T.-C., Liu, M.-Y., Zhu, J.-Y., Tao, A., Kautz, J., &
Catanzaro, B. (2018). High-Resolution Image Synthesis
and Semantic Manipulation with Conditional
GANs. 2018 IEEE/CVF Conference on Computer Vision
and Pattern Recognition (CVPR).
Yadid, S., & Yahav, E. (2016). Extracting code from
programming tutorial videos. Proceedings of the
2016 ACM International Symposium on New Ideas,
New Paradigms, and Reflections on Programming and
Software.
Yang, C., Thung, F., & Lo, D. (2022). Efficient Search of
Live-Coding Screencasts from Online Videos. 2022
IEEE International Conference on Software Analysis,
Evolution and Reengineering.
Zhao, D., Xing, Z., Chen, C., Xia, X., & Li, G. (2019). ActionNet:
Vision-Based Workflow Action Recognition
From Programming Screencasts. 2019 IEEE/ACM
41st International Conference on Software Engineering
(ICSE).
CodeSCAN: ScreenCast ANalysis for Video Programming Tutorials