Bochkovskiy, A., Wang, C. Y., & Liao, H. Y. M. (2020).
YOLOv4: Optimal speed and accuracy of object
detection. arXiv preprint arXiv:2004.10934.
Girshick, R. (2015). Fast R-CNN. In Proceedings of the IEEE
international conference on computer vision (pp. 1440-
1448).
GTTempo, Copioni, https://www.gttempo.com/copioni,
accessed 2022.
Huang, Y., Yan, Q., Li, Y., Chen, Y., Wang, X., Gao, L., &
Tang, Z. (2019, September). A YOLO-based table
detection method. In 2019 International Conference on
Document Analysis and Recognition (ICDAR) (pp. 813-
818). IEEE.
Lombardi, F., & Marinai, S. (2020). Deep learning for
historical document analysis and recognition—a
survey. Journal of Imaging, 6(10), 110.
Neumann, M., Shen, Z., & Skjonsberg, S. (2021). PAWLS:
PDF Annotation With Labels and Structure. arXiv
preprint arXiv:2101.10281.
Pondenkandath, V., Seuret, M., Ingold, R., Afzal, M. Z., &
Liwicki, M. (2017, November). Exploiting state-of-the-
art deep learning methods for document image analysis.
In 2017 14th IAPR International Conference on
Document Analysis and Recognition (ICDAR) (Vol. 5,
pp. 30-35). IEEE.
Redmon, J., & Farhadi, A. (2017). YOLO9000: better,
faster, stronger. In Proceedings of the IEEE conference
on computer vision and pattern recognition (pp. 7263-
7271).
Redmon, J., & Farhadi, A. (2018). YOLOv3: An incremental
improvement. arXiv preprint arXiv:1804.02767.
Redmon, J., Divvala, S., Girshick, R., & Farhadi, A. (2016).
You only look once: Unified, real-time object detection.
In Proceedings of the IEEE conference on computer
vision and pattern recognition (pp. 779-788).
Ren, S., He, K., Girshick, R., & Sun, J. (2015). Faster R-CNN:
Towards real-time object detection with region
proposal networks. Advances in neural information
processing systems, 28, 91-99.
Smith, L. N. (2018). A disciplined approach to neural
network hyper-parameters: Part 1--learning rate, batch
size, momentum, and weight decay. arXiv preprint
arXiv:1803.09820.
Soto, C., & Yoo, S. (2019, November). Visual detection
with context for document layout analysis. In
Proceedings of the 2019 Conference on Empirical
Methods in Natural Language Processing and the 9th
International Joint Conference on Natural Language
Processing (EMNLP-IJCNLP) (pp. 3464-3470).
Yang, H., & Hsu, W. H. (2021, January). Vision-Based
Layout Detection from Scientific Literature using
Recurrent Convolutional Neural Networks. In 2020
25th International Conference on Pattern Recognition
(ICPR) (pp. 6455-6462). IEEE.
Zhong, X., Tang, J., & Yepes, A. J. (2019, September).
PubLayNet: largest dataset ever for document layout
analysis. In 2019 International Conference on
Document Analysis and Recognition (ICDAR) (pp.
1015-1022). IEEE.
Ziran, Z., Marinai, S., & Schoen, F. (2019). Deep learning-
based object detection models applied to document
images. UniFi.