REFERENCES
Edmonds, J. (1965). Maximum matching and a polyhedron
with 0,1-vertices. J. of Res. of the Nat. Bureau of
Standards, 69B:125–130.
Gao, L., Huang, Y., Déjean, H., Meunier, J.-L., Yan, Q.,
Fang, Y., Kleber, F., and Lang, E. (2019). ICDAR
2019 competition on table detection and recognition
(cTDaR). In Int. Conf. Document Analysis and
Recognition (ICDAR), pages 1510–1515.
Göbel, M., Hassan, T., Oro, E., and Orsi, G. (2013).
ICDAR 2013 table competition. In Int. Conf. Document
Analysis and Recognition (ICDAR), pages 1449–1453.
Hassan, T. and Baumgartner, R. (2007). Table recognition
and understanding from PDF files. In Int. Conf.
Document Analysis and Recognition (ICDAR), volume 2,
pages 1143–1147.
Hoshen, J. and Kopelman, R. (1976). Percolation and cluster
distribution. I. Cluster multiple labeling technique and
critical concentration algorithm. Phys. Rev. B,
14:3438–3445.
Hulsebos, M., Hu, K., Bakker, M., Zgraggen, E.,
Satyanarayan, A., Kraska, T., Demiralp, Ç., and Hidalgo, C.
(2019). Sherlock: A deep learning approach to
semantic data type detection. In ACM SIGKDD Int. Conf.
Knowledge Discovery and Data Mining (KDD), pages
1500–1508.
Kleene, S. C. (1951). Representation of events in nerve nets
and finite automata. Technical report, Rand Project Air
Force Santa Monica, CA.
Konya, I. V. (2013). Adaptive Methods for Robust Document
Image Understanding. PhD thesis, University of Bonn,
Germany.
Levenshtein, V. (1966). Binary codes capable of
correcting deletions, insertions and reversals. Soviet
Physics Doklady, 10:707.
Miao, H., Gao, J., Mou, Z., Wang, B., Zhang, L., Su, L., Han,
Y., and Luan, Y. (2019). Design, synthesis and
biological evaluation of 4-piperidin-4-yl-triazole derivatives
as novel histone deacetylase inhibitors. BioScience
Trends, 13(2):197–203.
Mikolov, T., Chen, K., Corrado, G., and Dean, J. (2013).
Efficient estimation of word representations in vector
space. preprint arXiv:1301.3781.
Nurminen, A. (2013). Algorithmic extraction of data in
tables in PDF documents. Master’s thesis, Tampere
University of Technology.
Paliwal, S. S., D, V., Rahul, R., Sharma, M., and Vig, L.
(2019). TableNet: Deep learning model for
end-to-end table detection and tabular data extraction from
scanned document images. In Int. Conf. Document
Analysis and Recognition (ICDAR), pages 128–133.
Prasad, D., Gadpal, A., Kapadni, K., Visave, M., and
Sultanpure, K. (2020). CascadeTabNet: An approach for
end to end table detection and structure recognition
from image-based documents. In IEEE/CVF Conf.
Computer Vision and Pattern Recognition Workshops
(CVPRW), pages 2439–2447.
Rastan, R., Paik, H.-Y., and Shepherd, J. (2015). TEXUS:
A task-based approach for table extraction and
understanding. In ACM Symp. Document Engineering
(DocEng), pages 25–34.
Reza, M. M., Bukhari, S. S., Jenckel, M., and Dengel, A.
(2019). Table localization and segmentation using
GAN and CNN. In Int. Conf. Document Analysis and
Recognition Workshops (ICDARW), volume 5, pages
152–157.
Ruffolo, M. and Oro, E. (2009). PDF-TREX: An approach
for recognizing and extracting tables from PDF
documents. In Int. Conf. Document Analysis and
Recognition (ICDAR).
Schreiber, S., Agne, S., Wolf, I., Dengel, A., and Ahmed,
S. (2017). DeepDeSRT: Deep learning for detection
and structure recognition of tables in document
images. In Int. Conf. Document Analysis and Recognition
(ICDAR), pages 1162–1167.
Shigarov, A., Altaev, A., Mikhailov, A., Paramonov, V., and
Cherkashin, E. (2018). TabbyPDF: Web-based system
for PDF table extraction. In Information and Software
Technologies, pages 257–269. Springer.
Silva, A. C. E., Jorge, A., and Torgo, L. (2005). Design of an
end-to-end method to extract information from tables.
Int. J. of Document Analysis and Recognition (IJDAR),
8:144–171.
Yan, C. and He, Y. (2018). Synthesizing type-detection logic
for rich semantic data types using open-source code.
In Int. Conf. Management of Data (SIGMOD), pages
35–50.
Zhang, D., Suhara, Y., Li, J., Hulsebos, M., Demiralp, Ç.,
and Tan, W.-C. (2020). Sato: Contextual semantic
type detection in tables. preprint arXiv:1911.06311.
APPENDIX
In this section, we present an example of a
ground-truth file from our data set (Figure 9) that we used to
evaluate table interpretation (cf. §6).
[
  {
    "compound": "9b (IC50;nM)",
    "hdac1_ic50": "84.9 \u00b1 25.1",
    "hdac6_ic50": "95.9 \u00b1 0.78"
  },
  {
    "compound": "SAHA (IC50;nM)",
    "hdac1_ic50": "102.7 \u00b1 5.9",
    "hdac6_ic50": "198.5 \u00b1 103.0"
  }
]
Figure 9: An example of a ground-truth file from our
collection used in our table interpretation experiment
(11 page07 table0.json).
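As an illustration only, a ground-truth file of the shape shown in Figure 9 could be read with a few lines of Python. The sketch below is not part of our system: the function names `split_measurement` and `load_records` are hypothetical, and reading the number after the "±" sign as a spread (e.g., a standard deviation) is an assumption, since the file itself does not say what it denotes.

```python
import json

# Contents of the ground-truth file from Figure 9, embedded as a raw string
# so that the JSON escape \u00b1 (the "±" sign) is decoded by the JSON parser.
GROUND_TRUTH = r'''
[
  {"compound": "9b (IC50;nM)",
   "hdac1_ic50": "84.9 \u00b1 25.1",
   "hdac6_ic50": "95.9 \u00b1 0.78"},
  {"compound": "SAHA (IC50;nM)",
   "hdac1_ic50": "102.7 \u00b1 5.9",
   "hdac6_ic50": "198.5 \u00b1 103.0"}
]
'''


def split_measurement(cell):
    """Split a cell such as '84.9 ± 25.1' into two floats (hypothetical helper).

    Assumption: the first number is the reported value and the second one
    is its spread; the file does not state this explicitly.
    """
    value, _, spread = cell.partition("\u00b1")
    return float(value), float(spread)


def load_records(text):
    """Parse the JSON text and convert every '±' cell into a pair of floats."""
    records = []
    for row in json.loads(text):
        parsed = {}
        for key, cell in row.items():
            # Cells without a '±' sign (e.g. the compound name) stay strings.
            parsed[key] = split_measurement(cell) if "\u00b1" in cell else cell
        records.append(parsed)
    return records


if __name__ == "__main__":
    for record in load_records(GROUND_TRUTH):
        print(record)
```

Keeping the cells as strings in the file and splitting them only at load time leaves the ground truth identical to the text printed in the source table; the numeric view is derived on demand.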
Flexible Table Recognition and Semantic Interpretation System