Table 6: Training time comparison between DeepMatcher and Ditto (hh:mm:ss format).
technique/dataset  Amazon-Google  Beer      DBLP-ACM  DBLP-Google  Walmart-Amazon  Total
DeepMatcher        00:07:08       00:00:28  00:12:12  00:27:08     00:10:30        00:57:26
Ditto              00:19:17       00:02:14  00:41:02  01:08:02     00:25:57        02:36:32
(a) Structured data.

technique/dataset  Abt-Buy
DeepMatcher        00:09:11
Ditto              00:38:07
(b) Textual data.

technique/dataset  DBLP-ACM  DBLP-Google  iTunes-Amazon  Walmart-Amazon  Total
DeepMatcher        00:14:28  00:30:41     00:01:22       00:11:04        00:57:35
Ditto              00:41:36  01:08:34     00:03:53       00:24:29        02:18:32
(c) Dirty data.
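
The Total columns of Table 6 can be checked with a few lines of arithmetic. The following is a minimal sketch and not part of the original experiments: the helper names parse_hms and total are illustrative, and the input values are copied from parts (a) and (c) of the table. It sums the per-dataset times for each technique and reports how many times longer Ditto trained than DeepMatcher.

```python
from datetime import timedelta


def parse_hms(text: str) -> timedelta:
    """Parse an h:mm:ss or hh:mm:ss string into a timedelta."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def total(times: list[str]) -> timedelta:
    """Sum a list of h:mm:ss strings."""
    return sum((parse_hms(t) for t in times), timedelta())


# Per-dataset training times copied from Table 6(a), structured data.
structured = {
    "DeepMatcher": ["0:07:08", "0:00:28", "0:12:12", "0:27:08", "0:10:30"],
    "Ditto": ["0:19:17", "0:02:14", "0:41:02", "1:08:02", "0:25:57"],
}

# Per-dataset training times copied from Table 6(c), dirty data.
dirty = {
    "DeepMatcher": ["0:14:28", "0:30:41", "0:01:22", "0:11:04"],
    "Ditto": ["0:41:36", "1:08:34", "0:03:53", "0:24:29"],
}

for name, table in (("structured", structured), ("dirty", dirty)):
    totals = {tech: total(times) for tech, times in table.items()}
    # Dividing one timedelta by another yields a plain float ratio.
    ratio = totals["Ditto"] / totals["DeepMatcher"]
    print(f"{name}: DeepMatcher {totals['DeepMatcher']}, "
          f"Ditto {totals['Ditto']}, ratio {ratio:.2f}x")
```

The recomputed sums agree with the Total columns reported in the table for both techniques.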
ACKNOWLEDGEMENTS
This work was partially supported by CNPq/Brazil.
ICEIS 2023 - 25th International Conference on Enterprise Information Systems