
Christie, M. (2020). Magellan: toward building ecosystems of entity matching solutions. Communications of the ACM, 63(8):83–91.
Fellegi, I. P. and Sunter, A. B. (1969). A theory for record linkage. Journal of the American Statistical Association, 64:1183–1210.
Goga, O., Loiseau, P., Sommer, R., Teixeira, R., and Gummadi, K. P. (2015). On the reliability of profile matching across large online social networks. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Sydney.
Halmos, P. R. (1960). Naive set theory. van Nostrand.
Huang, J., Ertekin, S., and Giles, C. L. (2006). Efficient name disambiguation for large-scale databases. In European Conference on Principles of Data Mining and Knowledge Discovery, pages 536–544. Springer.
Hubert, L. and Arabie, P. (1985). Comparing partitions. Journal of Classification, 2:193–218.
Jain, R. K. (1991). The Art of Computer Systems Performance Analysis: Techniques for Experimental Design, Measurement, Simulation, and Modeling, volume 1. Wiley, New York, 1st edition.
Jaro, M. A. (1989). Advances in record-linkage methodology as applied to matching the 1985 census of Tampa, Florida. Journal of the American Statistical Association, 84(406):414–420.
Köpcke, H., Thor, A., and Rahm, E. (2009). Comparative evaluation of entity resolution approaches with FEVER. Proceedings of the VLDB Endowment, 2(2):1574–1577.
Köpcke, H., Thor, A., and Rahm, E. (2010). Evaluation of entity resolution approaches on real-world match problems. Proceedings of the VLDB Endowment, 3(1-2):484–493.
Li, Y., Li, J., Suhara, Y., Doan, A., and Tan, W.-C. (2020). Deep entity matching with pre-trained language models. arXiv preprint arXiv:2004.00584.
Maidasani, H., Namata, G., Huang, B., and Getoor, L. (2012). Entity resolution evaluation measures. University of Maryland, Tech. Rep.
Menestrina, D., Whang, S. E., and Garcia-Molina, H. (2010). Evaluating entity resolution results. Proceedings of the VLDB Endowment, 3(1-2):208–219.
Newcombe, H. B., Kennedy, J. M., Axford, S. J., and James, A. P. (1959). Automatic linkage of vital records. Science, 130(3381):954–959.
Obraczka, D., Schuchart, J., and Rahm, E. (2021). EAGER: embedding-assisted entity resolution for knowledge graphs. arXiv preprint arXiv:2101.06126.
Olar, A. (2023). Experiment data. https://github.com/matchescu/experiment-data. Online; accessed 25.10.2023.
Olar, A. (2024). PyResolveMetrics appendix. https://matchescu.github.io/py-resolve-metrics/article/02 appendix.pdf. Online; accessed 07.03.2024.
Papadakis, G., Tsekouras, L., Thanos, E., Giannakopoulos, G., Palpanas, T., and Koubarakis, M. (2017). JedAI: The force behind entity resolution. In The Semantic Web: ESWC 2017 Satellite Events, Portorož, Slovenia, May 28–June 1, 2017, Revised Selected Papers, pages 161–166. Springer.
paulboosz (2018). entity-resolution-evaluation. https://github.com/entrepreneur-interet-general/entity-resolution-evaluation/README.md. Online; accessed 22.09.2023.
PyResolveMetrics (2023). PyResolveMetrics. https://github.com/matchescu/er-metrics. Online; accessed 26.11.2023.
Qian, K., Popa, L., and Sen, P. (2017). Active learning for large-scale entity resolution. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management (CIKM), pages 1379–1388.
Rand, W. M. (1971). Objective criteria for the evaluation of clustering methods. Journal of the American Statistical Association, 66(336):846–850.
Schütze, H., Manning, C. D., and Raghavan, P. (2008). Introduction to information retrieval, volume 39. Cambridge University Press, Cambridge.
Talburt, J., Wang, R., Hess, K., and Kuo, E. (2007). An algebraic approach to data quality metrics for entity resolution over large datasets. In Information quality management: Theory and applications, pages 1–22. IGI Global.
Talburt, J. R. (2011). Entity resolution and information quality. Elsevier.
University of Arkansas Little Rock, E. (2012). Oyster. https://bitbucket.org/oysterer/oyster/src/master/README.md. Online; accessed 22.09.2023.
Virtanen, P., Gommers, R., Oliphant, T. E., Haberland, M., Reddy, T., Cournapeau, D., Burovski, E., Peterson, P., Weckesser, W., Bright, J., et al. (2020). SciPy 1.0: fundamental algorithms for scientific computing in Python. Nature Methods, 17(3):261–272.
Warrens, M. J. and van der Hoef, H. (2022). Understanding the adjusted Rand index and other partition comparison indices based on counting object pairs. Journal of Classification, 39(3):487–509.
Winkler, W. E. (1990). String comparator metrics and enhanced decision rules in the Fellegi-Sunter model of record linkage. In Proceedings of the Section on Survey Research Methods, American Statistical Association, pages 354–359.
Winkler, W. E. (2014). Matching and record linkage. WIREs Computational Statistics, 6(5):313–325.
Xiao, C., Wang, W., Lin, X., Yu, J. X., and Wang, G. (2011). Efficient similarity joins for near-duplicate detection. ACM Transactions on Database Systems (TODS), 36(3):1–41.
Yeung, K. Y. and Ruzzo, W. L. (2001). Details of the adjusted Rand index and clustering algorithms, supplement to the paper "An empirical study on principal component analysis for clustering gene expression data". Bioinformatics, 17(9):763–774.
PyResolveMetrics: A Standards-Compliant and Efficient Approach to Entity Resolution Metrics