A Generic and Flexible Framework for Selecting Correspondences in Matching and Alignment Problems

Fabien Duchateau

Abstract

The Web 2.0 and the low cost of storage have driven exponential growth in the volume of collected and produced data. However, integrating distributed and heterogeneous data sources has become the bottleneck for many applications, and it still relies largely on manual tasks. One of these tasks, named matching or alignment, is the discovery of correspondences, i.e., semantically equivalent elements in different data sources. Most approaches that attempt to solve this challenge must decide whether a pair of elements is a correspondence or not, given the similarity value(s) computed for this pair. In this paper, we propose a generic and flexible framework for selecting correspondences by relying on the discriminative similarity values for a pair. Experiments on a public dataset demonstrate an improvement in quality and show that the framework is robust to the addition of new similarity measures, with no user intervention required for tuning.

Paper Citation


in Harvard Style

Duchateau F. (2013). A Generic and Flexible Framework for Selecting Correspondences in Matching and Alignment Problems. In Proceedings of the 2nd International Conference on Data Technologies and Applications - Volume 1: DATA, ISBN 978-989-8565-67-9, pages 129-137. DOI: 10.5220/0004430401290137


in BibTeX Style

@conference{data13,
author={Fabien Duchateau},
title={A Generic and Flexible Framework for Selecting Correspondences in Matching and Alignment Problems},
booktitle={Proceedings of the 2nd International Conference on Data Technologies and Applications - Volume 1: DATA},
year={2013},
pages={129-137},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004430401290137},
isbn={978-989-8565-67-9},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 2nd International Conference on Data Technologies and Applications - Volume 1: DATA
TI - A Generic and Flexible Framework for Selecting Correspondences in Matching and Alignment Problems
SN - 978-989-8565-67-9
AU - Duchateau F.
PY - 2013
SP - 129
EP - 137
DO - 10.5220/0004430401290137