STUDY OF CHALLENGES AND TECHNIQUES IN LARGE SCALE MATCHING

Sana Sellami, Aicha-Nabila Benharkat, Youssef Amghar, Rami Rifaieh

Abstract

Matching techniques are becoming a very attractive research topic. With the development and use of a large variety of data representations (e.g. DB schemas, ontologies, taxonomies) in many domains (e.g. libraries, life sciences), matching techniques are called upon to overcome the challenge of aligning and reconciling these different, interrelated representations. In this paper, we study large scale matching approaches. We define a Quality of Matching (QoM) that can be used to evaluate large scale matching systems. We survey the techniques of large scale matching, where a large number of schemas/ontologies and attributes are involved. We attempt to cover a variety of schema matching techniques, called pair-wise and holistic, as well as a set of useful optimization techniques. This domain is highly active, and large scale matching still needs substantial advances. We therefore propose a contribution that deals with the creation of a hybrid approach combining these techniques.

Paper Citation


in Harvard Style

Sellami S., Benharkat A., Amghar Y. and Rifaieh R. (2008). STUDY OF CHALLENGES AND TECHNIQUES IN LARGE SCALE MATCHING. In Proceedings of the Tenth International Conference on Enterprise Information Systems - Volume 1: ICEIS, ISBN 978-989-8111-36-4, pages 355-361. DOI: 10.5220/0001721903550361


in Bibtex Style

@conference{iceis08,
author={Sana Sellami and Aicha-Nabila Benharkat and Youssef Amghar and Rami Rifaieh},
title={STUDY OF CHALLENGES AND TECHNIQUES IN LARGE SCALE MATCHING},
booktitle={Proceedings of the Tenth International Conference on Enterprise Information Systems - Volume 1: ICEIS},
year={2008},
pages={355-361},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0001721903550361},
isbn={978-989-8111-36-4},
}


in EndNote Style

TY - CONF
JO - Proceedings of the Tenth International Conference on Enterprise Information Systems - Volume 1: ICEIS
TI - STUDY OF CHALLENGES AND TECHNIQUES IN LARGE SCALE MATCHING
SN - 978-989-8111-36-4
AU - Sellami S.
AU - Benharkat A.
AU - Amghar Y.
AU - Rifaieh R.
PY - 2008
SP - 355
EP - 361
DO - 10.5220/0001721903550361