Bayesian Networks for Matcher Composition in Automatic Schema Matching

Daniel Nikovski, Alan Esenther, Xiang Ye, Mitsuteru Shiba, Shigenobu Takayama

Abstract

We propose a method for accurately combining the evidence that multiple individual matchers supply about whether two data schema elements match (that is, refer to the same object or concept) in automatic schema matching. The method uses a Bayesian network to model the statistical correlations between the similarity values produced by individual matchers that use the same or similar information. Modeling these correlations avoids overconfidence in match probability estimates and improves matching accuracy. Experimental results under several testing protocols suggest that the matching accuracy of the Bayesian composite matcher can significantly exceed that of the individual component matchers.
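The overconfidence mentioned in the abstract can be illustrated with a small sketch. This is not the network from the paper: the two matchers, the network structure (match -> A, A -> B), and every probability below are hypothetical values chosen only for illustration. Matcher B is assumed to be a noisy copy of matcher A, so it adds no new evidence once A is known. A combiner that treats the two matchers as independent given the match variable counts the same evidence twice.

```python
# Hypothetical example: two correlated matchers A and B, each discretized to
# "high" or "low" similarity. All numbers are assumptions, not from the paper.

PRIOR_MATCH = 0.1                      # P(M = 1)
P_A_HIGH = {1: 0.8, 0: 0.2}            # P(A = high | M)
P_B_HIGH_GIVEN_A_HIGH = 0.95           # P(B = high | A = high), independent of M
P_B_HIGH_GIVEN_A_LOW = 0.05            # P(B = high | A = low), independent of M


def posterior_match(prior, lik_match, lik_nonmatch):
    """Bayes' rule for a binary match variable."""
    numerator = prior * lik_match
    return numerator / (numerator + (1.0 - prior) * lik_nonmatch)


def p_b_high_marginal(m):
    """P(B = high | M = m), obtained by summing over A."""
    p_a = P_A_HIGH[m]
    return p_a * P_B_HIGH_GIVEN_A_HIGH + (1.0 - p_a) * P_B_HIGH_GIVEN_A_LOW


# Observation: both matchers report "high" similarity.

# Independence assumption: multiply P(A | M) and P(B | M) as if unrelated.
naive = posterior_match(
    PRIOR_MATCH,
    P_A_HIGH[1] * p_b_high_marginal(1),
    P_A_HIGH[0] * p_b_high_marginal(0),
)

# Network with the edge A -> B: use P(B | A) so B's evidence is not recounted.
network = posterior_match(
    PRIOR_MATCH,
    P_A_HIGH[1] * P_B_HIGH_GIVEN_A_HIGH,
    P_A_HIGH[0] * P_B_HIGH_GIVEN_A_HIGH,
)

print(f"independence assumption: {naive:.3f}")
print(f"modeling the A-B correlation: {network:.3f}")
```

Under these assumed numbers the independence-based estimate is markedly higher than the estimate that models the dependence between A and B, which equals the posterior from matcher A alone.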



Paper Citation


in Harvard Style

Nikovski D., Esenther A., Ye X., Shiba M. and Takayama S. (2012). Bayesian Networks for Matcher Composition in Automatic Schema Matching. In Proceedings of the 14th International Conference on Enterprise Information Systems - Volume 1: ICEIS, ISBN 978-989-8565-10-5, pages 48-55. DOI: 10.5220/0004001500480055


in Bibtex Style

@conference{iceis12,
author={Daniel Nikovski and Alan Esenther and Xiang Ye and Mitsuteru Shiba and Shigenobu Takayama},
title={Bayesian Networks for Matcher Composition in Automatic Schema Matching},
booktitle={Proceedings of the 14th International Conference on Enterprise Information Systems - Volume 1: ICEIS},
year={2012},
pages={48-55},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004001500480055},
isbn={978-989-8565-10-5},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 14th International Conference on Enterprise Information Systems - Volume 1: ICEIS
TI - Bayesian Networks for Matcher Composition in Automatic Schema Matching
SN - 978-989-8565-10-5
AU - Nikovski D.
AU - Esenther A.
AU - Ye X.
AU - Shiba M.
AU - Takayama S.
PY - 2012
SP - 48
EP - 55
DO - 10.5220/0004001500480055