Instance Based Schema Matching Framework Utilizing Google Similarity and Regular Expression

Osama A. Mehdi, Hamidah Ibrahim, Lilly Suriani Affendey


Schema matching is the task of identifying correspondences between attributes of different schemas. A variety of approaches have been proposed to achieve high-quality match results with respect to precision (P) and recall (R). However, most of these approaches treat instances as strings regardless of their data types, and so fall short of high-quality results. As a consequence, matches go unidentified, especially for attributes with numeric instances, which further reduces the quality of the match results. Further effort is therefore needed to improve match quality. In this paper, we propose a framework for finding matches between schemas of semantically and syntactically related data. Because we exploit only the instances of the schemas for this task, we rely on strategies that combine the strengths of Google similarity as a web-based semantic measure and regular expressions as a pattern recognition technique. To demonstrate the accuracy of our framework, we conducted an experimental evaluation using real-world data sets. The results show that our framework finds 1-1 schema matches with high accuracy, in the range of 93% to 99% in terms of precision (P), recall (R), and F-measure (F).
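As a rough illustration of the ingredients named in the abstract, the sketch below computes the Normalized Google Distance of Cilibrasi and Vitanyi from page-hit counts, uses regular expressions to find the dominant pattern among an attribute's instances, and scores a set of proposed matches by precision, recall, and F-measure. It is not the authors' implementation: the function names, the pattern set, and all hit counts passed in are hypothetical, and no search engine is actually queried.

```python
import math
import re


def ngd(fx: float, fy: float, fxy: float, n: float) -> float:
    """Normalized Google Distance from hit counts f(x), f(y), f(x, y).

    n is the (assumed) total number of indexed pages.
    """
    if fx <= 0 or fy <= 0 or fxy <= 0:
        return math.inf  # a term never occurs, or the terms never co-occur
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))


# Hypothetical patterns for numeric-looking instances (illustration only).
PATTERNS = {
    "phone": re.compile(r"\d{3}-\d{3}-\d{4}"),
    "zip": re.compile(r"\d{5}"),
    "year": re.compile(r"(19|20)\d{2}"),
}


def dominant_pattern(instances):
    """Return the name of the pattern matching the most instances, or None."""
    counts = {
        name: sum(1 for value in instances if pattern.fullmatch(value))
        for name, pattern in PATTERNS.items()
    }
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else None


def precision_recall_f(found: set, truth: set):
    """Precision, recall, and F-measure of found matches against true matches."""
    tp = len(found & truth)
    p = tp / len(found) if found else 0.0
    r = tp / len(truth) if truth else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f
```

Under these assumptions, two attributes whose instances share the same dominant pattern (for example, both `"phone"`) would be candidate syntactic matches, while a small `ngd` value between textual instances would suggest a semantic match.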


Paper Citation

in Harvard Style

Mehdi O. A., Ibrahim H. and Affendey L. S. (2014). Instance Based Schema Matching Framework Utilizing Google Similarity and Regular Expression. In Proceedings of 3rd International Conference on Data Management Technologies and Applications - Volume 1: DATA, ISBN 978-989-758-035-2, pages 213-222. DOI: 10.5220/0004990102130222

in Bibtex Style

author={Osama A. Mehdi and Hamidah Ibrahim and Lilly Suriani Affendey},
title={Instance Based Schema Matching Framework Utilizing Google Similarity and Regular Expression},
booktitle={Proceedings of 3rd International Conference on Data Management Technologies and Applications - Volume 1: DATA},

in EndNote Style

JO - Proceedings of 3rd International Conference on Data Management Technologies and Applications - Volume 1: DATA
TI - Instance Based Schema Matching Framework Utilizing Google Similarity and Regular Expression
SN - 978-989-758-035-2
AU - Mehdi O. A.
AU - Ibrahim H.
AU - Affendey L. S.
PY - 2014
SP - 213
EP - 222
DO - 10.5220/0004990102130222