to this requirement, further improvement of matching
precision and recall is required as well. Otherwise,
the cost of integration errors and manual correction
outweighs any benefit. To achieve further improvements,
different concept name matching systems could be
combined by majority voting.
Since the approaches presented in this paper achieve
lower precision than full automation requires, we built
a recommendation engine that assists manual data
integration tasks. It integrates the research prototypes
into industrial applications following the MEDIATION
approach (Schreiber et al., 2017; Schmidts et al., 2018).
Additionally, the approaches are used in a quality
assurance system to highlight possibly wrong matches in
manually matched schemas.
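The majority voting mentioned above could be sketched as follows. This is a minimal illustration and not the system described in this paper: the three matchers (`exact_matcher`, `ratio_matcher`, `token_matcher`), their thresholds, and the concept list are hypothetical stand-ins for real concept name matching systems. Each matcher proposes at most one target concept, and a proposal is accepted only if more than half of all matchers agree on it; otherwise no recommendation is made and the decision stays with the human integrator.

```python
import re
from collections import Counter
from difflib import SequenceMatcher
from typing import Callable, Optional, Sequence

# A matcher maps an input column name to one known concept, or abstains (None).
Matcher = Callable[[str, Sequence[str]], Optional[str]]


def exact_matcher(name: str, concepts: Sequence[str]) -> Optional[str]:
    """Hypothetical matcher: case-insensitive string equality."""
    for concept in concepts:
        if concept.lower() == name.lower():
            return concept
    return None


def ratio_matcher(name: str, concepts: Sequence[str],
                  threshold: float = 0.6) -> Optional[str]:
    """Hypothetical matcher: best difflib similarity above an assumed threshold."""
    if not concepts:
        return None

    def score(concept: str) -> float:
        return SequenceMatcher(None, name.lower(), concept.lower()).ratio()

    best = max(concepts, key=score)
    return best if score(best) >= threshold else None


def token_matcher(name: str, concepts: Sequence[str],
                  threshold: float = 0.5) -> Optional[str]:
    """Hypothetical matcher: Jaccard overlap of alphanumeric tokens."""
    if not concepts:
        return None

    def tokens(text: str) -> set:
        return set(re.findall(r"[a-z0-9]+", text.lower()))

    def score(concept: str) -> float:
        a, b = tokens(name), tokens(concept)
        return len(a & b) / len(a | b) if a | b else 0.0

    best = max(concepts, key=score)
    return best if score(best) >= threshold else None


def majority_vote(name: str, concepts: Sequence[str],
                  matchers: Sequence[Matcher]) -> Optional[str]:
    """Return the concept proposed by more than half of all matchers, else None."""
    votes = Counter()
    for matcher in matchers:
        candidate = matcher(name, concepts)
        if candidate is not None:  # abstentions do not count as votes
            votes[candidate] += 1
    if not votes:
        return None
    winner, count = votes.most_common(1)[0]
    return winner if count > len(matchers) / 2 else None


# Illustrative concept names only; not taken from the paper's data set.
CONCEPTS = ["molecular weight", "storage temperature", "product name"]
MATCHERS = [exact_matcher, ratio_matcher, token_matcher]

if __name__ == "__main__":
    print(majority_vote("Molecular Weight", CONCEPTS, MATCHERS))
```

Requiring a strict majority of all matchers, rather than of the non-abstaining ones, trades recall for precision, which fits the precision requirement discussed above. A weighted vote would be a natural variation if some matchers are known to be more reliable than others.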
REFERENCES
Do, H.-H. and Rahm, E. (2002). COMA: A System for
Flexible Combination of Schema Matching Approaches.
In Proceedings of the 28th International Conference on
Very Large Data Bases, VLDB ’02, pages 610–621,
Hong Kong, China. VLDB Endowment.
Geurts, P., Ernst, D., and Wehenkel, L. (2006). Extremely
randomized trees. Machine Learning, 63(1):3–42.
Madhavan, J., Bernstein, P. A., and Rahm, E. (2001).
Generic schema matching with Cupid. In VLDB,
volume 1, pages 49–58.
Madhavan, J. et al. (2005). Corpus-Based Schema
Matching. In 21st International Conference on Data
Engineering (ICDE’05), pages 57–68, Tokyo, Japan.
IEEE.
Massmann, S. et al. (2011). Evolution of the COMA match
system. In Proceedings of the 6th International
Conference on Ontology Matching - Volume 814,
pages 49–60. CEUR-WS.org.
Melnik, S., Garcia-Molina, H., and Rahm, E. (2002).
Similarity flooding: a versatile graph matching algorithm
and its application to schema matching. In Proceedings
18th International Conference on Data Engineering,
pages 117–128, San Jose, CA, USA. IEEE Comput. Soc.
Monge, A. and Elkan, C. (1997). An Efficient
Domain-Independent Algorithm for Detecting
Approximately Duplicate Database Records.
Philips, L. (2000). The Double Metaphone Search
Algorithm. C/C++ Users J., 18(6):38–43.
Raunich, S. and Rahm, E. (2011). ATOM: Automatic
target-driven ontology merging. In 2011 IEEE 27th
International Conference on Data Engineering, pages
1276–1279.
Saleem, K., Bellahsene, Z., and Hunt, E. (2008).
PORSCHE: Performance ORiented SCHEma mediation.
Information Systems, 33(7):637–657.
Schmidts, O. et al. (2018). Continuously Evaluated
Research Projects in Collaborative Decoupled
Environments. In Proceedings of the 5th International
Workshop on Software Engineering Research and
Industrial Practice, SER&IP ’18, pages 2–9, New York,
NY, USA. ACM.
Schreiber, M., Kraft, B., and Zündorf, A. (2017). Metrics
Driven Research Collaboration: Focusing on Common
Project Goals Continuously. In 2017 IEEE/ACM
4th International Workshop on Software Engineering
Research and Industrial Practice (SER&IP),
pages 41–47.
Schreiber, M., Kraft, B., and Zündorf, A. (2018). NLP Lean
Programming Framework: Developing NLP Applications
More Effectively. In Proceedings of the 2018
Conference of the North American Chapter of the
Association for Computational Linguistics:
Demonstrations, pages 1–5, New Orleans, Louisiana.
Association for Computational Linguistics.
Vassiliadis, P., Simitsis, A., and Baikousi, E. (2009).
A Taxonomy of ETL Activities. In Proceedings of the ACM
Twelfth International Workshop on Data Warehousing
and OLAP, DOLAP ’09, pages 25–32, New York, NY,
USA. ACM.
Wang, J., Li, G., and Feng, J. (2011). Fast-join: An
efficient method for fuzzy token matching based string
similarity join. In 2011 IEEE 27th International
Conference on Data Engineering, pages 458–469.
Wise, M. J. (1993). String similarity via greedy string tiling
and running Karp-Rabin matching. Online Preprint,
Dec, 119.
Zhang, C. J. et al. (2014). CrowdMatcher: Crowd-assisted
Schema Matching. In Proceedings of the 2014 ACM
SIGMOD International Conference on Management
of Data, SIGMOD ’14, pages 721–724, New York,
NY, USA. ACM.
Zhang, L. et al. (2012). Efficient Online Learning for
Large-Scale Sparse Kernel Logistic Regression. In AAAI.
Schema Matching with Frequent Changes on Semi-Structured Input Files: A Machine Learning Approach on Biological Product Data