another data source are diminished, which can be seen
as a serious limitation. For this kind of DCO, the
traditional approach (i.e. specifying DCOs at the
schema level of the concrete data source) is still the
best option. However, our practical experience of
dealing with different data sources from several
domains has led us to conclude that: (i) much data
cleaning knowledge can be applied to different data
sources from the same domain (e.g. detection of
business rule violations); (ii) some data cleaning
knowledge is so general that it can even be applied
to data sources from different domains (e.g. detection
of a syntax violation in an e-mail address). Hence,
the proposed methodology intends to exploit both
observations and promote the reuse of data cleaning
knowledge.
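As an illustration of the second observation, the following minimal sketch specifies an e-mail syntax check once, against a conceptual attribute, and applies it to two data sources with different schemas. It is not the implementation of the proposed methodology: the function name, the field names, the dictionary-based mapping, and the simplified regular expression are all assumptions made for the example.

```python
import re

# Deliberately simple pattern: one "@", no whitespace, and a dot in the domain.
# This is an illustrative assumption, not a full RFC 5322 validator.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def detect_email_syntax_violations(records, mapping):
    """Return (record index, value) pairs whose e-mail value is malformed.

    `mapping` links the conceptual attribute "email" to the field name
    used by the concrete data source (hypothetical names).
    """
    field = mapping["email"]
    return [
        (i, rec.get(field))
        for i, rec in enumerate(records)
        if not EMAIL_PATTERN.match(str(rec.get(field) or ""))
    ]


# Two hypothetical data sources from different domains, with different schemas.
customers = [{"cust_mail": "ana@example.com"}, {"cust_mail": "bruno(at)example.com"}]
patients = [{"contact": "carla@clinic.example"}, {"contact": "no-address"}]

# The same check is reused; only the mapping changes per data source.
customer_violations = detect_email_syntax_violations(customers, {"email": "cust_mail"})
patient_violations = detect_email_syntax_violations(patients, {"email": "contact"})
```

Only the mapping differs between the two calls, which is the sense in which the detection knowledge is reused rather than respecified for each schema.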
ACKNOWLEDGEMENTS
This work is supported by FEDER Funds through
the “Programa Operacional Fatores de
Competitividade - COMPETE” program and by
National Funds through FCT “Fundação para a
Ciência e Tecnologia”.