supervised machine learning to map entity attributes
via dictionaries to predict instance entity types.
In particular, this method maps
attributes from different domains to a common set
covering person, location and organization types.
They show that using entity type information as a
pre-filter reduces the work required for instance
matching evaluation.
Bernardo et al. (2012) propose an approach to
integrate data from hundreds of spreadsheets
available on the Web. To that end, they semantically
map data instances of spreadsheets to
RDF/OWL datasets. They use a process that
identifies the spreadsheet domain and associates the
data instances with their respective classes according
to a domain vocabulary.
Taheriyan et al. (2014) use a supervised machine
learning technique based on Conditional Random
Fields with features extracted from the attribute
names as part of a process to construct semantic
models. The goal is to map the attributes to the
concepts of a domain ontology and generate a set of
candidate semantic types for each source attribute,
each one with a confidence value. Next, an
algorithm selects the top k semantic types for each
attribute as an input to the next step of the process.
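The top-k selection step described above can be sketched as follows. This is only an illustration under stated assumptions, not the implementation of Taheriyan et al.: the function name `top_k_semantic_types`, the `ex:` type labels and the confidence values are hypothetical, and the Conditional Random Field that would produce the confidences is not modelled here.

```python
import heapq

def top_k_semantic_types(candidates, k):
    """Return the k candidates with the highest confidence, best first.

    candidates: list of (semantic_type, confidence) pairs for ONE source
    attribute. The pairs are assumed to come from an upstream labelling
    model, which this sketch does not implement.
    """
    return heapq.nlargest(k, candidates, key=lambda pair: pair[1])

# Hypothetical candidates for a source attribute called "name".
candidates = [
    ("ex:Person.name", 0.62),
    ("ex:Organization.name", 0.25),
    ("ex:City.name", 0.08),
    ("ex:Book.title", 0.05),
]
top2 = top_k_semantic_types(candidates, 2)
```

If k exceeds the number of candidates, all candidates are returned in descending order of confidence.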
Tonon et al. (2013) propose a method to find the
most relevant entity type given an entity (instance)
and its context. This method is based on collecting
statistics and on the graph structure interconnecting
instances and types. This approach is useful for
searching entity types in the context of search engines.
Compared with these works, we are interested in
identifying entity types to give more semantics to
the generated RDF datasets. We also use a semantic
matcher to identify the vocabulary terms that are
associated with the structural metadata of the
dataset being converted. Unlike the related works
presented, we define the entity type as the one with
the maximum number of property occurrences. This
is determined according to the semantics provided
by domain vocabularies chosen by a DE. Despite
this dependency, our work may be used in any data
domain.
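The selection rule above (the entity type with the maximum number of property occurrences) can be sketched as follows. This is a minimal illustration, not our tool's actual code: the vocabulary layout (a mapping from each entity type to the set of property terms it declares), the function name `select_entity_type` and the `ex:` terms are all assumptions made for the example.

```python
def select_entity_type(matched_terms, vocabulary):
    """Return the entity type whose properties occur most often among the
    matched terms, or None if no type covers any of them.

    matched_terms: vocabulary terms already aligned with the properties of
        one source object (the alignment step is assumed to be done).
    vocabulary: hypothetical mapping {entity_type: set_of_property_terms}.
    """
    best_type, best_count = None, 0
    for entity_type, properties in vocabulary.items():
        # Count how many matched terms are properties of this entity type.
        count = sum(1 for term in matched_terms if term in properties)
        # Strict comparison: on a tie, the type listed first is kept.
        if count > best_count:
            best_type, best_count = entity_type, count
    return best_type

# Hypothetical vocabulary fragment chosen for illustration only.
vocabulary = {
    "ex:Book": {"ex:title", "ex:creator", "ex:isbn", "ex:publisher"},
    "ex:Article": {"ex:title", "ex:creator", "ex:doi"},
}
matched = ["ex:title", "ex:creator", "ex:isbn"]
best = select_entity_type(matched, vocabulary)
```

Here `ex:Book` covers three matched terms and `ex:Article` only two, so `ex:Book` is selected. How ties should be broken is not specified in the text; the sketch simply keeps the first type encountered.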
7 CONCLUSIONS
We presented a data domain-driven approach to
converting semi-structured datasets, particularly in
JSON format, to RDF. By using the semantics
underlying the domain of the data, it makes the
conversion process less demanding. It attempts to
automate as much of the conversion process as
possible by maintaining a domain alignment
composed of correspondences between the metadata
being converted (properties) and the domain terms,
and reusing it in each new conversion process. Also,
in order to enrich the generated RDF dataset, the
objects' entity types are identified and included in
the code.
The experiments carried out show that our
approach is promising. By using the domain
vocabularies, it is able to produce RDF datasets that
are complete w.r.t. the original source data.
Furthermore, it identifies the most appropriate entity
type for a given object in almost 77% of the cases.
As future work, we intend to extend the
approach and tool to deal with CSV files.
Furthermore, we intend to use the MET recognition
process to assist a coreference resolution task when
integrating datasets at conversion time.
REFERENCES
Alexe, B., Burdick, D., Hernandez, M., Koutrika, G.,
Krishnamurthy, R., Popa, L., Stanoi, I., and Wisnesky,
R., 2013. High-Level Rules for Integration and
Analysis of Data: New Challenges. In Search of
Elegance in the Theory and Practice of Computation:
Essays Dedicated to Peter Buneman. Springer Berlin
Heidelberg, pp. 36-55.
Bernardo, I. R., Mota, M. S., Santanchè, A., 2012.
Extraindo e Integrando Semanticamente Dados de
Múltiplas Planilhas Eletrônicas a Partir do
Reconhecimento de Sua Natureza. In Proceedings of the
Brazilian Symposium on Databases (SBBD 2012),
pp. 256-263.
BIBO, 2016. Available at
http://lov.okfn.org/dataset/lov/vocabs/bibo. Last
access on December, 2016.
CBO, 2016. Available at http://comicmeta.org/cbo/. Last
access on December, 2016.
David, J., Euzenat, J., Scharffe, F., and Trojahn dos
Santos, C., 2011. The Alignment API 4.0. Semantic
Web Journal 2 (1): 3–10.
DBPEDIA, 2016. Available at http://wiki.dbpedia.org/.
Last access on December, 2016.
DC, 2016. Available at
http://dublincore.org/documents/2008/01/14/dcmi-type-vocabulary/.
Last access on December, 2016.
DOAP, 2016. Available at
http://lov.okfn.org/dataset/lov/vocabs/doap. Last
access on December, 2016.
Fanizzi, N., d'Amato, C., and Esposito, F., 2012. Mining
linked open data through semi-supervised learning
methods based on self-training. In Proceedings of the
IEEE Sixth International Conference on Semantic
Computing (ICSC). IEEE, Palermo, Italy, pp.
277–284.