In this work, we present a semantic-based approach
that aims to facilitate the use and reuse of
arbovirus-related data and metadata. Semantic technologies are
employed for modelling relevant information by
means of an ontology, which implements the domain
vocabulary. The approach includes a tool, which is
able to convert CSV data into RDF. In order to
verify the usefulness of the converted data, a web
application, which provides arbovirus information
visualization, has also been developed and
evaluated. In addition, some experiments have been
conducted.
Our contributions are summarized as follows: (i)
we introduce the ARBO ontology; (ii) we propose a
semantic-based approach to convert arbovirus data
into RDF; (iii) we present an application that
provides useful information based on the produced
RDF data; and (iv) we describe the evaluations
carried out on the proposed approach.
The remainder of this paper is organized as
follows: Section 2 introduces some background
concepts, a motivating scenario and related work;
Section 3 presents the proposed approach; Section 4
shows some of the results obtained and describes the
evaluations carried out; finally, Section 5 draws
our conclusions and points out some future work.
2 CONCEPTS, SCENARIO AND
RELATED WORK
In this section, we provide some concepts and
recommended practices for sharing data on the Web.
We also provide a motivating scenario and discuss
related work.
2.1 Data on the Web
The Web has evolved into an interactive information
network, allowing users and applications to share
data on a massive scale. To support this, the
Semantic Web and the Linked Data principles define
a set of practices for publishing structured data on
the Web aiming to provide an interoperable Web of
Data (Heath and Bizer, 2011). These principles are
based on technologies such as HTTP, URI and the
RDF data model. By using the RDF model, data or
resources are published on the Web in the form of
triples (composed of a subject, a predicate and an
object). Each resource is identified by means of a
URI. Publishing data in this way requires
converting data that are originally in another format
(e.g., CSV) to RDF.
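As an illustration of the CSV-to-RDF conversion described above, the following minimal sketch turns each CSV row into a set of triples serialized as N-Triples. It is not the tool presented in this paper. The namespace `http://example.org/arbo/`, the `Notification` class, the column names and the sample rows are all hypothetical, and the sketch assumes that column names and identifiers are safe to use inside a URI.

```python
import csv
import io

# Hypothetical namespace and class, used only for illustration.
BASE = "http://example.org/arbo/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def escape_literal(value: str) -> str:
    """Escape characters that may not appear raw inside an N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def csv_to_ntriples(csv_text: str, id_column: str) -> list[str]:
    """Emit one rdf:type triple per row plus one triple per non-empty cell."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # The subject URI is built from the identifier column of the row.
        subject = f"<{BASE}notification/{row[id_column]}>"
        lines.append(f"{subject} <{RDF_TYPE}> <{BASE}Notification> .")
        for column, value in row.items():
            if column == id_column or not value:
                continue  # skip the identifier itself and empty cells
            # Assumption: the column name can be used as-is in the predicate URI.
            lines.append(f'{subject} <{BASE}{column}> "{escape_literal(value)}" .')
    return lines


# Hypothetical sample data; not taken from the dataset used in this work.
sample = "id,disease,fever\n1,dengue,yes\n2,chikungunya,\n"
for triple in csv_to_ntriples(sample, "id"):
    print(triple)
```

In this sketch every cell becomes a plain string literal. A realistic conversion would map columns to terms of a domain vocabulary and use typed literals or resource URIs where appropriate.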
In order to make data available and feasible for
reuse, another semantic web principle is to organize
data in such a way that they can be interpreted and
used meaningfully without human intervention
(Bansal and Kagemann, 2015). This is achieved by
adding data about data, i.e., by adding metadata that
describe the data semantically.
In this regard, the World Wide Web
Consortium (W3C) defines some best practices to
facilitate sharing data on the Web (Lóscio et al.,
2017). These best practices cover diverse aspects
related to data publishing and consumption, like data
formats, data access, data identification and
metadata provisioning. One of the recommendations
concerns the use of open domain vocabularies to
describe the data semantically when they are
converted to RDF. To this end, it is essential to take
into account the knowledge domain (e.g., “Health”,
“Music”) in which the data exist and choose the
appropriate domain vocabularies. Vocabularies are
usually developed as ontologies, which represent a
formal, explicit specification of a conceptualization
(Gruber, 2009). An ontology provides definitions of
terms in a given data domain as well as the
relationships that link these terms to each other.
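To make the idea of terms and relationships concrete, the sketch below stores a tiny vocabulary as triples and follows `rdfs:subClassOf` links to find all broader terms of a given term. The `ex:` terms and the `ex:symptomOf` property are hypothetical and are not taken from the ARBO ontology. Prefixed names are kept as plain strings for brevity.

```python
RDFS_SUBCLASS_OF = "rdfs:subClassOf"

# Hypothetical vocabulary; these terms are NOT taken from the ARBO ontology.
TRIPLES = {
    ("ex:Dengue", RDFS_SUBCLASS_OF, "ex:ArboviralDisease"),
    ("ex:Chikungunya", RDFS_SUBCLASS_OF, "ex:ArboviralDisease"),
    ("ex:ArboviralDisease", RDFS_SUBCLASS_OF, "ex:Disease"),
    ("ex:Fever", "ex:symptomOf", "ex:Dengue"),
}


def superclasses(term, triples):
    """Return every class reachable from `term` through rdfs:subClassOf links."""
    found = set()
    frontier = [term]
    while frontier:
        current = frontier.pop()
        for subject, predicate, obj in triples:
            is_broader = subject == current and predicate == RDFS_SUBCLASS_OF
            if is_broader and obj not in found:
                found.add(obj)
                frontier.append(obj)  # keep climbing the hierarchy
    return found


print(sorted(superclasses("ex:Dengue", TRIPLES)))
```

Only the subclass relationship is traversed here. Other relationships, such as the hypothetical `ex:symptomOf`, link terms without implying a hierarchy.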
Another W3C recommendation concerns facilitating
data consumption. In this sense, it is important to
make data available through APIs (Application
Programming Interfaces) developed for this
purpose, especially if data are large, frequently
updated, or highly complex.
2.2 Motivating Scenario
Collecting and integrating data on diseases, such as
arboviruses, becomes relevant to some specific
applications, particularly in times of high
incidence in some countries. We have observed the
need for data analytics on these diseases not only by
healthcare agency managers but also by healthcare
professionals. They have to plan and study
preventive measures in order to fight disease
occurrences and their consequences.
Some data on arboviruses are already published
on the Web as open data. Nevertheless, in some
states, such as ours, there are no open data
portals with such data. In this work, we have
obtained data directly from the state healthcare
agency. As an illustration, excerpts from the
obtained data are depicted in Figure 1.
Rows in Figure 1 represent patients and
occurrences of disease notification (dengue or
chikungunya). For each patient, symptoms (most
columns) are set according to medical anamnesis.