With the development of the CODO ontology, we
aim to support the organization and representation
of COVID-19 case data on a daily basis, so that the
resulting data can be queried and retrieved
semantically and can also serve as input for
advanced analytics (e.g., trend studies and
growth projections). CODO also aims to facilitate the
representation of patient data, the relationships
between patients and between patients and locations,
changes over time, etc. This network data can support
analysis of the behaviour of the disease, its possible
routes of spread, the various factors in its
transmission, etc.
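As a minimal illustration of how such relationship data can support route-of-spread analysis, the following Python sketch stores a few patient facts as subject-predicate-object triples and follows the links between them. The patient and place identifiers and the predicate names (contractedVirusFrom, livesIn) are hypothetical and chosen for illustration only; they are not taken from the CODO vocabulary, and a real deployment would query the knowledge graph rather than a Python list.

```python
from collections import deque

# Hypothetical triples (subject, predicate, object). All identifiers and
# predicate names are illustrative assumptions, not CODO's actual terms.
TRIPLES = [
    ("patient:P1", "livesIn", "place:A"),
    ("patient:P2", "contractedVirusFrom", "patient:P1"),
    ("patient:P2", "livesIn", "place:B"),
    ("patient:P3", "contractedVirusFrom", "patient:P2"),
    ("patient:P4", "contractedVirusFrom", "patient:P1"),
]


def objects(subject, predicate, triples=TRIPLES):
    """Return every object o such that (subject, predicate, o) is a triple."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def transmission_chain(patient, triples=TRIPLES):
    """Follow contractedVirusFrom links back to the earliest known source."""
    chain = [patient]
    seen = {patient}
    while True:
        sources = objects(chain[-1], "contractedVirusFrom", triples)
        # Stop when no source is recorded or when a cycle is detected.
        if not sources or sources[0] in seen:
            return chain
        chain.append(sources[0])
        seen.add(sources[0])


def downstream_cases(source, triples=TRIPLES):
    """Breadth-first search for all patients infected, directly or
    indirectly, by the given source patient."""
    found = []
    seen = {source}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        for s, p, o in triples:
            if p == "contractedVirusFrom" and o == current and s not in seen:
                seen.add(s)
                found.append(s)
                queue.append(s)
    return found


if __name__ == "__main__":
    print(transmission_chain("patient:P3"))
    print(downstream_cases("patient:P1"))
```

Keeping each fact as an independent triple means that new relationship types (for example, a travel history) can be added without changing the traversal functions.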
The CODO ontology will also help policymakers,
for example, in analysing how infrastructure was
utilized and where it could have been
utilized more effectively. Thus, CODO will help deal
with the current pandemic as well as provide a tool to
prepare for potential future crises.
The main contributions of this paper are:
(i) We describe the CODO ontology: how it was
developed, how it relates to similar projects, how it
can currently be leveraged to support analysis of
COVID-19 data, and our plans for future work.
(ii) We illustrate the process of automatically
integrating data into the ontology.
(iii) We provide examples of how CODO has already
been utilized to analyse data about the pandemic.
The rest of the article is organized as follows.
Section 2 describes the background that motivated the
development of CODO: a survey of related work,
an overview of the FAIR principles, and
how knowledge graphs can provide
technology that implements these principles. Section
3 describes the methodology used to design the
CODO ontology. Section 4 describes the CODO
ontology, highlighting some of its significant
aspects. Section 5 evaluates the CODO ontology by
automatically loading data on the pandemic and by
describing SPARQL queries that can analyse the data.
Finally, Section 6 concludes the paper and discusses
next steps.
2 BACKGROUND
In this section we describe related work that we
surveyed before developing CODO. We also describe
the FAIR principles that were a driving rationale for
our decision to use knowledge graph technology.
² https://bioportal.bioontology.org/ontologies/COVID19
³ https://github.com/oeg-upm/drugs4covid19-kg
⁴ http://covid19.squirrel.link/ontology/
2.1 Related Work
Dealing with a global pandemic is a
knowledge-intensive process. As a result, several
ontologies related to the COVID-19 pandemic have been
developed. Before developing CODO, we conducted a
survey to determine whether we could reuse an existing
ontology. We found nine relevant ontologies. However,
none of them addressed our need: to
provide a semantic layer on top of case data from
India and the world. We briefly describe some of the
other COVID-19 ontologies in this section. To date,
we have not found publications for any of them
except for the CIDO ontology (He et al., 2020).
The CIDO ontology (Ontology of Coronavirus
Infectious Disease) is part of the OBO Foundry
Ontology Library. CIDO focuses on analysing
COVID-19 from a medical standpoint, e.g., its
similarity to other viruses, common symptoms, and
drugs that have been tried as treatments for the virus.
The COVID-19 Surveillance Ontology² is an
application ontology designed to support surveillance
in primary care. The main goal of this ontology is to
support surveillance of COVID-19 cases and related
respiratory conditions using data from multiple brands
of computerized medical record systems. This work is
partially related to CODO. However, this ontology is
designed as a taxonomy consisting of classes such as
education for COVID-19, exposure to COVID-19,
definite and possible COVID-19, etc. It does not
define any properties, which reduces its
semantic expressivity.
DRUGS4COVID19³ defines medications related to
COVID-19 and the relationships among them. Some of the
key classes of the ontology are drug, effect, disease,
symptoms, disorder, chemical substance, etc.
COVID-19⁴ is an ontology that consists of classes to
enable the description of COVID-19 datasets in RDF.
Some of the classes of this ontology are Dataset,
Dataset of the Johns Hopkins University, etc.
The World Health Organization’s (WHO)
COVIDCRFRAPID⁵ ontology is a semantic data
model for the WHO's COVID-19 RAPID case record
form from 23 March 2020. This model provides
semantic references to the questions and answers of
the form.
⁵ https://bioportal.bioontology.org/ontologies/COVIDCRFRAPID