allowed. Consequently, it is possible to obtain
information structured according to the FHIR
resource data model, and represented in one of these
formats, so that this information is readable by
both humans and machines. Within this
standard, 119 other resources (apart from the patient
resource) are defined at different maturity levels. In
this context, HL7 aims to define and limit the
structures used for the exchange of clinical
information.
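As an illustration of the point above, the following minimal Python sketch represents a patient as a dictionary shaped like a FHIR Patient resource and serializes it to JSON, assuming JSON is among the allowed formats. The patient values and the helper names `to_fhir_json` and `from_fhir_json` are hypothetical and do not belong to any FHIR library, and the check performed is far weaker than real FHIR validation.

```python
import json

# Hypothetical patient data; the field names follow the FHIR Patient resource.
patient = {
    "resourceType": "Patient",
    "id": "example-001",
    "name": [{"family": "Doe", "given": ["Jane"]}],
    "gender": "female",
    "birthDate": "1980-04-12",
}


def to_fhir_json(resource: dict) -> str:
    """Serialize a resource dictionary to indented, human-readable JSON."""
    return json.dumps(resource, indent=2)


def from_fhir_json(text: str) -> dict:
    """Parse JSON text and check that it declares a resource type.

    This is only a sanity check (an assumption of this sketch),
    not validation against the FHIR specification.
    """
    resource = json.loads(text)
    if "resourceType" not in resource:
        raise ValueError("not a FHIR resource: missing 'resourceType'")
    return resource


serialized = to_fhir_json(patient)   # text a human can read
parsed = from_fhir_json(serialized)  # structure a machine can process
```

The same text is thus readable by a person and can be parsed back by a program into the original structure without loss.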
Regarding the different coding systems, while a
terminology can refer to several different things, in
healthcare it denotes the “language” used to
code entries in Electronic Health Records (EHRs)
(Monsen, 2019), including LOINC (LOINC, 2021),
SNOMED-CT (SNOMED-CT, 2021), ICD-10 (ICD-10, 2021),
and ICD-9 (ICD-9, 2021), among others.
Most people encounter medical terminologies at
some point in their lives – whether it is as physicians,
medical purchasers, or patients. In the world of EHRs,
terminology is one of the keys to achieving real
interoperability between healthcare systems and
integrating their data. For instance, when data must
be sent between two systems, these systems must
“communicate” in the same language for the data to
be usable. This means that the codes from
one system must be compatible with the codes from
the other system. While it can be easy to combine data
from multiple systems in one place, if these codes
cannot be mapped to one another, the
data remain locked (Mavrogiorgou, 2017). Currently,
several such standards coexist. As a result, considerable
research is devoted to mapping these various
vocabularies so that one can move easily from one to
the other, as long as one of the key ones listed earlier
is used. To this end, work has been done and is
ongoing on mappings such as those between ICD-9 and
ICD-10, LOINC and CPT (CPT, 2021), or LOINC
and SNOMED-CT.
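The role of such mappings can be sketched in a few lines of Python. The example below is a minimal illustration, not an authoritative crosswalk: the two ICD-9 to ICD-10 entries are illustrative, the code `999.99` is a made-up placeholder with no entry in the table, and the names `ICD9_TO_ICD10`, `TranslationResult`, and `translate_codes` are assumptions of this sketch.

```python
from dataclasses import dataclass

# Illustrative entries only; a real system would load a maintained,
# officially published crosswalk rather than a hand-written table.
ICD9_TO_ICD10 = {
    "250.00": "E11.9",  # type 2 diabetes mellitus without complications
    "401.9": "I10",     # essential hypertension
}


@dataclass
class TranslationResult:
    mapped: dict    # source code -> target code
    unmapped: list  # source codes with no known equivalent ("locked" data)


def translate_codes(codes, mapping):
    """Translate source codes through a mapping table, keeping track of
    the codes that cannot be mapped instead of silently dropping them."""
    mapped = {}
    unmapped = []
    for code in codes:
        target = mapping.get(code)
        if target is None:
            unmapped.append(code)
        else:
            mapped[code] = target
    return TranslationResult(mapped, unmapped)


# "999.99" is a hypothetical code that is absent from the table.
result = translate_codes(["250.00", "401.9", "999.99"], ICD9_TO_ICD10)
```

Here `result.unmapped` collects the entries that the receiving system cannot interpret, which is the situation in which combined data remain locked.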
In this context, it should be noted that medical
information is typically represented following
specific standards. The SNOMED-CT terminology is
an ontology that defines some concepts, such as
certain diseases, in terms of their cause, the part of the
body they affect, and how they can be diagnosed. It
also includes some food categories, sport categories,
and activities of daily living. The Open Biomedical
Ontologies (OBO) consortium (Smith, 2007) is an
initiative trying to integrate the multiple ontologies
developed in the biomedical domain, which also
includes ontologies formalizing patient medical care
and EHRs. The International Classification of
Functioning, Disability and Health (ICF) (Cieza,
2002) is an ontology classifying health and health-
related domains from a body perspective, a personal
activities perspective and a societal perspective. It
classifies according to the body structure (e.g. eye,
ear, digestive systems), the body function (e.g.
mental, voice), activities and participations, and
the environmental context. Thus, it contains medical
categories as well as some social categories as part of
the activities, participations, and environmental
domains. All concepts are linked to the ICD code in
the ICD terminology. The National Cancer Institute
Thesaurus (NCIT) (Zhe, 2002) is a reference
thesaurus covering biomedical concepts and inter-
concept relationships. As part of that, it also includes
medical categories, categories for physical activities,
social activities and behavioural categories.
However, the very success of ontologies in many
domains poses a major problem, as it leads to the
development of many different ontologies and
taxonomies that are not necessarily linked. In practice,
this creates an interoperability problem at both the
taxonomic and the semantic levels. To overcome this problem,
considerable effort is being made by initiatives such as OBO
and BioPortal (Noy, 2009). It is also the motivation
for the OntoHub (Mossakowski, 2014) repository,
which behind the scenes attempts to apply alignment
techniques from formal methods to the ontology
domain. The Medical Subject Headings (MeSH)
(Lipscomb, 2000) is a vocabulary maintained by the
US National Library of Medicine (NLM) (Lindberg,
1990). It is a hierarchically organized terminology of
biomedical information contained in NLM databases,
including MEDLINE®/PubMed® (Fontelo, 2005). It
is often combined with information following the RxNorm
standard (Liu, 2005), as well as the LOINC standard for
medical laboratory observations. Therefore, the mere
adoption of interoperability standards is not sufficient
to query health data coming from various health data
sources and systems, in a uniform, efficient,
complete, and unambiguous way.
This manuscript presents the data
integration approach, in the form of HHRs, that has
been adopted by CrowdHEALTH (Kyriazis, 2017).
CrowdHEALTH is a digital healthcare system aiming
to exploit big data techniques, applied to extended
health records and collective health knowledge (i.e.
clustered records), to evaluate healthcare governance
policies. One of the pillars of the CrowdHEALTH
system is the development and exploitation of the
HHRs. HHRs are intended to provide an integrated
view of the patient, including all health determinants.
Such health-related data may be produced by
different human actors or systems, at different
moments of a patient’s life, and include both i)
medical data, associated with regular patient care or a