Through the project tools, the patient can easily
report any exacerbation or event day by day,
moving their usual care from the hospital to their
home and potentially reducing the number of
hospitalizations and adverse events.
In this paper, we detail the methodology and the
framework developed for identifying and integrating
multiple data sources in the hospital, leading to the
release of a daily updated data mart compliant with
the common project data model and with specific data
quality standards. The data mart serves as a data
source for the ingestion of patient data into an HL7
Fast Healthcare Interoperability Resources (FHIR)
based data repository for further federated
learning and data visualization tasks. Details on the
technical requirements and the implementation of the
RE-SAMPLE platform are beyond the scope of this
work.
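As an illustration of the ingestion step mentioned above, the following minimal sketch maps a single data mart row to a FHIR Observation expressed as a plain dictionary. The row layout, the field names (patient_id, variable, value, unit, measured_on) and the local code system URL are hypothetical and are not taken from the RE-SAMPLE data model; only the Observation element names follow the FHIR specification.

```python
from datetime import date

# Hypothetical local code system, used here only for illustration; a real
# deployment would rely on the terminology agreed within the project.
LOCAL_SYSTEM = "http://example.org/re-sample/variables"


def row_to_observation(row: dict) -> dict:
    """Map one (hypothetical) data mart row to a FHIR Observation dict."""
    return {
        "resourceType": "Observation",
        "status": "final",
        "code": {"coding": [{"system": LOCAL_SYSTEM, "code": row["variable"]}]},
        "subject": {"reference": f"Patient/{row['patient_id']}"},
        "effectiveDateTime": row["measured_on"].isoformat(),
        "valueQuantity": {"value": row["value"], "unit": row["unit"]},
    }


# Assumed example row; the values are invented for the sketch.
example_row = {
    "patient_id": "P001",
    "variable": "fev1",
    "value": 1.85,
    "unit": "L",
    "measured_on": date(2023, 5, 17),
}
observation = row_to_observation(example_row)
```

Keeping the mapping as a pure function of one row makes it straightforward to re-run on every daily update of the data mart.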
In recent years, there has been growing interest in
exploring the benefits of reusing and integrating
Electronic Health Records (EHRs) in clinical trials
(Kalankesh, 2024; Nordo, 2019). At the same time,
digital tools are being introduced to facilitate
clinical trial management, especially during patient
screening and enrolment (Kasahara, 2024). In this
work, we explore both EHR integration for data
collection, together with data validation across
multiple sources, and dedicated screening and
enrolment applications as part of the RE-SAMPLE
infrastructure.
The main challenges of data collection in a real-world
setting, such as the hospital environment, are the
heterogeneity of data sources, which need to be
identified and mapped within the hospital, and data
availability. This problem has been described in
other works, including (Kwok, 2022) and (Kerkri,
2001), and different solutions have been reported in
(Mate, 2015), where an ontology-based solution is
presented, and in (Jayaratne, 2019), where the
authors introduce an open data integration platform
across different sources. The creation of research
datasets in such a context remains a challenging
problem and often leads to ad-hoc solutions that are
tailored to the specific hospital. In the COPD
research domain, most works focus on data modelling
and disease characterization, while few focus on
systematic data collection, such as the collaborative
approach for the definition of a COPD dataset in a
healthcare system reported in (Lam, 2023).
To tackle these challenges in the RE-SAMPLE
project, a core facility of Fondazione Policlinico
Universitario Agostino Gemelli IRCCS (FPG) named
Gemelli Generator RWD R&D (Damiani et al., 2021)
has developed a dedicated pipeline for data extraction
and data collection, with the aim of retrieving all
required information from the different data sources
present in the hospital, including internal tools
that support HCPs in managing the prospective
study. The group has a relevant track record in the
creation of research data marts for other pathologies,
such as breast cancer (Marazzi, 2021), heart failure
(D’Amario, 2023), dyslipidemia (Capece, 2024) and
COVID-19 (Murri, 2022). The FPG team actively
participated in the definition of both the clinical and
the technical requirements of the RE-SAMPLE platform.
To this end, a crucial task was the definition of a
common data model (Acebes et al., 2022), which
includes all the clinically relevant variables for
characterizing the health profile of COPD patients.
Several variables are required to capture the
health condition of COPD patients with chronic
complex conditions. Functional scores based on
spirometry measurements, blood samples,
six-minute walking tests and Patient-Reported
Outcomes (PROs) on life habits (e.g. smoking) and
symptoms are needed to provide Health Care
Professionals (HCPs) with a complete overview of the
current health status of the patient.
Data collection in the hospital requires a shared
effort between clinicians and a dedicated technical
team, not only for conducting regular outpatient
visits but also for developing data extraction
procedures from the Hospital Information System
(HIS) or the EHR that make hospital data available
for further visualization and modeling tasks.
The following sections describe the implementation
of an ad-hoc solution for creating the RE-SAMPLE
data mart at FPG, along with the results of
deploying and using the defined procedures.
2 METHODS
As shown in Figure 1, the creation of the
RE-SAMPLE data mart stems from the need to collect
clinical and secondary data for all the patients
included in the project. As a first step, screening is
required before asking a patient to join the study. This
step is performed via a web-based recruitment app, where
all the inclusion and exclusion criteria are
standardized. By interacting with this tool, HCPs can
determine whether a patient is eligible to participate
in the study and can consequently be enrolled.
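The screening logic described above can be sketched as a rule-based check over standardized criteria. This is only an illustrative sketch and not the implementation of the recruitment app: the criteria listed below (adult, copd_diagnosis, unable_to_consent) and the patient field names are hypothetical, since the actual study criteria are not listed in this section.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Criterion:
    name: str
    kind: str  # "inclusion" or "exclusion"
    is_met: Callable[[dict], bool]


# Hypothetical criteria, for illustration only; the real study criteria
# are defined in the recruitment app and are not reproduced here.
CRITERIA = [
    Criterion("adult", "inclusion", lambda p: p["age"] >= 18),
    Criterion("copd_diagnosis", "inclusion", lambda p: p["copd_diagnosis"]),
    Criterion("unable_to_consent", "exclusion", lambda p: p["unable_to_consent"]),
]


def screen(patient: dict) -> tuple[bool, list[str]]:
    """Return eligibility and the names of the criteria that block enrolment."""
    blocking = []
    for criterion in CRITERIA:
        met = criterion.is_met(patient)
        # An inclusion criterion blocks when unmet, an exclusion one when met.
        if (criterion.kind == "inclusion" and not met) or (
            criterion.kind == "exclusion" and met
        ):
            blocking.append(criterion.name)
    return (not blocking, blocking)


# Assumed example patient; the values are invented for the sketch.
eligible, reasons = screen(
    {"age": 67, "copd_diagnosis": True, "unable_to_consent": False}
)
```

Returning the list of blocking criteria alongside the boolean lets the HCP see why a patient cannot be enrolled, rather than only that they cannot.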