is likely to be unrepresentative of most research
collaborations. Although funding bodies such as
the EJP RD (European Joint Programme – Rare
Diseases) encourage collaborations at distances such
as these, local relationships may still predominate
precisely because of the logistical challenges involved
in transferring data, processing libraries and hardware
over such large distances (though the same challenges
no doubt exist within regions too, for instance between
individual European countries, with national
infrastructures at varying stages of development).
A related consideration is the data sharing law of
each regional jurisdiction.
Whilst the security of the data in transit has been
addressed in this proposal, there must also be agreement
at a legal level on the usage and privacy laws that apply
at each endpoint of the network. Again, this is a technical
proposal that attempts to be general in scope, but this
is a specific consideration that would always be
relevant in a network like the one proposed here.
Equivalence with the European GDPR legislation is
generally considered the gold standard in this regard,
and Australia is amongst the various developed
nations that are pursuing this equivalence nationally
(Review of the Privacy Act, 2020).
In terms of generalised re-usability, most – but not
all – aspects are covered in this proposal. It builds
upon the idea presented by GA4GH of a generalised
ML workflow, made accessible by specifying Docker
execution scripts using the YAML specification. The
other prominent feature supporting sharing and
repeatability is the presentation of all internal data and
meta-data through JSON-LD interfaces, to make the data
accessible according to the FAIR principles. These
standardisations are untested, and even once fully
implemented their adoption may be limited. Only the
test of time and re-use will prove this.
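As an illustration of what a JSON-LD presentation of a data record might look like, the following minimal Python sketch wraps plain metadata in a JSON-LD envelope using schema.org terms. It is not the project's actual interface: the function name `dataset_to_jsonld`, the example identifiers, the container image name and the choice of the schema.org vocabulary are all assumptions made for the example.

```python
import json


def dataset_to_jsonld(identifier, name, description, licence_url, workflow_image):
    """Wrap plain metadata in a JSON-LD envelope using schema.org terms.

    All argument names and the vocabulary choice are illustrative assumptions.
    """
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "@id": identifier,
        "name": name,
        "description": description,
        "license": licence_url,
        # Hypothetical link from the dataset to the containerised workflow
        # that produced it, so that the result can be traced and repeated.
        "isBasedOn": {
            "@type": "SoftwareApplication",
            "name": workflow_image,
        },
    }


# Placeholder values only; none of these identifiers exist in the project.
record = dataset_to_jsonld(
    identifier="https://example.org/datasets/demo-001",
    name="Demo metabolite table",
    description="Placeholder record used only to illustrate the structure.",
    licence_url="https://creativecommons.org/licenses/by/4.0/",
    workflow_image="example/workflow:0.1",
)

print(json.dumps(record, indent=2))
```

Because the record is ordinary JSON with an added `@context`, it can be read by tools that know nothing about linked data, while consumers that do understand JSON-LD can resolve each key to a shared vocabulary term.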
It is also the case that concrete features
such as commercial CDN usage and hardware for
processing are not easy to generalise. The most that
can be provided is the “gateway” schema descriptions
that allow these to be integrated into a project as
easily as possible. However, the apparently
reasonable cost of CDN usage does appear to be a
significant step change in the mode of operation of
high-volume data research projects, one that appears
to be generally unreported. There may be unforeseen
barriers or consequences to the use of these
commercial offerings that will only become apparent
as wider-scale usage increases. Nevertheless, this is an
option that appears to serve the specific needs of
the Hypox-PD project well in the short to medium
term and could perhaps be submitted for
consideration as a step towards a general “Research
CDN”.
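The “gateway” schema itself is not reproduced in this section, so the following Python sketch is purely hypothetical. It shows one way a gateway description for a CDN or compute resource could be declared and checked before being integrated into a project. The class `GatewayDescriptor`, its fields, the allowed kinds and the validation rules are all assumptions, not the project's schema.

```python
from dataclasses import dataclass
from urllib.parse import urlparse

# Hypothetical set of resource kinds a gateway may describe.
ALLOWED_KINDS = {"cdn", "compute"}


@dataclass(frozen=True)
class GatewayDescriptor:
    """Illustrative description of an external resource (all fields assumed)."""
    name: str
    kind: str       # "cdn" or "compute"
    endpoint: str   # URL at which the resource is reached
    region: str     # recorded so the jurisdiction of the endpoint is explicit


def validate_gateway(gateway):
    """Return a list of problems; an empty list means the descriptor is usable."""
    problems = []
    if gateway.kind not in ALLOWED_KINDS:
        problems.append(f"unknown kind: {gateway.kind!r}")
    parsed = urlparse(gateway.endpoint)
    if parsed.scheme != "https" or not parsed.netloc:
        problems.append("endpoint must be an https URL")
    if not gateway.region:
        problems.append("region is required")
    return problems


# Placeholder descriptors; the names and URLs are not real project resources.
good = GatewayDescriptor("demo-cdn", "cdn", "https://cdn.example.org/data", "eu")
bad = GatewayDescriptor("demo-gpu", "gpu", "ftp://example.org", "")

print(validate_gateway(good))
print(validate_gateway(bad))
```

Keeping the description declarative means a commercial offering could, in principle, be swapped for another by changing the descriptor rather than the workflow code.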
7 CONCLUSIONS
A novel mechanism has been presented in this paper,
facilitating the exchange of data, algorithms and
processing when developing a multi-centre
clinical/bioinformatics research project. It has the
potential to significantly improve the feasibility of
transferring data, analyses and results between
geographically dispersed partner nodes, but is
potentially limited by budget (for the content
delivery network) and by the complexity of orchestration
and synchronisation. As the Hypox-PD project
progresses, the development of this infrastructure will
continue to be reported as an outcome of the research,
additional to the clinical and bioinformatics outputs
of metabolite identification.
ACKNOWLEDGEMENTS
The members of the Hypox-PD consortium
acknowledge the support obtained during the
development of the proposal for the European Joint
Programme – Rare Diseases (EJP RD) through the
European Advanced Translational Research
Infrastructure in Medicine (EATRIS).
REFERENCES
Suetake H, Tanjo T, Ishii M et al., (2022), Sapporo: A
workflow execution service that encourages the reuse
of workflows in various languages in bioinformatics.
F1000Research, 11:889
O'Connor BD, Yuen D, Chung V, Duncan AG, Liu XK,
Patricia J, Paten B, Stein L, Ferretti V., (2017), The
Dockstore: enabling modular, community-focused
sharing of Docker-based genomics tools and
workflows. F1000Research, 6:52
Yuen D., et al., (2021), The Dockstore: enhancing a
community platform for sharing reproducible and
accessible computational protocols, Nucleic Acids
Research, Volume 49, Issue W1, 2 July 2021, Pages
W624–W632
Rehm H. L., et al., (2021), GA4GH: International policies
and standards for data sharing across genomic
research and healthcare, Cell Genomics, Volume 1,
Issue 2, 2021, 100029, ISSN 2666-979X