user interface, the system can be used by a wider
range of users with varying technical skills. Lastly,
the network is not limited to analysis tasks. Long-running
services could also be operated on it, making use of the
uniform deployment of software on diverse devices that the
system provides. For instance, a workflow could deploy or
update a service as the last task of an analysis.
ACKNOWLEDGEMENTS
This work was funded by the Fraunhofer-Cluster of
Excellence »Cognitive Internet Technologies«.
A Conceptual Framework for a Flexible Data Analytics Network