JSON format. In the future, we plan to make this information available in other formats, such as JSON-LD and RDF, for semantic interoperability.
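As a minimal sketch of what a JSON provenance export might look like, the snippet below serializes one hypothetical record for a notebook cell execution. The field names and structure here are illustrative assumptions, not MLProvLab's actual export schema.

```python
import json

# Hypothetical provenance record for a single notebook cell execution.
# The schema (field names, nesting) is illustrative only; it is NOT
# MLProvLab's actual data model.
record = {
    "cell_id": 3,
    "execution_count": 7,
    "datasets": ["data/train.csv"],
    "modules": [{"name": "sklearn", "version": "0.24.2"}],
    "outputs": ["model.pkl"],
}

# Serialize to JSON, the format the tool currently exports.
serialized = json.dumps(record, indent=2)
print(serialized)
```

A JSON-LD variant would additionally attach an `@context` mapping these keys to shared vocabulary terms (e.g. W3C PROV), which is what enables the semantic interoperability mentioned above.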
The MLProvLab tool will be released as an open-source extension for JupyterLab under the MIT license. Since it is a work-in-progress tool, in future work we aim to implement all the features of provenance management for end-to-end ML pipelines discussed in Section 4. We could also apply ML itself to the tracked data, to report on how an experiment performs or to anticipate where problems could emerge. We plan to use logs and logging metrics to gather additional provenance information about ML models. We further plan an extensive user evaluation to understand user behavior and improve the tool, as well as a performance evaluation on publicly available notebooks from GitHub.
6 CONCLUSIONS
Jupyter notebooks are widely used by data scientists and ML practitioners to write anything from simple to complex ML experiments. Our goal is to provide metadata and provenance management for the ML pipeline in notebook code environments. In this paper, we introduced the design goals and features required for provenance management of the ML pipeline. Working towards this goal, we introduced MLProvLab, an extension for JupyterLab, to track, manage, compare, and visualize the provenance of ML scripts. Through MLProvLab, we efficiently and automatically track provenance metadata, including the datasets and modules used. We provide users the facility to compare different runs of ML experiments, thereby helping them make informed decisions. The tool helps researchers and data scientists collect more information about their experiments and interact with it.
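The run comparison described above can be sketched as follows. The record structure and field names are hypothetical simplifications for illustration, not the tool's actual data model.

```python
# Minimal sketch of comparing two experiment runs by their tracked
# metadata. The record structure here is a hypothetical simplification.
def diff_runs(run_a: dict, run_b: dict) -> dict:
    """Return the keys whose values differ between two provenance records."""
    keys = set(run_a) | set(run_b)
    return {k: (run_a.get(k), run_b.get(k))
            for k in keys if run_a.get(k) != run_b.get(k)}

run1 = {"dataset": "train_v1.csv", "modules": ["sklearn==0.24"], "accuracy": 0.91}
run2 = {"dataset": "train_v2.csv", "modules": ["sklearn==0.24"], "accuracy": 0.94}

changed = diff_runs(run1, run2)
# Only the dataset and the resulting accuracy differ between the two runs,
# which is exactly the kind of signal a user needs to attribute a change
# in model performance to a change in the pipeline.
```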
The tool is designed so that users need not change their scripts or add extra annotations. In future work, we aim to analyze the metadata in more detail. We aim to track data sources by hooking into the file system or into the underlying functions of the programming language itself, integrated in a way that does not compromise user experience or performance. We also plan to use this provenance information to replay and rerun notebooks.
ACKNOWLEDGEMENTS
The authors thank the Carl Zeiss Foundation for the financial support of the project ‘A Virtual Werkstatt for Digitization in the Sciences (K3)’ within the scope of the program line ‘Breakthroughs: Exploring Intelligent Systems for Digitization - explore the basics, use applications’ and the University of Jena for the IMPULSE funding: IP-2020-10.
KDIR 2021 - 13th International Conference on Knowledge Discovery and Information Retrieval