Towards Good Practices for Collaborative Development of ML-Based
Systems
Cristiana Moroz-Dubenco (https://orcid.org/0009-0008-7672-6453), Bogdan-Eduard-Mădălin Mursa (https://orcid.org/0000-0002-4221-7297) and Mátyás Kuti-Kreszács (https://orcid.org/0009-0004-4997-2000)
Faculty of Mathematics and Computer Science, Babeș-Bolyai University, Cluj-Napoca, Romania
Keywords:
Collaborative Development, Machine Learning Systems, Automatization.
Abstract:
The field of Artificial Intelligence (AI) has rapidly transformed from a buzzword technology to a fundamental
aspect of numerous industrial software applications. However, this quick transition has not allowed for the
development of robust best practices for designing and implementing processes related to data engineering,
machine learning (ML)-based model training, deployment, monitoring, and maintenance. Additionally, the
shift from academic experiments to industrial applications has resulted in collaborative development between
AI engineers and software engineers who have reduced expertise in established practices for creating highly
scalable and easily maintainable processes related to ML models. In this paper, we propose a series of good
practices that have been developed as the result of the collaboration between our team of academic researchers
in AI and a company specializing in industrial software engineering. We outline the challenges faced and
describe the solutions we designed and implemented by surveying the literature and deriving new practices
based on our experience.
1 INTRODUCTION
Due to recent advances in Artificial Intelligence (AI),
which used to be considered an academic topic, there
is a surge in demand for integrating AI and, espe-
cially, machine learning (ML) capabilities into soft-
ware products (Makridakis, 2017; Zhang and Lu,
2021). However, there is no common standard bridging the development paradigms used in AI and Software Engineering (SE) (Lorenzoni et al., 2021; Cerqueira et al., 2022). While traditional software systems are built deductively, by translating the rules that control the system behaviour into code, ML techniques learn these rules in a training process, generating the requirements in an inductive manner (Khomh et al., 2018).
What is more, the process of integrating ML com-
ponents into production-ready applications requires
not only a high understanding of those components
but also robust engineering mechanisms to guarantee
their availability and scalability. Although the sci-
entific literature highlights the necessity of standard-
ized training and deployment processes, there is in-
sufficient literature to guide ML practitioners (Serban
et al., 2020).
Our goal is to find a set of good practices for the
deployment process based on our partnership with a
well-established company that required academic ex-
pertise for integrating AI models into their software
products, regardless of the type of ML model in-
volved. As a means to this, we analyze the challenges
our team of researchers has encountered and the pro-
cesses and procedures employed to overcome each of
these challenges.
The rest of the paper is structured as follows: in
Section 2 we analyze the process of taking ML mod-
els from theory to practice and the state-of-the-art re-
lated to the problem; subsequently, in Section 3, we
examine the challenges we faced and the solutions
developed, and we also analyze the human aspects in-
volved by presenting the results of a survey conducted
within our industry partners; in Section 4 we propose
a set of good practices derived from our experiences
and in Section 5 we analyze possible threats to valid-
ity; and, finally, in Section 6 we present our conclu-
sions and ideas for future work.
2 TRAINING MODELS - FROM
THEORY TO PRACTICE
Training artificial intelligence models has become a demanding task in recent times, as there were previously no established norms or frameworks to facilitate the standardization of the training procedure. This resulted in customized processes that often resembled academic procedures, which were difficult for the industry to replicate on a large scale. As a result, various logistical challenges emerged that necessitated close collaboration between academic and industry teams.
In the upcoming subsections, we will examine the
challenges encountered and the solutions developed
by our team of researchers in partnership with a well-
established company that required external expertise
for a project centered around a series of AI models.
2.1 Problem Definition
The scientific literature has highlighted the recent ne-
cessity of standardizing the training and deployment
process of AI models in new or existing software ap-
plications, employing robust mechanisms to ensure
their high availability and scalability.
Breaking down the training process is a complex
task that requires a meticulous analysis of each atomic
component in the pipeline that collectively delivers
the model. The pipeline can be broadly and simply
described as consisting of the following stages (Ash-
more et al., 2021): (1) data modeling, (2) model training, (3) model deployment, and (4) model maintenance.
The process of model training begins with data
modeling, which encompasses data gathering, prepro-
cessing, and engineering. Each of these subtasks is
critical, as any issues encountered in this stage can
compromise subsequent steps that rely on the data.
Challenge 1: Establishing a data delivery protocol
- The lack of standardization of the data delivery routine severely affected our experiments in the model training phase. We encountered various issues, rang-
ing from missing columns and improperly formatted
values (e.g., integers represented as strings) to more
complex problems such as excessively large files, cor-
rupted files, and inadequate version control.
Challenge 2: Training the models - After preparing
the dataset, we faced several challenges during the
model training phase, namely training multiple AI models using the prepared dataset and selecting the one that demonstrated the best performance.
These challenges were primarily related to re-
source limitations and performance. Training the
models necessitated an environment equipped with
physical resources, libraries, and tools. Additionally,
we had to manage the orchestration of the training to
ultimately identify the most accurate model for de-
ployment in the subsequent stage.
Challenge 3: Deploying the models - While the first
two challenges discussed thus far are commonplace in
academic experiments, this last challenge presented
our team with unfamiliar territory due to the need to use continuous integration and continuous deployment (CI/CD) frameworks suited for industrial architectures. Here, continuous integration (CI) means that, when the data is updated, the ML pipeline reruns and generates new models, evaluation metrics, and explanations, while continuous deployment (CD) refers to the release of new models into a (pre-)production environment.
Moreover, if the end-user lacks technical capabilities,
it is essential to establish a mechanism that bridges
the gap between the user’s input and the model’s pre-
diction.
In real-world scenarios, it is crucial to have strict
supervision over the deployment package to ensure
smooth maintenance of the model. However, opera-
tional challenges arise as the responsibility of main-
taining the model falls on a team that may lack ex-
pertise in AI. This adds to what we have called Chal-
lenges related to the human factor, which are discussed
in Section 3.4. In the event of a model failure in a
live environment, the responsibility of maintenance
falls on a team that typically has extensive expertise in
DevOps but reduced knowledge in AI. This presents
a significant issue due to the black-box nature of AI
models, particularly as they grow in complexity, such
as with neural networks. As a result, our team faced
the problem of finding a solution to this issue and im-
plementing protocols to facilitate maintenance in case
of real-time failures.
2.2 State-of-the-Art
Serban et al. (Serban et al., 2020) conducted an out-
standing study that examines the extensive range of
optimal practices suitable for teams and their struc-
tures. The study is a comprehensive case analysis
that scrutinizes all the good practices available at each
stage of AI model training. A questionnaire derived from the
study was subsequently distributed to corporations ac-
tively involved in AI-based applications in the indus-
try. The study findings indicate that there is no univer-
sally applicable set of optimal practices for AI model
training; rather, their usage is context-dependent and
influenced by factors such as the management frame-
work (Agile, Waterfall, etc. (Beck, 2023; Fair, 2012)),
team structure, market conditions, project size, and
others. Statistical analyses carried out as part of the
study revealed that there is a direct correlation be-
tween team growth, resource availability, maturity,
and the adoption of good practices and standardiza-
tion. Specifically, as teams expand and mature, there
is a corresponding increase in the rate of optimal prac-
tices employed and a greater degree of standardiza-
tion observed.
With respect to the challenges associated with
establishing standardized and reproducible processes
for AI model training, deployment, and maintenance,
a study of the challenges identified through discus-
sions with several Microsoft teams (Amershi et al.,
2019) describe the discovery, administration, and ver-
sion control of data pertinent to ML applications as
posing a greater challenge compared to other forms
of software engineering as they can display intricate
interdependence and exhibit non-monotonic error be-
havior.
Although discussions surrounding the establish-
ment of optimal practices for the training, utilization,
and maintenance of AI models typically focus on the
procedural aspects of these processes, a separate category of pragmatic concerns also emerges: as the level of standardization and safety checks incorporated into a process increases, there is often a corresponding increase in the amount of time required for the process to reach completion.
To address this issue, Zhang et al. (Zhang et al.,
2017) conducted an experiment aimed at investigating
the extent to which the training procedures associated
with various AI models contribute to their relative
time requirements. Based on their findings, the re-
searchers proposed a novel framework called SLAQ,
which is a scheduling system designed to prioritize
quality in the context of large-scale ML training tasks
executed within shared computing clusters.
In the upcoming sections, we will provide an
overview of our research team’s experience collabo-
rating with a software company in the domain of AI.
Leveraging the current state-of-the-art literature, we
devised a pipeline of procedural and logistical steps
aimed at proposing a scalable process for training
AI models and deploying them in a production envi-
ronment, while prioritizing ease of maintenance and
monitoring for a team without previous expertise in
the field of AI.
3 ADDRESSING CHALLENGES
IN THE DEVELOPMENT OF
ML-BASED SOLUTIONS
3.1 Data Processing
In our collaboration with the software company team,
we needed to establish the protocol for receiving
datasets, which may seem straightforward. However,
even this simple process presented significant chal-
lenges that greatly affected the productivity of our ex-
periments.
Challenge 1.1: Finding an all-in-one solution. To
meet the requirements for a data management plat-
form with features such as warehouse storage for ef-
ficient dataset upload/download operations and file
content validation, we sought a comprehensive solu-
tion that would address all of these issues. Our search
led us to Microsoft AzureML, which proved to be an
ideal candidate for our needs, as discussed in the pre-
vious section.
Challenge 1.2: Integration of new dataset versions
during ongoing experiments. We utilized Microsoft
AzureML blob storage as a repository for all the
datasets provided by our collaborators. The storage
was structured to allow for both horizontal and verti-
cal flexibility, meaning new datasets could be added
at any time, and new versions could be appended
to existing datasets. This allowed our collaborators
to effortlessly integrate new dataset versions at any
point of the experiments without disrupting the pro-
cess. Throughout the experiments, the latest stable
version of the dataset was fetched until a new version
was deemed stable.
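For illustration, the sketch below shows how this horizontal and vertical flexibility can be expressed with the azureml-core (v1) SDK; the dataset name "indicators", the folder layout, and the tags are illustrative assumptions rather than the exact artifacts used in our project.

```python
# Sketch of dataset versioning with the azureml-core (v1) SDK; the dataset
# name, folder layout and tags are illustrative placeholders.
from azureml.core import Workspace, Dataset, Datastore

ws = Workspace.from_config()                        # workspace described by config.json
datastore = Datastore.get(ws, "workspaceblobstore")

# Upload a new delivery and register it as a new version of an existing dataset.
Dataset.File.upload_directory(src_dir="./delivery_2023_04",
                              target=(datastore, "indicators/2023_04"))
new_version = Dataset.Tabular.from_delimited_files(
    path=(datastore, "indicators/2023_04/*.csv"))
new_version.register(workspace=ws, name="indicators",
                     create_new_version=True,
                     tags={"status": "not stable"})  # updated after validation

# Experiments keep fetching the most recent version that was marked as stable.
latest = Dataset.get_by_name(ws, name="indicators", version="latest")
```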
Challenge 1.3: Validation of new dataset versions.
Before a new version could be considered stable, it
was subjected to a set of custom rules and valida-
tions that were implemented as scripts. The valida-
tion scripts aim to verify various aspects of the new
version, such as computing the distributions of miss-
ing values by column and checking if the number of
missing values exceeds a threshold. Additionally, the
column names and types are checked against a pre-
defined file definition to ensure that the file structure
is as expected and that training processes will not fail
due to unexpected data types or values. To automate
this process, we set up an AzureML event trigger that
automatically runs the validation scripts upon upload-
ing a new version. The output of the validation script
is then used to label the version as either "stable" or "not stable".
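As an illustration of this validation step, the sketch below captures the two kinds of checks described above (structure against a predefined file definition and missing-value distributions against a threshold); the schema format, column names, and the 10% threshold are assumptions made for the example.

```python
# Minimal sketch of a dataset validation script; schema layout, column names
# and the 10% missing-value threshold are illustrative assumptions.
import json
import sys
import pandas as pd

MISSING_THRESHOLD = 0.10   # maximum tolerated ratio of missing values per column

def validate(csv_path: str, schema_path: str) -> bool:
    df = pd.read_csv(csv_path)
    with open(schema_path) as f:
        schema = json.load(f)          # e.g. {"price": "float64", "units": "int64"}

    # 1. Structure: every expected column must exist with the expected dtype.
    for column, expected_dtype in schema.items():
        if column not in df.columns:
            print(f"missing column: {column}")
            return False
        if str(df[column].dtype) != expected_dtype:
            print(f"wrong type for {column}: {df[column].dtype} != {expected_dtype}")
            return False

    # 2. Content: per-column distribution of missing values must stay under the threshold.
    missing_ratio = df.isna().mean()
    if (missing_ratio > MISSING_THRESHOLD).any():
        print("columns exceeding the missing-value threshold:",
              list(missing_ratio[missing_ratio > MISSING_THRESHOLD].index))
        return False
    return True

if __name__ == "__main__":
    label = "stable" if validate(sys.argv[1], sys.argv[2]) else "not stable"
    print(label)   # consumed by the event trigger to tag the dataset version
```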
3.2 Model Training
Compared to data modeling, which is a common step
in any AI model training process, the model training
step is challenging when it comes to understanding
why and how a model behaves during training, es-
pecially for more complex models that tend to be-
have like black boxes. Our project focused on two
types of models - decision trees (Breiman et al., 1984)
and neural networks (Hopfield, 1982) - for forecasting
various indicators.
Challenge 2.1: Understanding the models’ be-
haviours. Decision trees are mathematical models
that can be reverse-engineered to understand their
training accuracy or result. However, neural networks
are much more complex, composed of thousands of
operations between their neurons, making it close to
impossible for a human mind to replicate their behav-
ior, especially for individuals without expertise in ar-
tificial intelligence. Analyzing the weights, biases,
and gradients of a neural network’s layers to under-
stand its behavior during training requires advanced
skills in AI.
Our primary objective was to create an interface
that would eliminate the need for advanced analysis
and rely solely on a qualitative assessment of the re-
sults. We achieved this by implementing standard
metrics for each trained model that can be numer-
ically compared (Figure 1), thus shifting the focus
from comprehending the intricate architecture of the
black box model to a qualitative comparison of the
trained models.
Figure 1: General description of the proposed pipeline for
AI model training.
Challenge 2.2: Setting up the training process. We
have established a training pipeline in AzureML by
utilizing pre-defined architectures for the aforemen-
tioned models. The output of this pipeline is a re-
port consisting of Mean Squared Error (MSE), Root
Mean Squared Error (RMSE), and R-Squared (R2)
values (Kane, 2013), along with the identification of
the model that performed the best based on at least
two out of the three metrics. This approach is re-
garded as elegant by the team, which had no previous experience in AI model training, since it requires minimal intervention. Whenever a new dataset version is labeled as stable, the AzureML pipeline automatically starts and trains a set of models on compute infrastructure managed by AzureML. Following the series of tests performed to ensure that the best model is selected, the pipeline returns that model to be used.
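The report and the selection rule can be summarized by the following sketch; the model names and data splits are placeholders, and the scikit-learn metrics shown here stand in for the metrics computed inside the AzureML pipeline.

```python
# Sketch of the metric report and the "best on at least two of the three
# metrics" selection rule; model names and data splits are placeholders.
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

def report(y_true, y_pred):
    mse = mean_squared_error(y_true, y_pred)
    return {"MSE": mse, "RMSE": np.sqrt(mse), "R2": r2_score(y_true, y_pred)}

def select_best(reports: dict) -> str:
    """Return the model that wins at least two of the three metrics."""
    wins = {name: 0 for name in reports}
    for metric, better in [("MSE", min), ("RMSE", min), ("R2", max)]:
        best_value = better(r[metric] for r in reports.values())
        for name, r in reports.items():
            if r[metric] == best_value:
                wins[name] += 1
    return max(wins, key=wins.get)

# reports = {"decision_tree": report(y_test, dt_pred),
#            "neural_network": report(y_test, nn_pred)}
# best = select_best(reports)
```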
Challenge 2.3: Creating an environment for mod-
els’ training. Although there are differences in the
required libraries and resources for each model, we
have standardized these variations by setting up the
models as AzureML pipeline modules. This one-time
setup eliminates the need for external intervention.
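A minimal sketch of such a common interface is given below, assuming scikit-learn model families and illustrative hyperparameters; the actual pipeline modules wrap equivalent classes as AzureML steps.

```python
# Sketch of the common train/evaluate contract behind the pipeline modules;
# hyperparameters and the metric set are illustrative, not the production setup.
from dataclasses import dataclass

import numpy as np
from sklearn.base import BaseEstimator
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.neural_network import MLPRegressor
from sklearn.tree import DecisionTreeRegressor


@dataclass
class TrainResult:
    model: BaseEstimator
    metrics: dict                     # MSE / RMSE / R2 report


class ForecastingModule:
    """Every model family exposes the same contract, so the orchestration
    layer can train and compare models without model-specific code."""

    def build(self) -> BaseEstimator:
        raise NotImplementedError

    def train(self, X_train, y_train, X_test, y_test) -> TrainResult:
        model = self.build().fit(X_train, y_train)
        predictions = model.predict(X_test)
        mse = mean_squared_error(y_test, predictions)
        metrics = {"MSE": mse, "RMSE": np.sqrt(mse),
                   "R2": r2_score(y_test, predictions)}
        return TrainResult(model, metrics)


class DecisionTreeModule(ForecastingModule):
    def build(self):
        return DecisionTreeRegressor(max_depth=8)


class NeuralNetworkModule(ForecastingModule):
    def build(self):
        return MLPRegressor(hidden_layer_sizes=(64, 32), max_iter=500)
```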
3.3 Model Deployment
Our aim was to find a set of good practices for the de-
ployment process for the trained models, irrespective
of the type of model that performed the best during
the training stage, such that the trained models can be
deployed and consumed by third parties without the
need for technical knowledge of the models used.
Challenge 3.1: Encapsulation of the models for de-
ployment. We utilized state-of-the-art technologies
commonly used for deploying software applications,
such as Docker (Merkel, 2014). Docker is a frame-
work that enables the encapsulation of an application
and its dependencies, including operating systems, li-
braries, and other tools, as a container. These contain-
ers can be easily deployed using orchestration frame-
works like Kubernetes.
Challenge 3.2: Communication with the models. Fol-
lowing a discussion with our collaborator’s engineer-
ing team, we concluded that the optimal way of
communicating with the trained model was through
HTTPS requests to a REST API. AzureML provides
a built-in feature to deploy any trained AI model as a
REST web service that is automatically encapsulated
as a container and deployed on AzureML infrastruc-
ture. Additionally, the web service endpoint interface
created for prediction consumption is documented us-
ing Swagger (Swagger, 2022), a cutting-edge frame-
work that offers a user-friendly interface to describe
the structure of REST APIs and generates documenta-
tion that is easily understandable for both developers and non-developers.

Figure 2: Architecture of the generic process implemented as an AzureML pipeline.
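To make the mechanism concrete, the sketch below follows the entry-script pattern that the AzureML (v1) SDK wraps into a containerized REST web service, with init() loading the registered model and run() handling each request; the input/output JSON schema and the model file name are assumptions made for the example.

```python
# score.py - sketch of the entry script that AzureML (v1 SDK) wraps into the
# REST web service; the input/output JSON schema and file name are assumptions.
import json
import os

import joblib
import numpy as np

model = None

def init():
    # AZUREML_MODEL_DIR points at the registered model files inside the container.
    global model
    model_path = os.path.join(os.environ["AZUREML_MODEL_DIR"], "model.pkl")
    model = joblib.load(model_path)

def run(raw_data):
    try:
        features = np.array(json.loads(raw_data)["data"])
        predictions = model.predict(features)
        return {"predictions": predictions.tolist()}
    except Exception as exc:
        # Returned to the caller and visible in the service logs for DevOps.
        return {"error": str(exc)}
```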
Challenge 3.3: Ensuring that the deployed models
are easy to use. The approach we employed guaran-
tees a high level of abstraction, which is reduced to a
straightforward input/output protocol. As a user, you
are aware of the expected input and output data struc-
tures when consuming the latest version of a trained
model. Furthermore, deployments are versioned, enabling quick rollback to any previous
version in the event of unexpected behavior. Since
we are dealing with a web service-based architecture,
AzureML offers a built-in option for scaling the num-
ber of instances to meet production loads (Figure 2).
We have created an ecosystem that, despite being part of the project architecture, provides the autonomy and reliability of a third-party service.
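From the consumer's point of view, interaction reduces to a single HTTPS call; the endpoint URL, key, and feature layout below are placeholders.

```python
# Sketch of how a third party consumes the deployed endpoint; the URL, key
# and feature layout are placeholders, not the real service details.
import requests

SCORING_URI = "https://<scoring-endpoint>/score"               # placeholder
headers = {"Content-Type": "application/json",
           "Authorization": "Bearer <api-key>"}                # placeholder key

payload = {"data": [[12.5, 3, 180.0], [7.1, 1, 95.0]]}         # two rows of features
response = requests.post(SCORING_URI, json=payload, headers=headers, timeout=30)
print(response.json())    # e.g. {"predictions": [...]}
```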
3.4 Human Aspects in Model
Development
Know-how transfer was designed as part of the project's unfolding. Based on an assessment of the development team, the academic team designed several activities aiming to increase knowledge in the domain of Artificial Intelligence in general and ML in particular, from both theoretical and applied
perspectives. These activities include presentations,
tutorials, collaborative working exercises, Q&A ses-
sions, together with weekly technical meetings.
In order to evaluate the knowledge transfer pro-
cess and to assess the acquired competencies of the
development team, we surveyed the team members' comprehension of the covered topics. We collected 16 answers to this survey. The participants hold positions in the project ranging from junior programmers and programmers to business analysts, lead developers, and database administrators, with an average experience of 10.1 years (ranging from less than a year to more than 20 years).
Despite the team's initial level of AI knowledge being close to non-existent, the survey results presented in Figure 3 indicate promising outcomes.
The perspectives of each team member on how each
component works, starting from data processing steps
such as parameter selection to model understand-
ing/implementation, training, and deployment using
the proposed CI/CD pipelines, were assessed. The re-
sults suggest that each team member’s understanding
of these components and their workings beyond their
black box implementation is more than promising in
most categories. Based on the survey results, the level
of understanding of the proposed AI model training
process was mostly evaluated as medium. This level
of knowledge is deemed sufficient for the mainte-
nance and minor improvement of each step in the pro-
cess.
As anticipated, the evaluation results demonstrate
that neural networks, as a family of AI models, are
more challenging to comprehend compared to deci-
sion trees across all categories. Nonetheless, the en-
couraging aspect is that the ratio does not exclusively
label neural networks as models that cannot be ac-
commodated in the proposed pipeline. Instead, the
results illustrate that the proposed pipeline is well ab-
stracted and can support models with diverse com-
plexities in terms of architectures, ranging from sim-
pler to more complex.
The survey results highlight an interesting finding:
despite the team’s limited level of understanding, they
exhibit high levels of trust in the deployed models. In-
terestingly, the trust increases with the complexity of
the models, as neural networks were found to have
higher trust than decision trees. This observation sug-
gests that the knowledge-sharing process has been ef-
fective, as the developers seem to have understood
that the more complex architectures yield higher ac-
curacy in the models.
Figure 3: People's perceptions over aspects related to decision trees and neural networks: (a) model understanding, (b) model training, (c) model implementation, (d) parameter selection, (e) CI/CD, (f) trust.

Based on the survey results overviewed in the previous paragraphs, it can be concluded that the pro-
posed generic framework for the AI model training
process is successful in achieving its intended goal.
The survey showed that team members with limited
knowledge in the field of AI were able to understand
and trust the proposed pipeline, indicating that the
framework is well abstracted and easily applicable to
a real-world scenario. The pipeline was able to ac-
commodate both simple and complex AI models, in-
cluding neural networks, demonstrating its versatil-
ity and scalability. These findings validate the effec-
tiveness of the proposed framework in simplifying the
process of AI model training, enabling software engi-
neering teams to adopt AI technology in a simple and
smooth manner.
4 TOWARDS PROPOSING GOOD
PRACTICES
The insights gained from this collaboration were
shaped by the disparity between an experimental un-
dertaking and an industrial enterprise. Notably, nu-
merous industrial applications in the realm of com-
puter science have their roots in academia and have
evolved into implementations that adhere to princi-
ples that impact the economic aspects of the business.
Marketing a product to a customer entails a set of obli-
gations that become a compulsory part of the logistics
and maintenance of the said product.
One of the demanding experiences encountered by
our academic team was to conform to these princi-
ples and transform our experimental methodologies
into ones that could be integrated into scalable pro-
cesses, which could be effectively maintained within
state-of-the-art software architectures. The most valu-
able insights acquired during this collaboration are
those that we think could serve as a foundation for
establishing a set of good practices. The hurdles en-
countered, ranging from data modeling to model de-
ployment, were overcome through the utilization of
state-of-the-art practices, ultimately resulting in the
successful training of accurate models and proposing
an interface for interacting with said models:
Data Processing
- Utilizing data warehouses that support versioning to streamline the process of uploading datasets. It is critical to ensure that the correct dataset is not lost due to accidental overwriting by a corrupted one.
- Implementing unit testing principles on datasets to validate new versions against a set of rules for each field, such as validating field types. This autonomous and automated step facilitates model training (a minimal sketch is given after this list).
Model Training
- Creating an interface that eliminates the need for advanced analysis and shifts the focus from understanding the models' architecture to a qualitative comparison. Standard metrics that can be numerically compared are much easier to understand and to instruct the personnel on than a black-box model's behavior.
- Implementing AI model architectures as modules that can be orchestrated through a common interface. Platforms like AzureML offer parallel training capabilities for multiple models, enabling the selection of the best model based on exhaustive analysis (Microsoft, 2023).
Model Deployment
- Deploying models as web services to align with modern software architectures characterized by interactions between micro-services over the internet. Our experience suggests that deploying models as web services is a seamless operation from the perspectives of infrastructure and integration.
- Utilizing containerization principles to create containerized models using technologies like Docker, facilitating DevOps activities.
Collaborative Development
- Conducting a series of theoretical sessions focused on the applications of AI models and their architectures. These sessions should be followed by practical workshops featuring interactive coding sessions aimed at transforming the traditional black-box approach into a more flexible grey-box approach. The primary objective of this practice is to enable maintenance activities that involve lightweight improvements to existing processes, by providing a better understanding of the internal workings of AI models.
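As a minimal sketch of the dataset unit-testing practice listed above, the following pytest-style tests encode rules for structure, types, value ranges, and missing values; the column names, file path, and thresholds are illustrative assumptions.

```python
# test_dataset.py - pytest-style sketch of "unit tests for datasets"; the
# column names, path, types, ranges and thresholds are illustrative assumptions.
import pandas as pd
import pytest

@pytest.fixture(scope="module")
def dataset():
    return pd.read_csv("data/indicators_latest.csv")   # placeholder path

def test_expected_columns(dataset):
    assert {"date", "store_id", "units_sold"}.issubset(dataset.columns)

def test_field_types(dataset):
    assert pd.api.types.is_integer_dtype(dataset["units_sold"])
    pd.to_datetime(dataset["date"])      # raises if the date column is malformed

def test_value_ranges(dataset):
    assert (dataset["units_sold"] >= 0).all()

def test_missing_values_below_threshold(dataset):
    assert dataset.isna().mean().max() <= 0.10   # at most 10% missing per column
```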
Commencing from the data processing procedures,
through to model training and deployment, we under-
went a learning curve of contemporary technologies
and good practices to determine the optimal combina-
tion of tools and frameworks that could accommodate
the requirements of an industrial application infras-
tructure. Given that this domain is nascent, there is no
standardization with regard to proposals. Therefore,
through our collaboration, we developed bespoke so-
lutions that were tailored to meet our specific needs.
The principal takeaway from our experience was
navigating the learning curve of these technologies,
wherein we recognized a significant potential for in-
tegrating them into our scientific research activities
to augment overall productivity. Platforms such as
AzureML offer an excellent means of developing ex-
perimental pipelines that can execute experiments at
scale, curate and archive datasets, and conveniently
share outcomes across a team of researchers (Mi-
crosoft, 2022).
5 THREATS TO VALIDITY
Internal validity refers to factors that might influence
the obtained results. In our study, these refer to the project processes and to the survey conducted as an investigation method for human aspects in model development. While it may be difficult to quantify the
validation of our defined practices, we believe that
our in-depth research of the existing literature (Sub-
section 2.2) provided a reliable foundation for the
practices that we designed and implemented through-
out the data processing, model training, and model
deployment processes discussed in the previous sec-
tions.
The survey was constructed and then validated by
members of the research team, not part of the tar-
get population. The biggest threat is represented by
the small number of participants, so the findings are
formulated as lessons learned from this experiential
study, with no proposed generalization. As a mitigation strategy, we strove to include all members of the project benefiting from the knowledge transfer, covering different roles and levels of experience.
External validity refers to the generalization of
the findings. The findings are not generalized, but
rather formulated as remarks and lessons learned, as
the study is an experience report and does not repre-
sent a general view of the domain. In our assessment,
the proposed good practices represent a suitable level
of formalization comparable to other experiments that
have been studied in the literature. We believe they
can enable more effective collaboration between AI
engineers and software engineers, ultimately leading
to more efficient and scalable development processes.
6 CONCLUSIONS
This paper presents a study based on a collaborative
development project of an ML-based system. The
project involved a team of researchers with exper-
tise in the field of Artificial Intelligence (AI) and a
software engineering company specialized in creat-
ing large-scale industrial applications. Through this
collaboration, the study examines a series of experi-
ences that were gained during the development pro-
cess. The study aims to highlight the challenges en-
countered and the strategies that were employed to ad-
dress them, to provide insights that can inform future
efforts to develop ML-based systems in industrial set-
tings.
Through our collaborative work, we came to recognize a gap in the existing literature related to best prac-
tices for defining interfaces of communication and
technical designs for creating autonomous pipelines.
These pipelines are capable of automating the experi-
mental processes involved in data engineering, model
training, model deployment, and their maintenance in
a real-world infrastructure. Our study highlights the
challenges encountered due to this void in the litera-
ture and the strategies that were employed to address
them. Our work aims to contribute to the develop-
ment of more robust and efficient processes for the
collaborative development of ML-based systems. We
propose a set of good practices for defining interfaces
of communication and technical designs for creating
autonomous pipelines that can help to streamline the
development process and improve the scalability and
maintainability of ML-based systems.
In order to address the challenges encountered
during the collaborative development of our ML-
based system, we structured our difficulties under a
series of challenges. We then designed solutions to
these challenges based on other state-of-the-art exper-
iments and practices that exist in the literature. Our
aim was to combine the knowledge gained from a sur-
vey of proposed practices with our own experiences,
to design more general and standardized practices
that could be applied successfully in future projects.
The solutions we proposed were based on careful
consideration of the unique needs and constraints of
our project, as well as the broader context of indus-
trial ML development. We believe that our approach
can help to improve the efficiency and effectiveness
of collaborative ML development efforts, while also
contributing to the development of more robust and
scalable ML-based systems.
ACKNOWLEDGEMENTS
This research was partially supported by DataSEER
project, financed through POC 2014-2020, Action
1.2.1, by European Commission and National Gov-
ernment of Romania (Project ID: 121004).
The authors express their gratitude to the indus-
trial partner, OPTIMA GROUP SRL, for their collab-
oration and valuable information exchange.
REFERENCES
Amershi, S., Begel, A., Bird, C., DeLine, R., Gall, H., Ka-
mar, E., Nagappan, N., Nushi, B., and Zimmermann,
T. (2019). Software engineering for machine learning:
A case study. In 2019 IEEE/ACM 41st International
Conference on Software Engineering: Software Engi-
neering in Practice (ICSE-SEIP), pages 291–300.
Ashmore, R., Calinescu, R., and Paterson, C. (2021). As-
suring the machine learning lifecycle: Desiderata,
methods, and challenges. ACM Computing Surveys
(CSUR), 54(5):1–39.
Beck, K. (2023). The Agile Manifesto. Agile Alliance. http://agilemanifesto.org/. Accessed: Apr. 10, 2023.
Breiman, L., Friedman, J. H., Olshen, R. A., and Stone, C. J. (1984). Classification and Regression Trees. Wadsworth, Belmont, CA. ISBN 978-0412048418.
Cerqueira, M., Silva, P., and Fernandes, S. (2022). Sys-
tematic literature review on the machine learning ap-
proach in software engineering. American Academic
Scientific Research Journal for Engineering, Technol-
ogy, and Sciences, 85(1):370–396.
Fair, J. (2012). Agile versus waterfall: which approach is right for my ERP project? In Proceedings of Global Congress 2012—EMEA, Marseilles, France. Newtown Square, PA: Project Management Institute.
Hopfield, J. (1982). Neural networks and physical systems with emergent collective computational abilities. Proc. Natl. Acad. Sci. (USA), 79:2554–2558.
Kane, M. T. (2013). Validating the interpretations and uses
of test scores. Journal of Educational Measurement,
50(1):1–73.
Khomh, F., Adams, B., Cheng, J., Fokaefs, M., and Anto-
niol, G. (2018). Software engineering for machine-
learning applications: The road ahead. IEEE Soft-
ware, 35(5):81–84.
Lorenzoni, G., Alencar, P., Nascimento, N., and Cowan, D.
(2021). Machine learning model development from a
software engineering perspective: A systematic litera-
ture review. arXiv.
Makridakis, S. (2017). The forthcoming artificial intelli-
gence (ai) revolution: Its impact on society and firms.
Futures, 90:46–60.
Merkel, D. (2014). Docker: lightweight linux containers for
consistent development and deployment. Linux jour-
nal, 2014(239):2.
Microsoft (2022). Microsoft customer stories. Microsoft
Azure Blog.
Microsoft (2023). Azure machine learning.
https://azure.microsoft.com/en-us/services/machine-
learning/. Accessed: Apr. 10, 2023.
Serban, A., van der Blom, K., Hoos, H., and Visser, J.
(2020). Adoption and effects of software engineer-
ing best practices in machine learning. In Proceed-
ings of the 14th ACM / IEEE International Symposium
on Empirical Software Engineering and Measurement
(ESEM). ACM.
Swagger (2022). Swagger: The world’s most popular
framework for apis. https://swagger.io. Accessed:
Apr. 10, 2023.
Zhang, C. and Lu, Y. (2021). Study on artificial intelligence:
The state of the art and future prospects. Journal of
Industrial Information Integration, 23:100224.
Zhang, H., Stafman, L., Or, A., and Freedman, M. J. (2017).
Slaq: Quality-driven scheduling for distributed ma-
chine learning. In Proceedings of the 2017 Symposium
on Cloud Computing, SoCC ’17, page 390–404, New
York, NY, USA. Association for Computing Machin-
ery.