Figure 3: An Agile Process.
In Figure 1, at the end of the data science work, the
"product" that results from the workflow is disseminated
or shared. The result of the data science work is typically
embodied in program code, which is why the process
includes the task of "dissemination" (Guo, 2013). This is
what is meant by "deployment" in Figure 2. In Figure 3,
an agile process, "deployment" starts from the
Results → Insight → Learn phase.
For one-off, ad hoc decision making, a single instance
of "deployment" may do, but experience says that vital
solutions sought by businesses require the repeated
execution of the "product". Note the cyclical nature of
some parts of the DS process in all these workflows:
observe the arrows iterating from one process to the
other. Let us give a concrete example from the author's
experience with one of his clients. Assume we have a bus
company with a fleet of buses used for moving passengers.
Assume further that the fleet is composed of a couple of
hundred buses operating 24×7 and that these vehicles are
getting on in age. The data scientist may be tasked with
producing a system that reports to management which
vehicles are worth repairing and which need to be retired
and replaced by brand new ones because it is more
economical to do so. Such a system will not run only
once; it will have to run at least once a month. Each run
involves mathematical and statistical analysis. In this
regard, there is an important reality embedded in those
highlighted processes: SE is lurking at the core of those
steps.
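To make the bus example concrete, the following is a minimal sketch of a report that can be re-run every month. The decision rule (replace a bus when its projected yearly repair cost exceeds the annualized price of a new bus) is an assumption made for illustration only, not the analysis used for the client; the names `Bus`, `recommend` and `monthly_report` and all field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bus:
    """One vehicle in the fleet (all fields are hypothetical)."""
    bus_id: str                # fleet identifier
    annual_repair_cost: float  # projected yearly repair and maintenance cost
    replacement_cost: float    # purchase price of a new bus
    new_bus_life_years: int    # expected service life of the replacement


def recommend(bus: Bus) -> str:
    """Return "replace" when yearly repairs exceed the annualized price of a new bus."""
    if bus.new_bus_life_years <= 0:
        raise ValueError("new_bus_life_years must be positive")
    # Assumed rule: spread the purchase price evenly over the new bus's life.
    annualized_replacement = bus.replacement_cost / bus.new_bus_life_years
    if bus.annual_repair_cost > annualized_replacement:
        return "replace"
    return "repair"


def monthly_report(fleet: list[Bus]) -> dict[str, str]:
    """Map each bus id to a recommendation; meant to be re-run each month."""
    return {bus.bus_id: recommend(bus) for bus in fleet}
```

Because the rule lives in one small function, a more realistic statistical model could later replace `recommend` without touching the reporting code that is executed each month.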
All have iterative code editing. All of the lifecycle
models have an iteration process in which code is edited
and refined (see the steps highlighted in red). Testing
and refinement continue throughout all of the processes.
This refinement work is where knowledge of SE principles
helps. If the code is written with SE best practices in
mind, the DS can trap defects easily, extensively and
efficiently. The code will also be easier to modify.
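As a hedged illustration of how SE practices help trap defects during this iteration, the sketch below pairs a small function that validates its inputs with a test that can be re-run after every edit. The function `cost_per_km` and its test are hypothetical and are not taken from any of the cited workflows.

```python
def cost_per_km(total_cost: float, distance_km: float) -> float:
    """Operating cost per kilometre, with explicit checks on the inputs."""
    if total_cost < 0:
        raise ValueError("total_cost must be non-negative")
    if distance_km <= 0:
        raise ValueError("distance_km must be positive")
    return total_cost / distance_km


def test_cost_per_km() -> None:
    """Small regression test to re-run whenever the code is edited."""
    assert cost_per_km(500.0, 250.0) == 2.0
    # Without the check, a negative distance would silently yield a
    # negative cost; with it, the bad record is reported immediately.
    try:
        cost_per_km(500.0, -10.0)
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for negative distance")
```

Calling `test_cost_per_km()` by hand, or collecting it with a test runner, gives the DS a quick signal that a refinement has not broken existing behaviour.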
A model has a life span and is valid only for a certain
time. At some point the model has to be updated because
things change over time, and the parameters of the model
may have to be checked or revalidated. Thus, the model
may simply need to be updated and the code made current
as well. It may not be necessary to discard the code
altogether; indeed, it may be economical to retain parts
of it, since the client has already paid for it in time
and money. Because it is sometimes prudent to maintain
the code, knowledge of SE will help the DS produce code
geared for maintainability and perhaps even portability.
Such knowledge can only help rather than harm. The fact
that code is deployed does not mean that work on the code
ends there.
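One way to write code with this life span in mind, sketched here under assumptions, is to keep the fitted parameters separate from the code and to record when they were fitted, so that an update replaces the parameters while the code is retained. The class `ModelParams`, the function `needs_revalidation` and the fixed validity window are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ModelParams:
    """Fitted parameters kept apart from the code that uses them."""
    coefficients: tuple[float, ...]  # values estimated from data
    fitted_on: date                  # when the parameters were estimated
    valid_days: int                  # assumed validity window, in days


def needs_revalidation(params: ModelParams, today: date) -> bool:
    """True once the parameters are older than their assumed validity window."""
    age_days = (today - params.fitted_on).days
    return age_days > params.valid_days
```

A scheduled run could call `needs_revalidation` first and flag the model for re-fitting instead of reporting results from stale parameters.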
Some have the idea that once the DS reaches the
deployment stage, he/she passes the result to an SE who
will then integrate it into the production system. In a
large organization that employs an army of SEs and DS
personnel, this might work. In general, however, it is
not an ideal situation for the client. We can surmise
that, whether a client is from a small or a large
organization, the executive using the DS will prefer to
have an integrated final solution from the data scientist.
Some businesses want a one-stop shop of services. By this
we mean that clients often do not wish to deal with
several skill providers. Dealing with a mix of people to
get a solution implemented is too much of an
administrative burden for small or medium business
operators. Such a scheme will only dissuade executives
from embracing the benefits of data science. It will not
work for them because, unlike bigger organizations, they
do NOT enjoy big project funding.
We mentioned in the previous section that the DS is a
consultant. As such, small to medium organizations will
most likely hire a DS for a project with the intention of
obtaining a finished and complete decision system. It
will be impractical, if not cost inefficient, for the
client to obtain a half-finished product which still has
to be turned into industrial-grade code. For this reason,
the DS gives added value to the client if he/she is
knowledgeable about SE principles, i.e., he/she can at
least hand over code engineered to follow sound SE
practices.