A Framework for Adoption of Machine Learning in Industry for

Software Defect Prediction

Rakesh Rana, Miroslaw Staron, Jörgen Hansson, Martin Nilsson and Wilhelm Meding

Computer Science & Engineering, Chalmers, University of Gothenburg, Gothenburg, Sweden

Volvo Car Group, Gothenburg, Sweden

Ericsson, Gothenburg, Sweden

Keywords: Machine Learning, Software Defect Prediction, Technology Acceptance, Adoption, Software Quality

Acronyms Used — ML: Machine Learning, SDP: Software Defect Prediction, TAM: Technology

Acceptance Model.

Abstract: Machine learning algorithms are increasingly being used in a variety of application domains, including software engineering. While their practical value has been outlined, demonstrated and highlighted in a number of existing studies, their adoption in industry is still not widespread. The evaluations of machine learning algorithms in the literature seem to focus on a few attributes, mainly predictive accuracy. On the other hand, the decision space for adoption or acceptance of machine learning algorithms in industry encompasses many more factors. Companies looking to adopt such techniques want to know where such algorithms are most useful and whether the new methods are reliable and cost effective. Further questions, such as how much it would cost to set up, run and maintain systems based on such techniques, are currently not fully investigated in industry or in academia, which makes it difficult to assess the business case for adopting these techniques. In this paper we argue for the need for a framework for the adoption of machine learning in industry. We develop a framework of factors and attributes that contribute to the decision to adopt machine learning techniques in industry for the purpose of software defect prediction. The framework is developed in close collaboration with industry and thus provides useful insight for industry itself, academia and suppliers of tools and services.

1 INTRODUCTION

Testing is an essential activity in software

engineering (Bertolino, 2007), but also one of the most expensive phases within the software development life cycle, with some estimates putting its share at about 50% of time and resources (Harrold,

2000). Software Defect Prediction (SDP) offers one

possible way to make software testing more

effective by making it possible to optimize test

resource allocation, i.e. distributing more effort to

parts (files/modules) that are predicted to be more

prone to defects. The importance of such predictions

is further substantiated by previous research suggesting the applicability of the 80:20 rule to software defects (that is, approximately 20% of software files are responsible for 80% of errors and cost of rework) (Boehm, 1987) (Güneş Koru and Tian, 2003).

Different methods for defect prediction have

been evaluated and used; these can broadly be

classified as traditional (using expert opinions and

regression based approaches) and those based on

machine learning techniques. Methods based on machine learning offer the additional advantage of being able to improve their performance through experience (as more data is made available over

time). Despite the importance of predicting defects in

a software project and demonstrations that SDP using

ML techniques is not too difficult to apply in practice

(Menzies et al., 2003), its adoption and application by practitioners in industry have been limited, which is apparent from the lack of published experience reports. Adoption of any complex method/technology

is dependent on several dimensions (Legris et al.,

2003), but most of the earlier studies in SDP have

focused mainly on the aspect of predictive accuracy.

In this paper we argue that our lack of understanding

of other factors relevant to industrial practitioners is a

major reason for low adoption of ML techniques for

SDP in industry.


Rana R., Staron M., Hansson J., Nilsson M. and Meding W..

A Framework for Adoption of Machine Learning in Industry for Software Defect Prediction.

DOI: 10.5220/0005099303830392

In Proceedings of the 9th International Conference on Software Engineering and Applications (ICSOFT-EA-2014), pages 383-392

ISBN: 978-989-758-036-9

 2014 SCITEPRESS (Science and Technology Publications, Lda.)

Based on the technology acceptance model

(TAM) and technology adoption frameworks we

develop a framework for explaining the adoption of

ML for SDP in industry. TAM intends to explain why users' beliefs and attitudes towards a technology affect their acceptance or rejection of information and communication technology. While TAM is a parsimonious and theoretically justified model for explaining information technology adoption (Van der Heijden, 2003), using it for a specific technology requires identifying the detailed attributes, specific to the given technology and context, that collectively explain the beliefs and attitudes of users towards that technology. The research question we address in this paper is:

“How can we use the technology acceptance and adoption models for developing a framework for ML adoption in industry, and how can it be adapted for software defect prediction?”

2 BACKGROUND AND RELATED

WORK

2.1 Software Defect Prediction Using

Traditional Approaches

Traditional methods used for software defect

prediction and risk assessment can be broadly

categorized under:

 Expert Opinions

 Analogy Based Predictions

 Regression Based Approaches

Statistical approaches based on regression have

also been used for the task of defect prediction. The

dependent (or outcome) variable could be binary

(defective or not defective) as in logistic regression

or the model could be built to predict the number of

expected defects as in case of multiple linear

regression. Logistic regression has been applied in

Khoshgoftaar and Allen (Khoshgoftaar and Allen,

1999) for classifying modules as fault-prone or not.

Zimmermann, Premraj and Zeller (Zimmermann et al., 2007) also applied logistic regression to classify files/packages in the Eclipse project as defect prone (has defects vs. has no defects). Multiple linear regression

is used to model software changes (Khoshgoftaar et

al., 1993) as a function of a set of software

complexity metrics. Linear regression was also used

by Khoshgoftaar et al. (Khoshgoftaar et al., 1992) for

predicting program faults in two subsystems of a

general-purpose operating system, where they also

evaluated different fitting criteria (namely Least

Squares, Least Absolute Value, Relative Least

Squares and Minimum Relative Error).
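The difference between the two regression formulations can be sketched as follows: a logistic model yields a probability that is thresholded into a binary class, while a linear model yields an expected defect count. The metric names, coefficients, intercepts and the 0.5 threshold are hypothetical assumptions chosen for illustration and are not values reported in the cited studies.

```python
import math


def predict_fault_prone(metrics, coefficients, intercept, threshold=0.5):
    """Logistic regression: binary outcome (fault-prone or not)."""
    z = intercept + sum(coefficients[name] * value for name, value in metrics.items())
    probability = 1.0 / (1.0 + math.exp(-z))
    return probability, probability >= threshold


def predict_defect_count(metrics, coefficients, intercept):
    """Multiple linear regression: expected number of defects."""
    estimate = intercept + sum(coefficients[name] * value for name, value in metrics.items())
    # Clipping at zero is a practical assumption added here, since a
    # defect count cannot be negative; it is not part of the regression.
    return max(0.0, estimate)


# Hypothetical module described by two complexity metrics.
module = {"lines_of_code": 400, "cyclomatic_complexity": 12}
logit_coef = {"lines_of_code": 0.002, "cyclomatic_complexity": 0.15}
linear_coef = {"lines_of_code": 0.005, "cyclomatic_complexity": 0.2}

probability, fault_prone = predict_fault_prone(module, logit_coef, intercept=-3.0)
expected_defects = predict_defect_count(module, linear_coef, intercept=0.1)
```

In practice the coefficients would be fitted to historical project data rather than set by hand as they are here.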

2.2 Software Defect Prediction Using

ML Techniques

The broad types of Machine Learning (ML) techniques used for software defect prediction are:

 Decision Trees (DTs)

 Support Vector Machines (SVMs)

 Artificial Neural Networks (ANNs)

 Bayesian Belief Networks (BNNs)

Machine learning algorithms can be used to model software defect prediction as a classification problem, as in the case of DTs and SVMs, where the class variable can take two values (defective or not defective). Alternatively, the problem can be modelled as predicting the expected number of defects in a software module/system using different code and change metrics. ML techniques for pattern recognition, e.g. ANNs and BNNs, can be used to accomplish such tasks.

A number of classification models, including DTs and SVMs, have been evaluated and compared in (Lessmann et al., 2008). Iker Gondra (Gondra, 2008) applied machine learning algorithms to predict fault proneness, compared ANNs with SVMs, and found that when fault proneness is modelled as a classification task, SVMs perform better than ANNs.

Table 1 provides an overview of some of the

important ML techniques that can be applied for SDP

and lists their main advantages and limitations. For details on ML techniques applicable in the software engineering domain, readers are referred to the work by Zhang and Tsai (Zhang and Tsai, 2003).

2.3 Technology Adoption Framework

According to Attewell (Attewell, 1992), adoption of complex technology is not an event but resembles knowledge acquisition over time. This perspective is applicable where a new innovation/technique is (Attewell, 1992):
• Abstract and has a demanding scientific base,
• Fragile in the sense of consistency, i.e. does not always perform as expected,
• Difficult to try in a meaningful way, and
• Unpackaged, i.e. adopters cannot pick a tool off the shelf and use it as a black box model, but instead need to acquire broad tacit knowledge and procedural know-how.

ICSOFT-EA2014-9thInternationalConferenceonSoftwareEngineeringandApplications

384

Table 1: Overview of ML techniques used for software defect prediction.

| Algorithm Type | Domain Knowledge | Training Data | Advantages | Disadvantages |
|---|---|---|---|---|
| DTs | Not required | Adequate data needed to avoid over-fitting. | Robust to noisy data; missing values tolerated; capable of learning disjunctive expressions. | Prone to over-fitting. |
| SVMs | Not required | Adequate data needed for training. | Effective for high dimensional spaces; memory efficient; versatile, as it can take different kernel functions as decision function. | Likely to give low performance if the number of features is much higher than the number of samples. |
| ANNs | Not required | Adequate data needed for training. | Able to learn non-linear and complex functions; robust to errors in training data. | Slow training and convergence process; prone to over-fitting; results difficult to interpret. |
| BNNs | Not required | Required to estimate the prior probabilities. | Able to give probabilistic predictions; useful for knowledge discovery; can be used very early in the development lifecycle. | Requires estimation of prior probabilities, the number of which can be very large for big models; computationally expensive; requires domain expertise for building the network. |

The characteristics of ML-based techniques fit most of the points above, and they can thus be classed as complex technology/techniques. Further, according to the Theory of Reasoned Action (TRA) (Ajzen and Fishbein, 1980), the intention to adopt a behaviour or technology is based on beliefs about the consequences of adoption. The theory has been used to build the Technology Acceptance Model (TAM) by Davis (Davis Jr, 1986); an overview of the model is presented in Figure 1. TAM postulates that a user's adoption intention and actual usage of information technology are determined by two critical factors: perceived usefulness and perceived ease of use. Perceived usefulness is defined as the degree to which a user believes that using a particular system would enhance his/her job performance, while perceived ease of use is the degree to which the user believes that using the system would be free of effort (Van der Heijden, 2003).

Figure 1: Overview of Original Technology Acceptance

Model (Legris et al., 2003).

In this study we focus on technology adoption decisions; the model we use for our framework is therefore based on the revised version of the original TAM (Pijpers et al., 2001). The revised model postulates that potential users of a technology actively evaluate the usefulness and ease of use of the given technology in their decision-making process (Yang, 2005). Our position in this paper is similar: we contend that applying a technology adoption framework to the use of ML techniques in SDP is needed to better understand the needs of industry, which will help accelerate the technology transfer and adoption of these techniques.

The technology adoption framework by Tornatzky et al. (Tornatzky et al., 1990) also provides a model of adoption that has been applied widely. According to the framework, three elements influence the innovation adoption process:

1. The external environmental context,

2. The technological context, and

3. The organizational context.

Chau and Tam (Chau and Tam, 1997) used the

framework to model the factors affecting adoption of

open systems in the Information Science (IS). We

adapt their framework in conjunction with the

Technology Acceptance Model (TAM) to model the

factors affecting adoption of ML in industry.

AFrameworkforAdoptionofMachineLearninginIndustryforSoftwareDefectPrediction

385

3 STUDY DESIGN

The research process for the development and quantitative validation of the adoption framework for ML techniques in industry is shown in Figure 2. The focus of this paper is Stage-1, where the aim has been to develop the general adoption framework for machine learning techniques and to demonstrate how the model can be adapted to the specific case of software defect prediction (SDP).

Figure 2: Research process overview.

Literature Review: To capture the factors that

affect the adoption of ML techniques in industry we

searched for likely factors mentioned in software

engineering, machine learning and technology

adoption literature. A list of factors deemed

potentially relevant for industry was compiled which

was used for discussions with the industrial

practitioners. The application area we concentrated

on is defect prediction in software system/projects.

Interviews: Semi-structured interviews were

conducted with industrial practitioners to first

evaluate which factors are relevant for ML adoption

in industry. In the next round the same interviewees

helped adapt this general model for the case of

software defect prediction.

In total, four managers from two large companies with a significant focus on software development were interviewed in two consecutive rounds. The

companies included in the study are:

 Volvo Car Group (VCG): A company from the

automotive domain, and

 Ericsson: A company from the telecom domain

The divisions we interacted with have one thing in common: they have not yet adopted machine learning as their main method/technique for predicting software defects, but they are evaluating it as a possible technique to complement the current software defect measurement/prediction systems in place. The interviewees included:

• A manager at Volvo Car Group within the department responsible for integrating software sourced from different teams and suppliers. The manager has more than 20 years of experience working with software development and testing. Ensuring the safety and quality of the software developed is a major responsibility in this job role.

• A team leader at Volvo Car Group responsible for collection, analysis and reporting of project status with regard to software defects and their predictions. The team leader has more than three decades of experience in various roles at the company.

• A senior quality manager at Ericsson whose

experience with software (mainly within quality

assurance) spans more than three decades, and

• The team leader of the metrics team at Ericsson; the metrics team is a unit at Ericsson that provides the measurement systems for various purposes, including software defect measurement, monitoring and prediction systems within the organization.

The main focus of the first round of interviews was to identify the factors relevant to technology adoption/acceptance decisions (to build a general framework of ML adoption in industry), while the second round of interviews focused on identifying the relevant attributes for each factor in the specific context of software defect prediction.

4 FRAMEWORK FOR

ADOPTION OF ML

TECHNIQUES IN INDUSTRY

It is important to note that for any organization at any

given point in time, the trade-off analysis is not

between adopting or not adopting a new

technology/process (as in case of ML techniques);

the trade-off is between adopting it now or deferring

that decision until a later date. This distinction is important because the factors that affect adoption relate not only to the direct advantages and limitations of the given technology/process, but also to the organizational and environmental conditions at a given point in time. In this context, nine important factors that affect the adoption of ML techniques were identified; these can be grouped into three categories according to the framework by Tornatzky et al. (Tornatzky et al., 1990). The framework for adoption of ML in industry is presented in Fig 3.

In Fig 3, the (+) and (-) signs denote a possible positive/negative relationship of medium strength between a given factor and the probability of

ICSOFT-EA2014-9thInternationalConferenceonSoftwareEngineeringandApplications

386

adoption of ML. A double sign (++/--) indicates a strong relationship; the strength of a relationship can be tested by setting a stricter significance level during quantitative evaluation (e.g. an alpha value of 0.1 for +/- and 0.05 for ++/--). Accordingly, hypotheses can be formulated for each factor and tested quantitatively using a survey. We provide two examples of such hypotheses:

H1: Higher levels of perceived benefits of

adopting ML techniques will strongly (and positively)

affect the likelihood of their adoption.

H2: Higher levels of perceived barriers of

adopting ML techniques will strongly (and

negatively) affect the likelihood of their adoption.
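To make the notation concrete, the sketch below maps an estimated effect and its p-value to the symbols used in Fig 3, using the example alpha values of 0.1 and 0.05 given above. The function name, the "0" label for an unsupported relationship and the assumption that a coefficient and p-value are already available from some statistical test on survey data are all hypothetical.

```python
def relationship_sign(coefficient, p_value, alpha_medium=0.1, alpha_strong=0.05):
    """Map an estimated effect and its p-value to the notation of Fig 3.

    coefficient: estimated direction and size of the effect of a factor on
    adoption (assumed to come from a test chosen by the analyst).
    p_value: significance of that estimate.
    """
    if p_value >= alpha_medium or coefficient == 0:
        # Relationship not supported at either level; "0" is a label
        # invented for this sketch.
        return "0"
    symbol = "+" if coefficient > 0 else "-"
    # The stricter level upgrades a single sign to a double sign.
    return symbol * 2 if p_value < alpha_strong else symbol


# Hypothetical survey results for three factors (illustrative values only).
results = {
    "perceived benefits": relationship_sign(0.42, 0.01),
    "perceived barriers": relationship_sign(-0.35, 0.03),
    "adoption in other industries": relationship_sign(0.15, 0.08),
}
```

With these invented numbers, the first two factors would be marked as strong relationships and the third as a relationship of medium strength.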

Figure 3: A Model for ML adoption in Industry.

5 ADAPTATION OF ML

ADOPTION FRAMEWORK

FOR SDP

We adapt the general framework for ML adoption in

industry (Fig 3) to the specific problem of software

defect prediction.

5.1 Characteristics of Machine

Learning

Adoption of any new technology or process change is heavily dependent upon the characteristics of the technology/innovation. Factors affecting the cost-benefit trade-off of adoption are among the most critical in adoption decisions. The relevant attributes that affect the acceptance of ML for software defect prediction are presented in Figure 4.

Perceived Benefits: one of the most critical factors in adopting ML techniques in industry is the perceived benefits of these techniques in a given organization's specific context. The keywords here are perceived and context. While the actual benefits an organization can achieve by adopting a new innovation/technology are important in the long run, what affects an organization's decision to adopt a specific new technology/innovation at a given point in time is its perception of those benefits.

When it comes to SDP, the perceived benefits of using ML approaches, as expressed in previous studies evaluating ML techniques for SDP and by the interviewees of this study, are the ability of ML-based algorithms to:
• Provide higher prediction accuracy (high probability of detection and low probability of false alarm) (Gondra, 2008).
• Be highly automated, i.e. most aspects of the system, from data collection to visualization of results, can be handled by smart algorithms that mine and analyze data autonomously from multiple local databases (Zhang and Zaki, 2006) with minimal human intervention.
• Handle large data; in fact, ML methods are expected to improve their performance as more data is made available over time (Zhang and Tsai, 2003).
• Identify new patterns in data, thus providing new insights from the data itself. This offers the possibility of using large historical data to discover regularities and use them to improve future decisions (Mitchell, 1999). New insights can be generated from large data by employing specific ML techniques such as causal modeling, for example by using Bayesian Networks to model causal networks and deduce probabilistic relationships.
• Require little maintenance, given their self-adaptive nature.
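The probability of detection and probability of false alarm mentioned in the first benefit above can be computed from a list of actual and predicted labels. The sketch below assumes the usual definitions (detected defective modules over all defective modules, and falsely flagged modules over all non-defective modules); the example labels are invented for illustration.

```python
def detection_rates(actual, predicted):
    """Return (probability of detection, probability of false alarm).

    actual and predicted are equally long sequences of booleans where
    True means 'defective'.
    """
    pairs = list(zip(actual, predicted))
    true_pos = sum(1 for a, p in pairs if a and p)
    false_neg = sum(1 for a, p in pairs if a and not p)
    false_pos = sum(1 for a, p in pairs if not a and p)
    true_neg = sum(1 for a, p in pairs if not a and not p)
    defective = true_pos + false_neg
    clean = false_pos + true_neg
    prob_detection = true_pos / defective if defective else 0.0
    prob_false_alarm = false_pos / clean if clean else 0.0
    return prob_detection, prob_false_alarm


# Hypothetical outcome for five modules (illustrative values only).
actual = [True, True, False, False, False]
predicted = [True, False, True, False, False]
prob_detection, prob_false_alarm = detection_rates(actual, predicted)
```

A model is attractive on this attribute when the first value is high and the second is low at the same time.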

Perceived Barriers: On the other hand perceived

barriers negatively affect the adoption/acceptance of

ML techniques. For software defect predictions,

some of the common perceived barriers are:

AFrameworkforAdoptionofMachineLearninginIndustryforSoftwareDefectPrediction

387

 Steep learning curve – According to Edmondson

et al. (Edmondson et al., 2003), users of new

innovation/technology need to understand it well

before they can put it into productive use. Their

study also suggests that when tacit knowledge is needed, new technologies may fail in the market even when their advantages have been proven. For example, in the case of SDP, when using classification or pattern recognition, selecting the set of attributes (inputs) that gives optimal results depends heavily on domain experience and experience of using ML-based techniques, which is difficult to document/codify explicitly for new users.

 Lack of trust – stakeholders in software projects

who are used to traditional approaches of

predicting defects (such as expert opinions) do

not generally trust the algorithms to outperform

expert based predictions.

 For software projects, in general and in

particular for safety and business critical

software products, the penalty for mis-prediction

is an important barrier. The severity of mis-

prediction is correlated to importance of

information need and the actions it can trigger. For example, a prediction model that flags 20% of software modules as defect prone (compared to an actual 10%) may lead to an unnecessary review of 10% of the modules, resulting in resource allocation that is not optimal.

As traditional methods have been used for a comparatively longer time, their levels of (un)certainty are known, which is not the case with ML techniques. To overcome this barrier we recommend that, in the initial phase of adoption, machine learning techniques be used alongside the traditional methods to validate their usefulness and predictive accuracy in practice. This provides the comparisons industrial practitioners want to see, so that trust in the new techniques can build up over such trial periods.

 Given that most practical aspects can be affected

by a wide range of factors, techniques based on ML approaches usually do not take all of these into account. Human factors such as differences in productivity, people getting sick or the motivation level of employees are hard to measure and account for in algorithmic models for SDP, and are thus a source of error in such techniques.

 Uncertainty regarding generalizability of ML

over projects. The perception is that while ML techniques (used for classification and pattern recognition) work well in recognizing existing patterns in the data, their performance degrades for patterns that have not been seen before.
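The cost of mis-prediction described in the list above (20% of modules flagged against an actual 10%) can be turned into a rough effort estimate. The sketch assumes the best case for the model, namely that every truly defective module is among the flagged ones; the project size and the review hours per module are hypothetical values.

```python
def unnecessary_reviews(total_modules, flagged_share, defective_share, hours_per_review):
    """Estimate review effort spent on modules flagged without being defective.

    Assumes every truly defective module is among the flagged ones, so the
    surplus of flagged modules consists only of false alarms.
    """
    surplus_share = max(0.0, flagged_share - defective_share)
    surplus_modules = round(total_modules * surplus_share)
    return surplus_modules, surplus_modules * hours_per_review


# Hypothetical project: 500 modules and 4 review hours per module.
modules, hours = unnecessary_reviews(
    total_modules=500, flagged_share=0.20, defective_share=0.10, hours_per_review=4
)
```

If the model also misses defective modules, the wasted effort is larger than this estimate, which is one reason the penalty for mis-prediction weighs heavily for safety and business critical software.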

Figure 4: Overview of attributes relevant to ML characteristics that affect its acceptance for SDP.

Availability of Tools and Support is expected to increase the acceptance of ML in industry (Sonnenburg et al., 2007). Some of the attributes related to this factor are whether the available tools are open source or proprietary, how much support is available and how much they cost. Others include whether a given tool is compatible with existing measurement systems and with in-house competences with respect to its usage. Consulting services can also help companies get started with new approaches that they do not have enough experience with, thus helping the acceptance of new techniques and tools in industry.

A number of packages implementing ML algorithms are available, e.g. Netlab, Spider and BNT for Matlab; Nodelib and Torch for C++; and CREST for Python. Commercial tools (e.g. Ayasdi, NeuroSolutions etc.) and open source tools (e.g. Weka, KNIME etc.) with a GUI are also available. While the availability of such tools is likely to increase

ICSOFT-EA2014-9thInternationalConferenceonSoftwareEngineeringandApplications

388

the adoption of ML in industry, other attributes such as support and consulting services are also important in determining the level and speed with which ML is adopted in industry.

One possible way of enhancing adoption through tool and support availability is to make available problem-specific customized solutions for highly relevant industrial problems such as SDP. Another activity that can potentially accelerate the adoption process is the integration of ML-based algorithms into existing software packages widely used within industry, e.g. the Microsoft Neural Network algorithm available for SQL Server 2012.

5.2 Organizational Characteristics

Need and Importance: The higher the need and

importance of given information is in an

organization, the higher is the likelihood for adopting

new techniques to satisfy this information need.

To improve the accuracy and reliability of such information, new approaches that offer higher accuracy and reliability are more likely to be adopted. Zhang and Tsai (Zhang and Tsai, 2003) provide a good overview of applications of ML in the software engineering domain, which outlines different information needs within this domain. Examples of information needs specific to software defect prediction are:
• Predicting software quality (identification of high-risk, or fault-prone components)
• Predicting software reliability
• Predicting expected number of defects
• Predicting maintenance task effort
• Predicting software release timings

Factors such as how satisfied a company is with

its existing defect prediction systems, their

familiarity with machine learning techniques and in-

house competences are also important for explaining

acceptance and adoption of ML for SDP within a

company. A model of attributes that contribute to

these factors is presented in Figure 5.

Satisfaction with Existing Systems: the motivation for change (adoption of new approaches) is strongly connected to a given organization's satisfaction with its current measurement/analysis systems. If a company is well satisfied with the accuracy and efficiency of existing methods, it is unlikely to invest significant cost, resources and learning effort in new approaches. In the case of software defect prediction, attributes relevant to satisfaction with existing systems are:
• Whether the existing system satisfies the information needs of the stakeholders involved in the project.
• Whether the existing system allows stakeholders to effectively and efficiently visualize trends over time and compare current projects with data from similar historical projects.

Figure 5: Overview of attributes relevant to organizational characteristics that affect the acceptance of ML for SDP.

Reliability and cost also play an important role in determining the level of satisfaction with existing defect management and prediction systems within software development organizations.

Familiarity and Competence with ML Techniques: organizations familiar with machine learning approaches through their workforce or collaborations with academia will have a better understanding of the advantages and limitations of such approaches. These organizations will also be better informed about the practical applicability of these techniques and thus in a position to identify and assess areas where the benefits of using ML techniques outweigh the barriers. Organizations that are familiar with such methods are therefore highly likely to adopt them.

Attewell (Attewell, 1992) proposes that “firms delay in-house adoption of complex technology until they obtain sufficient technical know-how to implement and operate it successfully”.

AFrameworkforAdoptionofMachineLearninginIndustryforSoftwareDefectPrediction

389

Almost all mature organizations engaged in developing software collect, store and analyze their product- and process-related data. Given that such data is available in large quantities (within the organizations), an organization with good competences/skills in machine learning is more likely to try ML techniques on its data and eventually adopt them on a larger scale.

The main challenge in this context is the unavailability of structured data. Much of the data generated within an organization is in the form of unstructured text (e.g. software requirements, defect reports, customer feedback written in textual form). On the other hand, most ML algorithms require inputs in numeric or categorical form, which presents a challenge in using such data in practice. Developments in the field of Natural Language Processing (NLP) are already addressing these challenges, and advances in such areas are likely to increase the adoption of ML-based techniques for SDP.
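One of the simplest ways of turning unstructured text into numeric input is a bag-of-words representation, sketched below. This is only one basic NLP technique, chosen here for illustration and not discussed in the interviews; the defect report summaries are invented.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase a text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def bag_of_words(reports):
    """Turn free-text reports into fixed-length numeric count vectors."""
    counts = [Counter(tokenize(report)) for report in reports]
    # The vocabulary is the sorted set of all words seen in any report.
    vocabulary = sorted(set().union(*counts))
    vectors = [[count[word] for word in vocabulary] for count in counts]
    return vocabulary, vectors


# Hypothetical defect report summaries (invented for illustration).
reports = [
    "Crash when saving file",
    "Timeout when saving large file",
]
vocabulary, vectors = bag_of_words(reports)
```

Each report becomes a vector of word counts over a shared vocabulary, which is a form that most ML algorithms can take as input.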

5.3 External Environment

Adoption of ML techniques in other industries signals their applicability in practice. Although this is not expected to be a strong factor driving adoption in a given industry, it is likely to positively affect the probability of adoption.

A similar but stronger factor for adoption of new

technology/approaches such as ML in a given

company is likely to be the information whether or

not any of the competing companies are using such

techniques. The motivation behind this factor is

simple - every organization in a given domain

intends to be at the forefront of technology or process

knowledge. The adoption of a particular

technique/process by a competitor is a strong signal

that given technique could have potential benefits;

this can potentially motivate the need for evaluation

of such methods within the given organization.

6 HOW TO USE THE

FRAMEWORK

Over the years companies have begun capturing huge

volumes of data about their products, consumers and

operations (Mitchell, 1999). ML offers new tools that

can use this data to recognize patterns and provide

useful insights hidden within these huge volumes of

data.

6.1 Setting the Research Direction

The research in software defect predictions has been

mainly focused on evaluating and highlighting the

predictive accuracy of ML techniques and in some

cases comparing it to traditional methods. On the

other hand the adoption framework indicates that not

only predictive accuracy, but attributes such as cost,

reliability and generalizability are also important for

adoption decisions.

Therefore a technology adoption framework, such as the one proposed here, can be useful for guiding future research directions by helping to identify which factors are relevant for industrial adoption but currently unaddressed in terms of their scientific evaluation.

6.2 Evaluating Specific ML Techniques

by a given Company

Technology acceptance/adoption frameworks enhance our understanding of which factors affect end users' decisions to adopt a given technology/innovation. Although these factors play a role to varying degrees when companies evaluate whether to adopt or delay the adoption of such techniques, the lack of a framework can lead to sub-optimal decisions. Without a guiding framework there is a high probability that the effect of some detailed attributes on overall usefulness is missed. The problem is more severe when comparisons are made between two or more techniques or tools, where the evaluation is likely to focus on only a small set of attributes, which does not provide the full picture.

In such cases, the adoption framework can be used as a guide so that all important factors and associated attributes are covered when considering the adoption of new techniques or tools. It can also serve as a checklist for assessing and comparing two or more techniques/tools using a Likert-type scale. To provide an example, Table 2 shows a checklist for comparing an ML-based technique against an existing system for SDP, and Table 3 shows the potential use of a similar checklist for comparing two competing tools. Industrial practitioners can use such checklists to make informed decisions with regard to the adoption of these techniques and to compare tools effectively.
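The checklist idea can be sketched in code as follows. This is a minimal illustration only: the 1 to 5 Likert encoding, the attribute weights, the ratings, and the names `Attribute`, `weighted_score`, `CHECKLIST`, `EXISTING_METHOD` and `NEW_ML_TECHNIQUE` are all assumptions made for the example and are not part of the framework itself. Only the attribute names are borrowed from Table 2.

```python
from dataclasses import dataclass

# Assumed Likert-type encoding: 1 = very poor ... 5 = very good.
LIKERT_MIN, LIKERT_MAX = 1, 5


@dataclass(frozen=True)
class Attribute:
    name: str
    weight: float  # relative importance to the company (hypothetical)


def weighted_score(attributes, ratings):
    """Return the weight-normalised mean Likert rating of one option."""
    total_weight = sum(a.weight for a in attributes)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    score = 0.0
    for a in attributes:
        rating = ratings[a.name]
        if not LIKERT_MIN <= rating <= LIKERT_MAX:
            raise ValueError(f"rating for {a.name!r} is outside the scale")
        score += a.weight * rating
    return score / total_weight


# Attribute names follow Table 2; weights and ratings are illustrative only.
CHECKLIST = [
    Attribute("Predictive accuracy", 3.0),
    Attribute("Can handle multiple projects", 2.0),
    Attribute("Running time", 1.0),
    Attribute("Cost of license", 2.0),
    Attribute("Maintenance cost", 2.0),
]

EXISTING_METHOD = {
    "Predictive accuracy": 3,
    "Can handle multiple projects": 1,
    "Running time": 4,
    "Cost of license": 5,
    "Maintenance cost": 4,
}

NEW_ML_TECHNIQUE = {
    "Predictive accuracy": 5,
    "Can handle multiple projects": 5,
    "Running time": 3,
    "Cost of license": 2,
    "Maintenance cost": 2,
}

for label, ratings in [("Existing method", EXISTING_METHOD),
                       ("New ML-based technique", NEW_ML_TECHNIQUE)]:
    print(f"{label}: {weighted_score(CHECKLIST, ratings):.2f}")
```

A single weighted score hides trade-offs between attributes, so in practice it would probably be read alongside the full checklist rather than instead of it.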

The technology adoption framework also helps companies to reflect upon their strengths with respect to a given technology and upon areas of potential improvement. Such analysis is useful to identify

ICSOFT-EA2014-9thInternationalConferenceonSoftwareEngineeringandApplications

390

areas where training and competence build-up would be advantageous. For example, in SDP, if a company identifies that in-house competence for implementing and maintaining an ML-based system would benefit a specific business unit within the organization, the necessary training and/or recruitment targeting those specific skills could be quickly arranged, thus improving the long-term competitiveness of the company.

Table 2: Example of how a comparative checklist can be used to evaluate a new technique for SDP.

| Attribute | Existing method | New ML-based technique |
|---|---|---|
| Predictive accuracy | Good | Very good |
| Auto data acquisition | Yes | Yes |
| Report generation | Yes, Word document | Yes, web based |
| Can handle multiple projects | No | Yes |
| Generate causal maps | No | Yes |
| Running time (typical project) | 15 min | 30 min |
| Cost of license (tool) | None | $20000/license |
| Maintenance cost (estimate) | $2000 per year | $7000 per year |
| … | | |

6.3 Improving the Tools and Services by Vendors

The technology adoption framework is also useful for tool vendors, who can use the information in multiple ways, to:

• Prioritize feature introduction, and

• Market their tools and services effectively

Tools based on emerging technologies/techniques usually provide new functionality not available in old, well-established tools, but at the same time they are not mature and need to evolve constantly to engage and acquire new customers. A clear understanding of which attributes are key for adoption decisions helps these tool vendors to prioritize the features they implement and deliver to their customers. For example, a vendor whose Tool X for SDP does not, at a given time, outperform existing tools on predictive accuracy, and who finds that running and maintenance costs are important attributes in adoption decisions, may use this information to strategically decide to develop a light version of the tool which demands low running and maintenance costs.

Table 3: Example of how the adoption framework can be used to compare two new tools/services.

| Attribute | Tool A | Tool B |
|---|---|---|
| Predictive accuracy | 85% | 82% |
| Auto data acquisition | Yes | Yes |
| Report generation | Yes, web based | Yes, multiple formats |
| Can handle multiple projects | Yes | Yes |
| Generate causal maps | Yes, non-interactive | Yes, interactive |
| Running time (typical project) | 30 min | 40 min |
| Cost of license (tool) | $20000/license | $35000/license |
| Maintenance cost (estimate) | $7000 per year | $9000 per year |
| … | | |
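The cost rows of Table 3 can be combined into a rough total cost of ownership, as sketched below. The three-year horizon, the single license, the assumption that maintenance cost does not depend on the number of licenses, and the names `total_cost`, `TOOLS` and `HORIZON_YEARS` are all hypothetical and chosen only for illustration. The license and maintenance figures are the example values from Table 3.

```python
def total_cost(license_cost, maintenance_per_year, years, licenses=1):
    """Rough cost of ownership: license fees plus yearly maintenance.

    Assumes maintenance is charged once per year regardless of the
    number of licenses (an illustrative simplification).
    """
    if years < 0 or licenses < 1:
        raise ValueError("years must be >= 0 and licenses >= 1")
    return licenses * license_cost + years * maintenance_per_year


# (license cost, maintenance cost per year) in USD, taken from Table 3.
TOOLS = {
    "Tool A": (20000, 7000),
    "Tool B": (35000, 9000),
}

HORIZON_YEARS = 3  # assumed evaluation horizon, not given in the paper

costs = {
    name: total_cost(license_cost, maintenance, HORIZON_YEARS)
    for name, (license_cost, maintenance) in TOOLS.items()
}

for name, cost in costs.items():
    print(f"{name}: ${cost} over {HORIZON_YEARS} years")
# Tool A: $41000 over 3 years
# Tool B: $62000 over 3 years
```

Under these assumptions Tool A is cheaper over the horizon, but cost is only one group of attributes in the checklist and would be weighed against the others, such as report formats and interactive causal maps.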

Understanding which attributes play a key role in adoption decisions also helps tool and service vendors to make their marketing more effective. Vendors may choose to highlight how they provide value to their customers on the key attributes that industry looks for when considering the adoption of a new technology-based product or service. This, in turn, can accelerate the adoption and acceptance of new techniques within the industry.

7 CONCLUSIONS AND FUTURE WORK

A large and constantly growing amount of data is now available within organizations and can be used to gain useful insights for improving processes, products and services. Machine learning techniques have high potential to aid companies in this purpose. Despite demonstrations of the usefulness of such techniques in academia and the availability of tools, the adoption of these techniques in industry is currently far from optimal. Our position in this paper has been that, to accelerate the adoption of ML-based techniques in industry, we need to enhance our understanding of the information needs of industry in this respect. Technology acceptance models offer a cost-effective approach to meet this purpose.

In this paper we developed a framework for the adoption of ML techniques in industry. The framework builds on previous research on technology adoption and technology acceptance models. We also adapted the framework to the specific problem of software defect prediction and highlighted that while adoption decisions are

AFrameworkforAdoptionofMachineLearninginIndustryforSoftwareDefectPrediction

391

multi-dimensional, current research studies have mainly focused on a few of these attributes. We contend that improving our understanding of the factors and attributes relevant for industrial practitioners will help companies, researchers and tool vendors to meet their specific information needs.

In future work we plan to quantitatively evaluate the effect size of important attributes on ML adoption decisions using a large-scale survey of companies that have already adopted ML techniques and of ones that are yet to embrace them. Research on which factors are important for industry, together with evaluative studies of ML-based techniques/tools on these factors, can complement the existing and on-going work on establishing the characteristics of ML techniques and thus contribute toward their adoption in industry and society.

ACKNOWLEDGEMENTS

The research presented here was carried out under the VISEE project, which is funded jointly by Vinnova and Volvo Cars under the FFI programme (VISEE, Project No: DIARIENR: 2011-04438).

REFERENCES

Ajzen, I. & Fishbein, M., 1980. Understanding attitudes

and predicting social behaviour. Available at:

http://www.citeulike.org/group/38/article/235626.

Attewell, P., 1992. Technology diffusion and

organizational learning: The case of business

computing. Organization Science, 3(1), pp.1–19.

Bertolino, A., 2007. Software testing research:

Achievements, challenges, dreams. In 2007 Future of

Software Engineering. IEEE Computer Society, pp.

85–103.

Boehm, B., 1987. Industrial software metrics top 10 list,

IEEE Computer Soc 10662 Los Vaqueros Circle, Po

Box 3014, Los Alamitos, Ca 90720-1264.

Chau, P.Y. & Tam, K.Y., 1997. Factors Affecting the Adoption of Open Systems: An Exploratory Study. MIS Quarterly, 21(1).

Davis Jr, F.D., 1986. A technology acceptance model for

empirically testing new end-user information systems:

Theory and results. Massachusetts Institute of

Technology.

Edmondson, A.C. et al., 2003. Learning how and learning

what: Effects of tacit and codified knowledge on

performance improvement following technology

adoption. Decision Sciences, 34(2), pp.197–224.

Gondra, I., 2008. Applying machine learning to software

fault-proneness prediction. Journal of Systems and

Software, 81(2), pp.186–195.

Güneş Koru, A. & Tian, J., 2003. An empirical

comparison and characterization of high defect and

high complexity modules. Journal of Systems and

Software, 67(3), pp.153–163.

Harrold, M.J., 2000. Testing: a roadmap. In Proceedings

of the Conference on the Future of Software

Engineering. ACM, pp. 61–72.

Van der Heijden, H., 2003. Factors influencing the usage

of websites: the case of a generic portal in The

Netherlands. Information & Management, 40(6),

pp.541–549.

Khoshgoftaar, T.M. et al., 1992. Predictive modeling

techniques of software quality from software

measures. Software Engineering, IEEE Transactions

on, 18(11), pp.979–987.

Khoshgoftaar, T.M. & Allen, E.B., 1999. Logistic

regression modeling of software quality. International

Journal of Reliability, Quality and Safety Engineering,

6(04), pp.303–317.

Khoshgoftaar, T.M., Munson, J.C. & Lanning, D.L., 1993.

A comparative study of predictive models for program

changes during system testing and maintenance. In

Software Maintenance, 1993. CSM-93, Proceedings.,

Conference on. IEEE, pp. 72–79.

Legris, P., Ingham, J. & Collerette, P., 2003. Why do people use information technology? A critical review of the technology acceptance model. Information & Management, 40(3), pp.191–204.

Lessmann, S. et al., 2008. Benchmarking Classification

Models for Software Defect Prediction: A Proposed

Framework and Novel Findings. IEEE Transactions

on Software Engineering, 34(4), pp.485–496.

Menzies, T. et al., 2003. How simple is software defect detection. Submitted to the Empirical Software Engineering Journal.

Mitchell, T.M., 1999. Machine learning and data mining.

Communications of the ACM, 42(11), pp.30–36.

Pijpers, G.G. et al., 2001. Senior executives’ use of

information technology. Information and Software

Technology, 43(15), pp.959–971.

Sonnenburg, S. et al., 2007. The need for open source

software in machine learning. Journal of Machine

Learning Research, 8(10).

Tornatzky, L.G., Fleischer, M. & Chakrabarti, A.K., 1990.

Processes of technological innovation.

Yang, K.C., 2005. Exploring factors affecting the adoption of mobile commerce in Singapore. Telematics and Informatics, 22(3), pp.257–277.

Zhang, D. & Tsai, J.J., 2003. Machine learning and

software engineering. Software Quality Journal, 11(2),

pp.87–119.

Zhang, S. & Zaki, M.J., 2006. Mining multiple data

sources: local pattern analysis. Data Mining and

Knowledge Discovery, 12(2-3), pp.121–125.

Zimmermann, T., Premraj, R. & Zeller, A., 2007.

Predicting defects for eclipse. In Predictor Models in

Software Engineering, 2007. PROMISE’07: ICSE

Workshops 2007. International Workshop on. IEEE,

pp. 9–9.

ICSOFT-EA2014-9thInternationalConferenceonSoftwareEngineeringandApplications

392