Automated Measurement of Technical Debt: A Systematic Literature Review

Ilya Khomyakov, Zufar Makhmutov, Ruzilya Mirgalimova and Alberto Sillitti

Innopolis University, Russian Federation

Keywords:

Technical Debt, Measurement, Literature Review.

Abstract:

Background: Technical Debt (TD) is a complex concept that includes several aspects of software development. People often describe TD as the amount of postponed work, but this is only a basic approximation of a concept that has many technical and managerial aspects. If TD is managed properly, it can provide a huge advantage; if not, it can make projects unmaintainable. Therefore, being able to measure TD is very important for the proper management of the development process. However, due to the complexity of the concept and the different aspects involved, such measurement is not easy and there are several different approaches in the literature.

Goals: This work aims at investigating the existing approaches to the measurement and the analysis of TD

focusing on quantitative methods that could also be automated.

Method: The Systematic Literature Review (SLR) approach was applied to 331 studies obtained from the three

largest digital libraries and databases.

Results: After applying all filtering stages, 21 papers out of 331 were selected and analyzed in depth. The majority of them suggested new approaches to measure TD using different criteria, not built on top of existing ones.

Conclusions: Existing studies related to the measurement of TD were observed and analyzed. The findings show that the field is not mature and that several models have almost no independent validation. Moreover, few tools exist to help automate the evaluation process.

1 INTRODUCTION

A keen competition among companies for customer

satisfaction is one of the reasons behind the contin-

uous pressure to produce high-quality and maintain-

able source code in continuously reduced timeframes

(Kan, 2002). Several studies have been performed focusing on the activities of developers (Coman and Sillitti, 2007) (Coman and Sillitti, 2008) (Moser et al., 2008) (Coman et al., 2014), and considering code quality as the main criterion for releasing a product could consume an excessive amount of resources (Corral et al., 2014). However, this criterion

is critical to achieve a high level of customer satisfac-

tion and the quality of the product is often a prereq-

uisite to achieve success. Consequently, companies

focusing on the quality of their product usually have

a better market success (Boehm et al., 2001).

There are situations in which the development time must be reduced to achieve a minimum working product. This is typical of startup companies that have very aggressive schedules to deliver the product that allows them to survive. In such a context, the sub-optimal decisions that decrease the quality of the system lead to strategic TD (Tom et al., 2013). This allows the company to deliver the product for which the customer pays, which in turn allows the company to survive. In any case, companies should be aware that in the long run such sub-optimal decisions require additional effort to fix the product (Coman et al., 2008) (Corral et al., 2013).

This phenomenon was originally described by

Ward Cunningham in 1992 (Cunningham, 1992) in-

troducing the concept of TD. There are many more

sources of TD that have been investigated recently

that involve communication, collaboration among

team members, documentation, and individual atti-

tudes (Tom et al., 2013) (Lenarduzzi et al., 2017).

Since TD is a way of measuring the effort needed to achieve top quality in a software system compared to its current status, it is of paramount importance to be able to measure (or estimate) it. The importance of such an activity is shown by the simple fact that most software projects have some TD (Falessi et al., 2013). Being able to estimate TD allows development teams and managers to plan the work properly.

Khomyakov, I., Makhmutov, Z., Mirgalimova, R. and Sillitti, A.
Automated Measurement of Technical Debt: A Systematic Literature Review.
DOI: 10.5220/0007675900950106
In Proceedings of the 21st International Conference on Enterprise Information Systems (ICEIS 2019), pages 95-106
ISBN: 978-989-758-372-8

It may also happen that TD is too high to be paid (Chatzigeorgiou et al., 2015), requiring different approaches to address it (e.g., rewriting the system). However, knowing that the system reached that condition, and how it did so, could help identify past mistakes and improve development.

Moreover, the measurement of TD should be performed automatically, to avoid increasing the load on developers and to allow continuous monitoring. This is particularly useful in conjunction with Agile approaches, which can use such information to adapt iterations continuously.

For all these reasons, being able to measure TD automatically is of paramount importance to support the daily work of developers. There are many different approaches to TD in the literature, and this paper provides an extensive analysis pointing out the current status of the research.

The paper is organized as follows: Section 2 de-

scribes the adopted methodology; Section 3 discusses

the ﬁndings; Section 4 investigates the related work;

Section 5 analyzes the threats to validity; ﬁnally, Sec-

tion 6 draws the conclusions and introduces future

work.

2 METHODOLOGY

The protocol adopted for this Systematic Literature

Review (SLR) is the one introduced by Kitchenham

and Charters (Kitchenham and Charters, 2007) for

performing such reviews in the software engineering

area.

The main goal of this work is to review existing

studies and highlight the aspects related to TD mea-

surement, therefore we have deﬁned the following re-

search questions:

• RQ1: Which are the existing techniques for mea-

suring TD?

• RQ2: Which are the tools that support the automa-

tion of the measurement of TD?

• RQ3: Are there any empirical studies able to

demonstrate the usefulness of the identiﬁed tech-

niques?

• RQ4: Are there any empirical studies able to

demonstrate the usefulness of the tools identiﬁed?

To answer the research questions, we have

searched for papers using the three largest digital li-

braries: ACM Digital Library, IEEE Xplore, Google

Scholar.

Since only studies focusing on TD as their main topic are of interest for our purpose, we assume that their title or abstract includes the keyword "technical debt". Consequently, we used appropriate queries for each library. The data were extracted in August 2018, when the study started.

Only papers meeting the following criteria were included in the final result: containing an abstract, considering TD as a main topic, and written in English. No year constraint was specified, since we aimed at collecting all appropriate data regardless of the date.

Many publications found in the digital libraries were not appropriate for our study, since we were interested in primary studies published in refereed workshops, conferences, and journals. Therefore, we excluded documents such as summaries of workshops, tutorials, introductory descriptions of conferences, research plans, presentations, and other non-primary studies. In short, we excluded all the documents that were not proper research papers.

Finally, we manually excluded all the papers that had passed the previous filters but were not related to our research. This selection was performed after reading the entire content of the papers.

3 RESULTS

We found 603 papers distributed as follows: ACM

Digital Library (111), IEEE Xplore (194), Google

Scholar (298).

As expected, there was a signiﬁcant overlap in the

papers found in the different libraries. Therefore, the

first step was merging the results and removing duplicates. At the end of the process, we selected 21 papers. The overall selection process is summarized in Figure 1 (the numbers on the arrows show the number of papers that passed each phase):

• Step 1: Merging all Papers from Data Sources.

The initial list included 603 papers but many du-

plicates were present. The identiﬁcation of the

duplicates was performed manually to avoid prob-

lems with minor character differences in the titles

and in the author names. At the end, we had a list

of 331 unique papers.

• Step 2: Applying Exclusion Criteria. At this stage, we applied the exclusion criteria, resulting in a selection of 274 papers. The secondary studies were still kept in the list.

• Step 3: Excluding Non-Primary Studies. At this stage, we identified the secondary studies (e.g., systematic reviews, systematic mappings, etc.), which were removed from the list and are analyzed in Section 4. We identified 10 secondary studies, reducing the list to 264 papers.

• Step 4: Considering Studies Related to Measurement of TD. Reading the title and the abstract of the 264 papers, we identified the studies related to the measurement of TD. We identified 38 papers, distributed between 2011 and 2018 as described in Figure 2.

• Step 5: Quality Assessment. We read the 38 papers identified and excluded 17 of them, since they did not deal with the measurement of technical debt even though their title or abstract appeared appropriate for our investigation.

Figure 1: Steps of the selection process.

Figure 2: Distribution of papers related to TD measurement over the years (horizontal axis: 2011 to 2018).
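The selection funnel described in Steps 1 to 5 can be checked with a short tally. The sketch below only restates the counts reported in this section; the stage labels and the function name are illustrative choices, not part of the original protocol.

```python
# Paper counts remaining after each stage, as reported in Section 3.
STAGES = [
    ("Initial search results", 603),
    ("Step 1: duplicates removed", 331),
    ("Step 2: exclusion criteria applied", 274),
    ("Step 3: secondary studies removed", 264),
    ("Step 4: related to TD measurement", 38),
    ("Step 5: quality assessment", 21),
]


def dropped_per_stage(stages):
    """Return (stage name, papers removed at that stage) for each stage after the first."""
    return [
        (name, previous_count - count)
        for (_, previous_count), (name, count) in zip(stages, stages[1:])
    ]


if __name__ == "__main__":
    for name, removed in dropped_per_stage(STAGES):
        print(f"{name}: {removed} removed")
```

The per-stage differences add up to the gap between the 603 initial results and the 21 selected papers, which is a quick consistency check on the reported numbers.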

3.1 RQ1: Which are the Existing Techniques for Measuring TD?

The identiﬁed studies have been analyzed in terms of

proposed techniques, their requirements about input

data needed for the calculation of TD, the resulting

information, advantages and disadvantages of the ap-

proach. Table 5 summarises the techniques identiﬁed

while Table 1 compares the input required by the dif-

ferent techniques and Table 2 the output generated.

Letouzey (Letouzey, 2012) proposed a method for

TD evaluation named Software Quality Assessment

Based on Lifecycle Expectations (SQALE), which is

described as an answer to the need for an objective

and standardized open-source method with low false

positives. The official website of the method lists several tools able to analyze code written in different languages.

The method defines how to formulate and organize non-functional requirements that can affect code quality, defining a hierarchical structure of characteristics and sub-characteristics similar to the ISO quality model. SQALE has been developed to be automated and considers several properties of the code, but two main aspects are not taken into account. The first is that non-conformities for business or operations are not considered important by any index of SQALE (considering version 1.0 (Letouzey and Ilkiewicz, 2012)). The second is that there is no definition of the level of implementation of the requirements.
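As summarized in Table 5, SQALE associates each requirement with a remediation function that turns a number of noncompliances into a remediation cost, and the SQALE quality index (SQI) is the sum of those costs. The following is a minimal sketch of that summation; the requirement names and per-violation costs are hypothetical and do not come from the SQALE definition.

```python
from typing import Callable, Dict

# Hypothetical remediation functions: each maps a number of noncompliances
# to a remediation cost in hours. Names and costs are illustrative assumptions.
REMEDIATION_FUNCTIONS: Dict[str, Callable[[int], float]] = {
    "no_commented_out_code": lambda n: 0.1 * n,
    "method_complexity_below_threshold": lambda n: 1.0 * n,
    "no_duplicated_blocks": lambda n: 2.0 * n,
}


def sqale_quality_index(noncompliances: Dict[str, int]) -> float:
    """Sum the remediation costs over all requirements that have noncompliances."""
    return sum(
        REMEDIATION_FUNCTIONS[requirement](count)
        for requirement, count in noncompliances.items()
    )


# Hypothetical analysis result: number of noncompliances found per requirement.
EXAMPLE_NONCOMPLIANCES = {
    "no_commented_out_code": 10,
    "method_complexity_below_threshold": 3,
    "no_duplicated_blocks": 2,
}
```

In a real setting the counts would come from the static analysis tools and the remediation functions from the organization's debt-estimating model.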

CAST (Curtis et al., 2012) presents a formula

with ﬂexible parameters to measure TD. That ﬂexi-

bility implies the possibility of adjusting the param-

eters to the speciﬁcity of a particular organization.

The approach deﬁnes ﬁve Health Factors that have

a different impact on the overall TD: Changeability

(30%), Transferability (40%), Robustness (18%), Se-

curity (7%), Performance Efﬁciency (5%).

Violations in each area are rated according to their

severity and a formula is applied for calculating the

ﬁnal value of the debt. The approach has been eval-

uated on 745 business applications containing more

than 10 KLOC using the CAST proprietary Applica-

tion Intelligence Platform.
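The CAST formula listed in Table 5 sums, for each severity class, the number of violations times the percentage to be fixed, the average hours per fix, and the hourly cost. A minimal sketch under that reading follows; all parameter values in the example are hypothetical and would be adjusted to the specific organization.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class SeverityClass:
    violations: int         # number of should-fix violations of this severity
    fraction_to_fix: float  # share of those violations to be fixed (0.0 to 1.0)
    hours_per_fix: float    # average hours needed to fix one violation


def cast_remediation_cost(classes: Iterable[SeverityClass], cost_per_hour: float) -> float:
    """Sum violations x fraction to fix x hours per fix x hourly cost over severity classes."""
    return sum(
        c.violations * c.fraction_to_fix * c.hours_per_fix * cost_per_hour
        for c in classes
    )


# Hypothetical example values for high, medium, and low severity.
EXAMPLE_CLASSES = [
    SeverityClass(violations=100, fraction_to_fix=0.5, hours_per_fix=2.0),
    SeverityClass(violations=200, fraction_to_fix=0.25, hours_per_fix=1.0),
    SeverityClass(violations=400, fraction_to_fix=0.1, hours_per_fix=0.5),
]
EXAMPLE_COST_PER_HOUR = 75.0
```

Keeping the three factors per severity class explicit mirrors the flexibility the approach claims: each organization can tune the fraction to fix and the effort per fix independently.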

The SIG/TUViT approach (Nugroho et al., 2011)

is based on a sound and quantitative approach for

measuring software quality from source code. More-

over, the estimation of TD is based on empirical data

using a model that is quite simple.

Mayr et al. (Mayr et al., 2014) define a model that combines the benefits of the flexible approaches to quality changes with the simplicity of the SIG model. The approach requires only information from static code analysis. The output is simple as well: the hours of work required to pay the debt.
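The calculation steps that Table 5 lists for this benchmarking-based model (maximum allowed violations, violations to be fixed, effort, and hourly rate) can be sketched as follows. How the maximum allowed number of violations is derived from the benchmark and the target quality level is not detailed here, so the sketch simply takes it as an input; the rule names and values are hypothetical.

```python
from typing import Dict


def violations_to_fix(actual: Dict[str, int], max_allowed: Dict[str, int]) -> Dict[str, int]:
    """For each rule, count the violations exceeding the maximum allowed for the target level."""
    return {
        rule: max(0, count - max_allowed.get(rule, 0))
        for rule, count in actual.items()
    }


def remediation_cost(
    actual: Dict[str, int],
    max_allowed: Dict[str, int],
    hours_per_fix: Dict[str, float],
    hourly_rate: float,
) -> float:
    """Violations to be fixed x estimated effort for fixing x hourly cost rate."""
    to_fix = violations_to_fix(actual, max_allowed)
    total_hours = sum(n * hours_per_fix[rule] for rule, n in to_fix.items())
    return total_hours * hourly_rate


# Hypothetical inputs: observed violations, benchmark-derived limits, and effort per fix.
ACTUAL = {"unused_variable": 120, "empty_catch_block": 15}
MAX_ALLOWED = {"unused_variable": 100, "empty_catch_block": 20}
HOURS_PER_FIX = {"unused_variable": 0.25, "empty_catch_block": 1.0}
```

With these example values only the 20 excess `unused_variable` violations contribute, since the second rule is already within its limit.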

Skourletopoulos et al. (Skourletopoulos et al., 2015) developed a fluctuation-based modelling approach to TD. It measures the amount of profit not earned due to the under-usage of a given service, also considering the probability of over-usage of the selected service, which would lead to accumulated TD. The hypothesis is that service capacity affects the choice of service, which is made with respect to the predicted fluctuations in the number of users over time and the way TD is gradually paid off. Consequently, formulas for predicting the appearance of TD were developed, as well as tools for validating them.

SQALE official website: http://www.sqale.org/

Table 1: Input of TD measurement techniques.

Input columns: (1) Target quality level; (2) Debt-estimating model; (3) Number of should-fix violations; (4) The hours to fix each violation; (5) The cost of labor; (6) Source code; (7) Output data from static code analyzers; (8) Candidate cloud-based mobile service; (9) Past changes in the history of the system; (10) Developer activity data.

Technique (method) | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10
SQALE | X | X | - | - | - | - | - | - | - | -
CAST | - | - | X | X | X | - | - | - | - | -
SIG | X | - | - | - | X | X | - | - | - | -
A benchmarking-based model | X | - | - | - | X | X | X | - | - | -
A fluctuation-based modelling approach | - | - | - | - | - | - | - | X | - | -
Breaking Point for TD | - | - | X | X | X | - | - | - | X | -
LOC and Fan-In to Quantify the Interest of SATD | - | - | - | - | - | X | - | - | - | -
A framework for design level TD | - | - | - | - | - | X | - | - | - | -
A framework for estimating interest on TD | - | - | - | - | - | - | - | - | - | X
Modularity metrics for ATD | - | - | - | - | - | X | - | - | X | -
Detecting and quantifying SATD | - | - | - | - | - | X | - | - | - | -

Chatzigeorgiou et al. (Chatzigeorgiou et al., 2015) provide an estimation of a breaking point, that is, the point at which debt becomes too large to be paid off. The source code is initially assessed by a fitness function based on the Entity Placement metric, which quantifies coupling and cohesion. The approach is based on the identification of the best design for a system. The cost of reaching that best design through the necessary refactorings is calculated, as well as the number of versions leading to the breaking point. However, the authors point out some issues to be considered:

• the method considers only the coupling and cohesion dimensions, but TD has many other aspects

• maintenance effort means not just adding lines of code, but also deleting and modifying them

• future maintenance effort cannot be predicted solely on the basis of past maintenance tasks

Kamei et al. (Kamei et al., 2016) propose measuring the interest of self-admitted TD with code metrics such as LOC (because it correlates well with code complexity metrics) and Fan-In (showing how much one piece of code affects another). They validated the approach on the Apache JMeter project.
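To make the two metrics concrete, the sketch below computes LOC and Fan-In per function for a small piece of Python source using the standard `ast` module. This is only an illustration of the metrics themselves: the cited study works on Java code with dedicated tools, and the simplifications here (direct calls by plain name, Fan-In counted as the number of distinct calling functions) are assumptions of this sketch.

```python
import ast
from collections import defaultdict


def loc_and_fan_in(source: str) -> dict:
    """Return {function name: {"loc": lines spanned, "fan_in": distinct callers}}."""
    tree = ast.parse(source)
    functions = {
        node.name: node
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    }
    callers = defaultdict(set)
    for caller_name, node in functions.items():
        for child in ast.walk(node):
            # Only direct calls by plain name are considered (no methods or attributes).
            if isinstance(child, ast.Call) and isinstance(child.func, ast.Name):
                callee = child.func.id
                if callee in functions and callee != caller_name:
                    callers[callee].add(caller_name)
    return {
        name: {
            "loc": node.end_lineno - node.lineno + 1,
            "fan_in": len(callers[name]),
        }
        for name, node in functions.items()
    }


SAMPLE = '''
def helper(x):
    return x + 1

def a(x):
    return helper(x)

def b(x):
    y = helper(x)
    return helper(y)
'''
```

Tracking how these two values change between the version where a self-admitted TD comment is introduced and later versions is one way to read "interest" in this context.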

Marinescu (Marinescu, 2012) proposes a framework exploring TD symptoms at the design level. The construction of such a framework includes four steps:

1. definition of the principles for finding design defects

2. identification of a set of relevant design defects

3. estimation of the impact of each defect

4. calculation of the overall design quality

The framework also includes:

• a coarse-grain approach to monitor the evolution

of TD over time

• a more detailed approach that enables locating and

understanding individual ﬂaws, which can lead to

a systematic refactoring

The approach has been applied in a case study

including 63 releases of two well known Eclipse

projects (JDT and EMF). However, the conclusions of

the case study cannot be generalized, considering the

restricted number of systems analyzed and the limited

number of design ﬂaws that were included in the ac-

tual instantiation of the framework.

In the framework proposed by Singh et al. (Singh et al., 2014), TD estimation is based on measures of code maintainability obtained via static analysis, and interest estimation is based on activity data obtained by monitoring developer actions in the IDE. The main contribution of the framework is the integration of developer activity data with code metrics, which improves the understanding of developer comprehension effort and results in improved accuracy of the estimation.

Although Architectural Technical Debt (ATD) is difficult to measure, Li et al. (Li et al., 2014) propose the Average Number of Modified Components per Commit (ANMCC) as a metric. However, commit records may no longer exist; therefore, the authors suggest using the Index of Package Changing Impact (IPCI) and the Index of Package Goal Focus (IPGF) instead of ANMCC. The advantage of these two metrics is that they can be obtained directly from the source code. The correlation of these metrics with ANMCC is then validated. However, the weakness of the study is that it relies only on results from projects developed in C#.
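As its name suggests, ANMCC averages the number of distinct components touched by each commit. The sketch below computes it from a list of commits given as lists of modified file paths. Treating the top-level directory of a path as the component is an assumption made for illustration; the cited study defines components for its own projects.

```python
from typing import List


def component_of(path: str) -> str:
    """Assumption of this sketch: a component is the top-level directory of a file path."""
    return path.split("/", 1)[0]


def anmcc(commits: List[List[str]]) -> float:
    """Average number of distinct modified components per commit."""
    if not commits:
        return 0.0
    total = sum(len({component_of(path) for path in files}) for files in commits)
    return total / len(commits)


# Hypothetical commit records: each inner list holds the files modified by one commit.
EXAMPLE_COMMITS = [
    ["ui/a.cs", "ui/b.cs"],
    ["ui/a.cs", "core/x.cs", "db/y.cs"],
    ["core/x.cs"],
]
```

For the example, the commits touch 1, 3, and 1 components respectively, so the average is 5/3. A higher value suggests that changes tend to spread across components, which is the modularity symptom the metric is meant to capture.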

Table 2: Output of TD measurement techniques.

Technique (method) | Design symptoms of TD | Remediation Cost | Non-remediation Cost | Relative amount of TD | Breaking Point | Number of comments
SQALE | X | - | - | - | - | -
CAST | - | X | - | - | - | -
SIG | - | X | X | - | - | -
A benchmarking-based model | - | X | - | - | - | -
A fluctuation-based modelling approach | - | - | - | X | - | -
Breaking Point for TD | - | X | X | - | X | -
LOC and Fan-In to Quantify the Interest of SATD | - | - | X | - | - | -
A framework for design level TD | X | - | - | - | - | -
A framework for estimating interest on TD | - | - | X | - | - | -
Modularity metrics for ATD | - | - | - | X | - | -
Detecting and quantifying SATD | - | - | - | - | - | X

Table 3: Tools able to support the automation of the measurement of TD.

Technique (method) | Ref | Tool | Tool URL | Open source
SQALE | (Letouzey, 2012) | SonarQube | https://www.sonarqube.org/ | yes
 | | MIND | https://sourceforge.net/projects/mindyourdebt/ | yes
 | | FindBugs | http://findbugs.sourceforge.net/ | yes
Breaking Point for TD | (Chatzigeorgiou et al., 2015) | JCaliper | http://se.uom.gr/index.php/projects/jcaliper/ | yes
A framework for design level TD | (Marinescu, 2012) | inFusion | https://chocolatey.org/packages/infusion/ | no
A framework for estimating interest on TD | (Singh et al., 2014) | Blaze monitoring tool | https://sites.google.com/site/blazedemosite/home/about | no
Modularity metrics for ATD | (Li et al., 2014) | TortoiseSVN | https://tortoisesvn.net/ | yes
LOC and Fan-In to Quantify the Interest of SATD | (Kamei et al., 2016) | Understand | https://scitools.com/ | no
 | | JDeodorant | https://github.com/tsantalis/JDeodorant | yes
Detecting and quantifying SATD | (Maldonado and Shihab, 2015) | SLOCCount | https://www.dwheeler.com/sloccount/sloccount.html | yes

Maldonado and Shihab (Maldonado and Shihab, 2015) examine code comments to identify and evaluate Self-Admitted Technical Debt (SATD). The strength of the approach is the use of heuristics to eliminate comments that are unlikely to be related to TD. In addition, the method classifies comments into different types of SATD.
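The idea of filtering out irrelevant comments and then counting those that admit debt can be sketched as below. Both the keyword patterns and the filtering heuristics (license headers, commented-out code) are hypothetical stand-ins chosen for illustration; they are not the patterns or heuristics used in the cited study.

```python
import re
from typing import Iterable

# Illustrative keyword patterns only, not the ones used in the cited work.
SATD_PATTERNS = [r"\bhack\b", r"\bfixme\b", r"\btodo\b", r"\bworkaround\b", r"\btemporary\b"]
_SATD_RE = re.compile("|".join(SATD_PATTERNS), re.IGNORECASE)


def is_noise(comment: str) -> bool:
    """Hypothetical filtering heuristics: drop license headers and commented-out code."""
    text = comment.strip()
    if re.search(r"\b(license|licensed|copyright)\b", text, re.IGNORECASE):
        return True
    # A comment ending in ';', '{' or '}' is treated as commented-out code.
    if re.search(r"[;{}]\s*$", text):
        return True
    return False


def count_satd_comments(comments: Iterable[str]) -> int:
    """Count comments that survive the filters and match a debt-admitting pattern."""
    return sum(1 for c in comments if not is_noise(c) and _SATD_RE.search(c))


EXAMPLE_COMMENTS = [
    "TODO: remove this hack once the parser is fixed",
    "Licensed under the Apache License, Version 2.0",
    "x = compute();",
    "Returns the number of users",
    "FIXME: temporary workaround for null sessions",
]
```

The resulting count corresponds to the "Number of comments" output listed for this technique in Table 2. A keyword filter this simple will misclassify some comments, which is why the original approach relies on more careful heuristics and manual classification.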

3.2 RQ2: Which are the Tools that

Support the Automation of the

Measurement of TD?

TD measurement techniques often require a large amount of input data whose extraction requires considerable effort. Therefore, tools are of paramount importance to support development teams in the integration of TD measurement in their daily work. Table 3 provides a summary of the available tools and the methodology they implement.

SonarQube (Gaudin, 2009) implements the

SQALE method of TD evaluation. It is used for con-

tinuous inspection of code quality to perform auto-

matic reviews with static analysis of code to detect

bugs, code smells and security vulnerabilities in sev-

eral programming languages.

MIND (ManagIng techNical Debt) is an open source tool which is, to the best of our knowledge, the first tool supporting the quantification and visualization of the interest (Falessi and Reichel, 2015). It is a plug-in for SonarQube. MIND uses the following metrics to quantify the interest:

• Defect Proneness

• Maximum Defects per 100 LOC Touched

• Extra Defect Proneness

• Maximum Extra Defects per 100 LOC Touched

• Relative Extra Defect Proneness

• Average Relative Extra Defect Proneness

• Violation Density

• Linkage

• Estimation Error

JCaliper (Chatzigeorgiou et al., 2015) was de-

signed to ﬁnd the placement of entities that minimizes

the Entity Placement metric as a search-space explo-

ration problem. It automatically extracts the number,

type and sequence of refactoring activities required to

obtain the design without TD.

Blaze is a monitoring tool (Snipes et al., 2014) that records the temporal sequence of developer actions, including code navigation actions and edit actions. The log produced is subsequently analysed to determine class relationships and the effort spent by a developer to understand program elements.

TortoiseSVN allows extracting commit records from standard SVN servers and any code repositories supporting Subversion, such as GitHub. Those records are used by Li et al. (Li et al., 2014) to compute the ANMCC metric.

JDeodorant (Tsantalis et al., 2008) is used in

(Kamei et al., 2016) for performing source code pars-

ing. In particular, the ability to extract a comment

and map it to its corresponding method is interesting.

Later in the paper, to calculate the interest that is in-

curred over time, 16 code metrics were extracted us-

ing the Understand tool (und, ). JDeodorant (Tsantalis

et al., 2008) is also used in (Maldonado and Shihab,

2015) to parse the source code and extract the code

comments. However, before that, the SLOCCount

tool (Wheeler, 2001) is applied to calculate SLOC in

Java ﬁles.

3.3 RQ3: Are there any Empirical

Studies able to Demonstrate the

Usefulness of the Identiﬁed

Techniques?

The empirical studies performed to validate the iden-

tiﬁed techniques are summarized in Table 4.

Griffith et al. (Griffith et al., 2014) assessed three methods ((Letouzey, 2012) (Curtis et al., 2012) (Marinescu, 2012)) to find out whether they effectively describe the relationship between the quality of the system and the level of TD.

Izurieta et al. (Izurieta et al., 2013) use the approach of Nugroho et al. (Nugroho et al., 2011) to exemplify the methodology.

The Benchmarking-based Model of Mayr et al. (Mayr et al., 2014) is closely related to their earlier work on benchmarking-oriented quality assessments. It also calculates the remediation cost in a way similar to the approach of CAST (Curtis et al., 2012).

The code structure metrics in the framework for estimating interest on TD (Singh et al., 2014) were selected from those related to maintainability and TD in (Nugroho et al., 2011). As in that prior work, static code metrics are used.

3.4 RQ4: Are there any Empirical

Studies able to Demonstrate the

Usefulness of the Tools Identiﬁed?

In (Parodi et al., 2016), TD was measured using two static code analysis tools (FindBugs (Ayewah et al., 2008) and SonarQube (Gaudin, 2009)). The goal was to evaluate whether code produced with the Test Driven Development approach has a lower TD than code produced using other techniques. These two tools are widely used in the community for measuring TD.

Other studies tested SonarQube: (Luhr et al., 2015) use it for measuring TD in a particle tracker system; (Monteith and McGregor, 2013) use it for several calculations of TD in the software supply chain; (Britsman and Tanriverdi, 2015) describe a case study at Ericsson in which TD measurement tools were examined in order to build an evaluation system based on the ISO standard 15939:2007.

4 RELATED WORK

Investigating the different approaches for measuring

TD could be valuable to practitioners and researchers

to provide a better understanding of the ﬁeld and iden-

tify research gaps. However, we were not able to iden-

tify any secondary study related to the research ques-

tions we listed in Section 2. Instead, several others

deal with TD in general.

The systematic mapping study of Li et al. (Li et al., 2015) was initiated to find and analyze publications on TD and its management between 1992 and 2013. After selecting 92 studies, the authors classified 10 TD definitions, identified 8 TD management activities, and collected 29 tools for the latter.

In another systematic mapping study, focused on TD definitions, Poliakov et al. (Poliakov et al., 2015) performed a full review of 159 papers. A total of 107 definitions were separated into keywords. The main achievement of the research is a keyword map, supplemented by synonyms and types of TD.

Another literature review was carried out by Alves et al. (Alves et al., 2016) based on three research questions. They evaluated 100 studies published from 2010 to 2014 and proposed an initial taxonomy of TD types, a list of indicators for identifying TD, and a list of existing management strategies.

One study considers another aspect of the phenomenon. Ribeiro et al. (Ribeiro et al., 2016) state that evaluating the appropriate time to pay TD and applying effective decision-making criteria are important management goals. Consequently, the authors identified 14 such criteria for development teams. The results also showed gaps where further research can be performed.

Recently, Behutiye et al. (Behutiye et al., 2017) considered a narrower field of study related to TD: they synthesized the state of the art of TD and its causes, consequences, and management strategies only in the context of agile software development (ASD). In particular, through a systematic literature review, 38 primary studies out of 346 were identified and analyzed. Five research areas of interest related to the literature of TD in ASD, as well as 12 strategies for managing it, were found. The authors identified eight categories regarding the causes and five categories regarding the consequences of incurring TD in ASD.

Table 4: Identified techniques and the related empirical studies.

Technique (method) | Based on | Ref | Empirical study
SQALE | previous version (Letouzey, 2012) | (Letouzey and Ilkiewicz, 2012) | (Griffith et al., 2014)
CAST | - | (Curtis et al., 2012) | (Griffith et al., 2014)
SIG | SIG quality model (Heitlager et al., 2007) | (Nugroho et al., 2011) | (Izurieta et al., 2013)
A Benchmarking-based Model | benchmarking-oriented method (Gruber et al., 2010), CAST (Curtis et al., 2012) | (Mayr et al., 2014) | (Mayr et al., 2014)
A Fluctuation-Based Modelling Approach | - | (Skourletopoulos et al., 2015) | (Skourletopoulos et al., 2015)
Breaking Point for TD | CAST (Curtis et al., 2012), previous version (Ampatzoglou et al., 2015a) | (Chatzigeorgiou et al., 2015) | (Chatzigeorgiou et al., 2015)
LOC and Fan-In to Quantify the Interest of SATD | - | (Kamei et al., 2016) | -
A framework for design level TD | - | (Marinescu, 2012) | (Marinescu, 2012), (Griffith et al., 2014)
A framework for estimating interest on TD | SIG (Nugroho et al., 2011) | (Singh et al., 2014) | (Singh et al., 2016)
Modularity metrics for ATD | - | (Li et al., 2014) | (Li et al., 2014)
Detecting and quantifying SATD | previous version (Potdar and Shihab, 2014) | (Maldonado and Shihab, 2015) | -

In the work of Besker et al. (Besker et al., 2016), ATD is considered as affecting system success and able to cause expensive repercussions, so the goal is to create new knowledge about ATD by synthesizing and compiling existing research efforts. The main contribution of the paper is a novel descriptive model that provides a comprehensive interpretation of the ATD phenomenon.

Finally, the last related work focuses on a specific view of TD. Employing a method for systematic literature review and applying it to seven digital library sources, Ampatzoglou et al. (Ampatzoglou et al., 2015b) analyzed the financial aspect of TD. The authors conclude that communication between technical managers and project managers is beneficial, because it provides a shared vocabulary and allows high-quality goals to be set. To achieve this, they introduced a glossary of terms and a classification scheme for financial approaches.

5 THREATS TO VALIDITY

The main threats to validity identiﬁed are the follow-

ing:

• Although the applied guideline (Kitchenham and Charters, 2007) recommends considering about seven digital libraries for an exhaustive search, in our case only three were chosen. The reason is that other sources contain very few unique papers compared to the ACM and IEEE digital libraries. Moreover, to avoid missing important papers we used Google Scholar, which indexes almost everything.

• Constructing an appropriate search string is a tricky task. Since the title of some studies we are interested in does not include our keywords, we decided to extend the search to the abstracts. Since we are interested in studies focusing on TD, we assume that the keyword is mentioned in the abstract.

• Automatically merging the result lists from the libraries is risky, since even a single different symbol in a title might affect the result. For that reason, duplicates were identified and eliminated manually during the creation of the merged list.

• Some information may not have been considered in our study, since some papers could have been accidentally skipped or were not present at the time of the query (August 2018).
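The risk mentioned in the third threat, that a single differing symbol makes two records of the same paper look distinct, can be illustrated with a small normalization routine. The sketch below is not the procedure used in this study, where duplicates were identified manually; it only shows how candidate duplicates could be flagged for such a manual check. The normalization rules are assumptions of this sketch.

```python
import re
import unicodedata
from collections import defaultdict
from typing import List


def normalize_title(title: str) -> str:
    """Fold ligatures and accents, lowercase, and collapse punctuation and spacing."""
    decomposed = unicodedata.normalize("NFKD", title)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]+", " ", ascii_only.lower()).strip()


def candidate_duplicates(titles: List[str]) -> List[List[str]]:
    """Group titles that become identical after normalization, for manual review."""
    groups = defaultdict(list)
    for title in titles:
        groups[normalize_title(title)].append(title)
    return [group for group in groups.values() if len(group) > 1]


# Hypothetical titles as they might be returned by different libraries.
EXAMPLE_TITLES = [
    "Identi\ufb01cation of Technical Debt: A Survey",
    "Identification of technical debt - a survey.",
    "Estimating the Principal of Technical Debt",
]
```

The first two example titles differ in a ligature, capitalization, and punctuation, yet normalize to the same string, so they are grouped as one candidate pair. A final human check is still needed, since distinct papers can share a title.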

6 CONCLUSIONS AND FUTURE WORK

This study provides an overview of the available approaches to the measurement of TD and the tools able to support its automation. Research in the field is very active, but there is a lack of validation and evolution of the models. In particular, in almost all cases, models are developed from scratch rather than by refining or extending existing ones. This shows a very low level of maturity of the field, in which it is not clear which approach or approaches are best to follow. Moreover, there is a need for independent validation, since almost none of the models have been independently evaluated; the evaluation is usually performed by the proponents of the approach.

The tools usually require a complex setup, support a limited number of programming languages, and provide quite different results. They also use very different measurement units. Finally, most of the tools are able to estimate only the principal of TD (sometimes called remediation cost), whereas knowledge of its interest (sometimes called non-remediation cost) would complete the picture.

Overall, both methodologies and support tools require a significant amount of research to become really usable in practice.


Table 5: Techniques with input, output, and calculation.

Technique: SQALE
Input: 1. Target quality level (a list of non-functional requirements that define right code). 2. Debt-estimating model (associates each requirement with a remediation function that turns the number of noncompliances into a remediation cost).
Calculation: Run the code through the analysis tools and use the remediation functions to work out the remediation cost for each element. TD is the sum of the remediation costs for all noncompliances. This debt is called the SQALE quality index (SQI).
Output: Design symptoms of TD (Pyramid: an indicator that represents the specific distribution of TD over eight characteristics).
Ref: (Letouzey and Ilkiewicz, 2012)

Technique: CAST
Input: 1. Number of should-fix violations in an application. 2. The hours to fix each violation. 3. The cost of labor.
Calculation: ((∑ high-severity violations) × (percentage to be fixed) × (average hours needed to fix) × ($ per hour)) + ((∑ medium-severity violations) × (percentage to be fixed) × (average hours needed to fix) × ($ per hour)) + ((∑ low-severity violations) × (percentage to be fixed) × (average hours needed to fix) × ($ per hour))
Output: Remediation Cost
Ref: (Curtis et al., 2012)

Technique: SIG
Input: 1. Source code. 2. Target quality level. 3. The cost of labor.
Calculation: For the extraction of measurement values from source code, the Software Analysis Toolkit of SIG is used.
\[ RE = RF \cdot (SS \cdot TF) \cdot RA \]
\[ ME = \frac{MF \cdot (SS \cdot (1 + r)^{t} \cdot TF)}{2^{(QualityLevel - 3)/2}} \]
Output: 1. Remediation Cost. 2. Non-remediation Cost.
Ref: (Nugroho et al., 2011)

Technique: Benchmarking-based Model
Input: 1. Static code analyzers output data (reference projects). 2. Source code. 3. Target quality level. 4. The cost of labor.
Calculation: Tool support is available (Ploesch et al., 2008) that facilitates triggering code analysis tools as well as building the benchmark database and benchmark suite. 1. The target quality level is specified. 2. The maximum number of allowed violations is calculated. 3. The number of violations to be fixed is calculated. 4. Remediation cost = (number of violations to be fixed) × (estimated effort for fixing) × (hourly cost rate).
Output: Remediation Cost
Ref: (Mayr et al., 2014)
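As an illustration of the CAST and SIG rows above, the following minimal sketch evaluates the CAST remediation cost and the SIG expression RE = RF * (SS * TF) * RA. The class and function names are hypothetical, the parameter names of `repair_effort` only mirror the symbols in the table, and all numeric inputs are made-up values chosen to show the arithmetic.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class SeverityBucket:
    """Inputs for one severity level (hypothetical container, not from the paper)."""
    violations: int          # number of violations found at this severity
    fraction_to_fix: float   # share of them that must be fixed, in [0, 1]
    hours_per_fix: float     # average hours needed to fix one violation


def cast_remediation_cost(buckets: Mapping[str, SeverityBucket],
                          dollars_per_hour: float) -> float:
    """Sum violations x fraction x hours x hourly rate over all severity levels."""
    return sum(
        b.violations * b.fraction_to_fix * b.hours_per_fix * dollars_per_hour
        for b in buckets.values()
    )


def repair_effort(rf: float, ss: float, tf: float, ra: float) -> float:
    """SIG row: RE = RF * (SS * TF) * RA, with names mirroring the table symbols."""
    return rf * (ss * tf) * ra


# Illustrative numbers only; they are assumptions, not data from any study.
example = {
    "high": SeverityBucket(violations=100, fraction_to_fix=0.50, hours_per_fix=1.0),
    "medium": SeverityBucket(violations=200, fraction_to_fix=0.25, hours_per_fix=1.0),
    "low": SeverityBucket(violations=400, fraction_to_fix=0.10, hours_per_fix=1.0),
}
total = cast_remediation_cost(example, dollars_per_hour=75.0)
print(f"Estimated remediation cost: ${total:,.2f}")
```

Keeping the three severity levels in a mapping makes the three-term sum of the table a single loop, so a different severity scheme only changes the input data.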

ICEIS 2019 - 21st International Conference on Enterprise Information Systems


Table 5: Techniques with input, output, and calculation (cont.).

Technique: A Fluctuation-based Modelling Approach
Input: Candidate cloud-based mobile service.
Calculation: Quantifying the TD during the first year:
\[ TD_{1} = 12 \cdot [ppm \cdot (U_{max} - U_{curr}) - C_{u/m} \cdot (U_{max} - U_{curr})] = 12 \cdot (U_{max} - U_{curr}) \cdot (ppm - C_{u/m}) \]
From the second year onwards:
\[ TD_{i} = 12 \cdot [K_{i-2} \cdot [U_{max} - L_{i-2}] - M_{i-2} \cdot [U_{max} - L_{i-2}]] = 12 \cdot (U_{max} - L_{i-2}) \cdot (K_{i-2} - M_{i-2}), \quad i > 1 \]
Output: Relative amount of
Ref: (Skourletopoulos et al., 2015)

Technique: Breaking Point for TD
Input: 1. Number of should-fix violations in an application. 2. The hours to fix each violation. 3. The cost of labor. 4. Past changes in the history of the system (LOC).
Calculation: TD-Principal is calculated as a function of the first 3 input variables.
\[ Interest = addedLOC \cdot \left(1 - \frac{FitnessValue(optimum)}{FitnessValue(actual)}\right) \]
\[ versions = \frac{Principal(\$)}{Interest(\$)} \]
Output: 1. Remediation Cost. 2. Non-remediation Cost. 3. Breaking point.
Ref: (Chatzigeorgiou et al., 2015)

Technique: LOC and Fan-In to Quantify the Interest of SATD
Input: Source code.
Calculation: 1. Extracting comments and mapping them to their corresponding methods. 2. Determining the change over time in these SATD methods. 3. Determining the metrics that measure interest. 4. Calculating the interest per SATD instance.
Output: Non-remediation Cost
Ref: (Kamei et al., 2016)

Technique: A framework for design level
Input: Source code.
Calculation: 1. Select a set of relevant design flaws. 2. Define rules for the detection of each design flaw. 3. Measure the negative influence of each detected flaw instance with the FlawImpactScore (FIS):
\[ FIS_{flaw\,instance} = I_{flaw\,type} \cdot G_{flaw\,type} \cdot S_{flaw\,instance} \]
4. Compute an overall score:
\[ DebtSymptomsIndex = \frac{\sum FIS_{flaw\,instance}}{KLOC} \]
Output: Design symptoms of TD
Ref: (Marinescu, 2012)
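The closed-form rows above can be checked numerically. The sketch below evaluates the fluctuation-based TD formula in both its expanded and factored forms, and computes the DebtSymptomsIndex as the sum of flaw impact scores divided by KLOC. Function and parameter names are hypothetical and only mirror the table symbols (the table itself does not define their meaning); the input values are made up.

```python
from typing import Iterable


def td_first_year_expanded(ppm: float, c_um: float,
                           u_max: float, u_curr: float) -> float:
    """First-year TD as written in the table before factoring."""
    return 12 * (ppm * (u_max - u_curr) - c_um * (u_max - u_curr))


def td_first_year(ppm: float, c_um: float,
                  u_max: float, u_curr: float) -> float:
    """First-year TD in factored form: 12 * (U_max - U_curr) * (ppm - C_{u/m})."""
    return 12 * (u_max - u_curr) * (ppm - c_um)


def td_year_i(k_prev: float, m_prev: float,
              l_prev: float, u_max: float) -> float:
    """TD for i > 1: 12 * (U_max - L_{i-2}) * (K_{i-2} - M_{i-2})."""
    return 12 * (u_max - l_prev) * (k_prev - m_prev)


def debt_symptoms_index(flaw_impact_scores: Iterable[float], kloc: float) -> float:
    """Sum of the FIS values of all flaw instances, normalised by system size in KLOC."""
    if kloc <= 0:
        raise ValueError("kloc must be positive")
    return sum(flaw_impact_scores) / kloc


# Made-up inputs, used only to show that both forms of the formula agree.
args = dict(ppm=10, c_um=4, u_max=1000, u_curr=600)
print(td_first_year_expanded(**args), td_first_year(**args))
print(debt_symptoms_index([4, 6, 10], kloc=20))
```

The factored form makes it visible that the sign of the result depends only on the sign of the two differences being multiplied.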


Table 5: Techniques with input, output, and calculation (cont.).

Technique: A framework for estimating interest on TD
Input: Developer activity data.
Calculation: 1. Establishing sessions. 2. Calculate metrics related to comprehension effort within a session. 3. \( Interest(I) = I_{current} - I_{ideal} \). Static metrics show the presence of TD in classes. Comprehension effort metrics quantify the effort to comprehend the classes.
Output: Non-remediation Cost
Ref: (Singh et al., 2014)

Technique: Modularity metrics for ATD
Input: Past changes in the history of the system (commit records). Source code.
Calculation: 1. Parse the commit records to extract the data items needed for the ANMCC calculation. 2. Filter out data in the commit records. 3. Compute
\[ ANMCC = \frac{\sum_{j=1}^{h} NMC(k + j)}{h} \]
A higher ANMCC entails a potential increase in ATD.
1. Code map generation (XML). 2. Code map parsing. 3. Modularity metrics calculation. A higher IPCI or IPGF indicates less ATD.
Output: Relative amount of
Ref: (Li et al., 2014)

Technique: Detecting and quantifying SATD
Input: Source code.
Calculation: 1. Project data extraction (release used, number of classes, total source lines of code, total extracted comments, and number of contributors). 2. Parsing the source code and extracting the code comments. 3. Filtering comments. 4. Manual classification into five different types of SATD.
Output: Number of comments (number of individual line, block, and Javadoc comments).
Ref: (Maldonado and Shihab, 2015)
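The ANMCC formula and the interest difference in the rows above can be written in a few lines of code. This is a minimal sketch under stated assumptions: NMC(n) is read as the number of modified components in the n-th commit (1-based), the sum runs over the h commits that follow commit k, and the function names and sample data are hypothetical.

```python
from typing import Sequence


def anmcc(nmc: Sequence[int], k: int, h: int) -> float:
    """Average of NMC(k+1) .. NMC(k+h), where nmc[n - 1] holds NMC(n)."""
    if h <= 0:
        raise ValueError("h must be positive")
    if k < 0 or k + h > len(nmc):
        raise ValueError("window k+1..k+h falls outside the commit history")
    return sum(nmc[k + j - 1] for j in range(1, h + 1)) / h


def interest(current_effort: float, ideal_effort: float) -> float:
    """Interest(I) = I_current - I_ideal, in whatever unit the effort metric uses."""
    return current_effort - ideal_effort


# Made-up commit history: modified components per commit, oldest first.
history = [3, 1, 4, 2, 5]
print(anmcc(history, k=1, h=3))   # averages commits 2, 3 and 4
print(interest(12.5, 10.0))
```

The explicit bounds check avoids silently averaging over a shorter window when the commit history is too short for the requested k and h.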


REFERENCES

Scientific Toolworks, Inc. Understand 2.6. http://www.scitools.com/.

Alves, N. S., Mendes, T. S., de Mendonça, M. G., Spínola, R. O., Shull, F., and Seaman, C. (2016). Identification and management of technical debt: A systematic mapping study. Information and Software Technology, 70:100–121.

Ampatzoglou, A., Ampatzoglou, A., Avgeriou, P., and

Chatzigeorgiou, A. (2015a). Establishing a framework

for managing interest in technical debt. In 5th Interna-

tional Symposium on Business Modeling and Software

Design, BMSD.

Ampatzoglou, A., Ampatzoglou, A., Chatzigeorgiou, A.,

and Avgeriou, P. (2015b). The ﬁnancial aspect of man-

aging technical debt: A systematic literature review.

Information and Software Technology, 64:52–73.

Ayewah, N., Hovemeyer, D., Morgenthaler, J. D., Penix,

J., and Pugh, W. (2008). Using static analysis to ﬁnd

bugs. IEEE software, 25(5).

Behutiye, W. N., Rodríguez, P., Oivo, M., and Tosun, A. (2017). Analyzing the concept of technical debt in the context of agile software development: A systematic literature review. Information and Software Technology, 82:139–158.

Besker, T., Martini, A., and Bosch, J. (2016). A systematic literature review and a unified model of ATD. In Software Engineering and Advanced Applications (SEAA), 2016 42th Euromicro Conference on, pages 189–197. IEEE.

Boehm, B., Grünbacher, P., and Briggs, R. O. (2001). Developing groupware for requirements negotiation: lessons learned. IEEE software, 18(3):46–55.

Britsman, E. and Tanriverdi, Ö. (2015). Identifying technical debt impact on maintenance effort - an industrial case study.

Chatzigeorgiou, A., Ampatzoglou, A., Ampatzoglou, A.,

and Amanatidis, T. (2015). Estimating the breaking

point for technical debt. In Managing Technical Debt

(MTD), 2015 IEEE 7th International Workshop on,

pages 53–56. IEEE.

Coman, I. D., Robillard, P., Sillitti, A., and Succi, G. (2014).

Cooperation, collaboration and pair-programming:

Field studies on backup behavior. Journal of Systems

and Software, 91(5).

Coman, I. D. and Sillitti, A. (2007). An empirical ex-

ploratory study on inferring developers’ activities

from low-level data. In 19th International Conference

on Software Engineering and Knowledge Engineering

(SEKE 2007).

Coman, I. D. and Sillitti, A. (2008). Automated identiﬁ-

cation of tasks in development sessions. In 16th IEEE

International Conference on Program Comprehension

(ICPC 2008).

Coman, I. D., Sillitti, A., and Succi, G. (2008). Investigat-

ing the usefulness of pair-programming in a mature

agile team. In 9th International Conference on eX-

treme Programming and Agile Processes in Software

Engineering (XP2008).

Corral, L., Sillitti, A., and Succi, G. (2013). Software de-

velopment processes for mobile systems: Is agile re-

ally taking over the business? In 1st International

Workshop on Mobile-Enabled Systems (MOBS 2013)

at ICSE 2013.

Corral, L., Sillitti, A., and Succi, G. (2014). Software as-

surance practices for mobile applications. Computing,

97(10).

Cunningham, W. (1992). The WyCash portfolio management system. In Addendum to the Proceedings on Object-Oriented Programming Systems, Languages, and Applications (Addendum).

Curtis, B., Sappidi, J., and Szynkarski, A. (2012). Esti-

mating the size, cost, and types of technical debt. In

Proceedings of the Third International Workshop on

Managing Technical Debt, pages 49–53. IEEE Press.

Falessi, D. and Reichel, A. (2015). Towards an open-

source tool for measuring and visualizing the inter-

est of technical debt. In Managing Technical Debt

(MTD), 2015 IEEE 7th International Workshop on,

pages 1–8. IEEE.

Falessi, D., Shaw, M. A., Shull, F., Mullen, K., and Key-

mind, M. S. (2013). Practical considerations, chal-

lenges, and requirements of tool-support for managing

technical debt. In Managing Technical Debt (MTD),

2013 4th International Workshop on, pages 16–19.

IEEE.

Gaudin, O. (2009). Evaluate your technical debt with Sonar. Sonar, Jun.

Grifﬁth, I., Reimanis, D., Izurieta, C., Codabux, Z., Deo,

A., and Williams, B. (2014). The correspondence

between software quality models and technical debt

estimation approaches. In Managing Technical Debt

(MTD), 2014 Sixth International Workshop on, pages

19–26. IEEE.

Gruber, H., Plösch, R., and Saft, M. (2010). On the validity of benchmarking for evaluating code quality. IWSM/MENSURA, 10.

Heitlager, I., Kuipers, T., and Visser, J. (2007). A practical

model for measuring maintainability. In Quality of

Information and Communications Technology, 2007.

QUATIC 2007. 6th International Conference on the,

pages 30–39. IEEE.

Izurieta, C., Grifﬁth, I., Reimanis, D., and Luhr, R. (2013).

On the uncertainty of technical debt measurements. In

Information Science and Applications (ICISA), 2013

International Conference on, pages 1–4. IEEE.

Kamei, Y., Maldonado, E. d. S., Shihab, E., and Ubayashi,

N. (2016). Using analytics to quantify interest of self-

admitted technical debt. In QuASoQ/TDA@ APSEC,

pages 68–71.

Kan, S. H. (2002). Metrics and models in software qual-

ity engineering. Addison-Wesley Longman Publish-

ing Co., Inc.

Kitchenham, B. and Charters, S. (2007). Guidelines for per-

forming systematic literature reviews in software en-

gineering (version 2.3). Technical report, Keele Uni-

versity and University of Durham.

Lenarduzzi, V., Sillitti, A., and Taibi, D. (2017). Analyz-

ing forty years of software maintenance models. In


39th International Conference on Software Engineer-

ing (ICSE 2017).

Letouzey, J.-L. (2012). The SQALE method for evaluating technical debt. In Managing Technical Debt (MTD), 2012 Third International Workshop on, pages 31–36. IEEE.

Letouzey, J.-L. and Ilkiewicz, M. (2012). Managing technical debt with the SQALE method. IEEE software, 29(6):44–51.

Li, Z., Avgeriou, P., and Liang, P. (2015). A systematic

mapping study on technical debt and its management.

Journal of Systems and Software, 101:193–220.

Li, Z., Liang, P., Avgeriou, P., Guelﬁ, N., and Ampatzoglou,

A. (2014). An empirical investigation of modular-

ity metrics for indicating architectural technical debt.

In Proceedings of the 10th international ACM Sigsoft

conference on Quality of software architectures, pages

119–128. ACM.

Luhr, R. L. et al. (2015). The application of technical

debt mitigation techniques to a multidisciplinary soft-

ware project. PhD thesis, Montana State University-

Bozeman, College of Engineering.

Maldonado, E. d. S. and Shihab, E. (2015). Detecting and

quantifying different types of self-admitted technical

debt. In Managing Technical Debt (MTD), 2015 IEEE

7th International Workshop on, pages 9–15. IEEE.

Marinescu, R. (2012). Assessing technical debt by identi-

fying design ﬂaws in software systems. IBM Journal

of Research and Development, 56(5):9–1.

Mayr, A., Plösch, R., and Körner, C. (2014). A benchmarking-based model for technical debt calculation. In Quality Software (QSIC), 2014 14th International Conference on, pages 305–314. IEEE.

Monteith, J. Y. and McGregor, J. D. (2013). Exploring soft-

ware supply chains from a technical debt perspective.

In Proceedings of the 4th International Workshop on

Managing Technical Debt, pages 32–38. IEEE Press.

Moser, R., Pedrycz, W., Sillitti, A., and Succi, G. (2008).

A model to identify refactoring effort during mainte-

nance by mining source code repositories. In 9th In-

ternational Conference on Product Focused Software

Process Improvement (PROFES 2008).

Nugroho, A., Visser, J., and Kuipers, T. (2011). An em-

pirical model of technical debt and interest. In Pro-

ceedings of the 2nd Workshop on Managing Technical

Debt, pages 1–8. ACM.

Parodi, E., Matalonga, S., Macchi, D., and Solari, M.

(2016). Comparing technical debt in student exercises

using test driven development, test last and ad hoc pro-

gramming. In Computing Conference (CLEI), 2016

XLII Latin American, pages 1–10. IEEE.

Ploesch, R., Gruber, H., Pomberger, G., Saft, M., and Schif-

fer, S. (2008). Tool support for expert-centred code as-

sessments. In Software Testing, Veriﬁcation, and Val-

idation, 2008 1st International Conference on, pages

258–267. IEEE.

Poliakov, D. et al. (2015). A systematic mapping study on

technical debt deﬁnition.

Potdar, A. and Shihab, E. (2014). An exploratory study

on self-admitted technical debt. In Software Main-

tenance and Evolution (ICSME), 2014 IEEE Interna-

tional Conference on, pages 91–100. IEEE.

Ribeiro, L. F., de Freitas Farias, M. A., Mendonça, M. G., and Spínola, R. O. (2016). Decision criteria for the payment of technical debt in software projects: A systematic mapping study. In ICEIS (1), pages 572–579.

Singh, V., Pollock, L. L., Snipes, W., and Kraft, N. A.

(2016). A case study of program comprehension effort

and technical debt estimations. In Program Compre-

hension (ICPC), 2016 IEEE 24th International Con-

ference on, pages 1–9. IEEE.

Singh, V., Snipes, W., and Kraft, N. A. (2014). A framework

for estimating interest on technical debt by monitoring

developer activity related to code comprehension. In

Managing Technical Debt (MTD), 2014 Sixth Interna-

tional Workshop on, pages 27–30. IEEE.

Skourletopoulos, G., Mavromoustakis, C. X., Mastorakis,

G., Rodrigues, J. J., Chatzimisios, P., and Batalla,

J. M. (2015). A ﬂuctuation-based modelling ap-

proach to quantiﬁcation of the technical debt on mo-

bile cloud-based service level. In Globecom Work-

shops (GC Wkshps), 2015 IEEE, pages 1–6. IEEE.

Snipes, W., Nair, A. R., and Murphy-Hill, E. (2014). Expe-

riences gamifying developer adoption of practices and

tools. In Companion Proceedings of the 36th Inter-

national Conference on Software Engineering, pages

105–114. ACM.

Tom, E., Aurum, A., and Vidgen, R. (2013). An exploration

of technical debt. Journal of Systems and Software,

86(6):1498–1516.

Tsantalis, N., Chaikalis, T., and Chatzigeorgiou, A. (2008). JDeodorant: Identification and removal of type-checking bad smells. In Software Maintenance and Reengineering, 2008. CSMR 2008. 12th European Conference on, pages 329–331. IEEE.

Wheeler, D. A. (2001). More than a gigabuck: Estimating GNU/Linux's size.
