Identifying Technical Debt through Code Comment Analysis

Mário André de Freitas Farias

1,2

, Methanias Colaço

, Rodrigo Oliveira Spínola

3,4

and Manoel G. de Mendonça Neto

Federal Institute of Sergipe, Lagarto, Brazil

Department of Post-Graduate in Computer Science, Federal University of Bahia, Salvador, Brazil

Fraunhofer Project Center at UFBA, Salvador, Brazil

Salvador University, Salvador, Brazil

Federal University of Sergipe, Itabaiana, Brazil

1 RESEARCH PROBLEM

The term Technical Debt (TD) is being increasingly

used to discuss technical compromises admitted by

the development team during the phases of the

software life cycle. Thus, this metaphor defines the

Trade-off between internal tasks you choose do not

perform at present, and the risk of causing future

problems (Izurieta et al., 2012). Currently, this may

include immature software artifacts such as issues in

the software design and in the software architecture,

incomplete or insufficient documentation,

incomplete design specification, insufficient code

comment, lack of adequate testing, or inadequate

technology (Alves et al., 2014).

The identification of TD is the first step to

effectively manage TD, it is necessary to identify

TD items in the project before prioritizing them and

select those which should be paid or not (Guo et al.,

2014). The term “TD item” represents an instance of

TD for the purpose of this study.

Recent systematic reviews (Li et al., 2014)

(Alves et al., 2016) reported that code quality

analysis techniques have been frequently studied to

support the identification of TD. In this sense,

automatic analysis tools have used software metrics

extracted from the source code to identify TD items

by comparing values of software metrics to

predefined thresholds (Mendes et al., 2015).

Li et al. (Li et al., 2014) analyzed and classified

29 different tools. Only one (FxCop) takes .NET

assemblies as input, one (RE-KOMBINE) takes

requirements and solutions as input, and one (CLIO)

takes compiled binaries as input. Most 86% of the

tools take source code as input, some of these tools

are mentioned as follows: (i) SIG Software Analysis

Toolkit is used to calculate code properties to

identify code TD (Nugroho et al., 2011), (ii)

Resource Standard Metrics calculates source code

metrics and analyzing code quality to find style

violations and logic problems. This tool can identify

design and code TD, and (iii) CodeVizard is a tool

for detecting design TD thought code smells

identification (Zazworka and Ackermann, 2010).

Although these tools and techniques have shown

useful to extract structural information and support

the automated management of some types of debt,

they do not cover its real meaning and human factors

(e.g., tasks commented as future work), taking aside

them on it (Zazworka et al., 2013) (Potdar and

Shihab, 2014). Thus, they may not point to a

relevant TD or not report a piece of code which is

really considered a TD by developers. Besides, some

types of debt are undetectable by tools and may not

be directly identified using only metrics collected

from the source code (Farias et al., 2015).

In this sense, pieces of code that need to be

refactored to improve the quality of the software

may continue unknown. In order to complement the

TD identification with more contextual and

qualitative data, human factors and the developers’

point of view should be considered.

2 OUTLINE OF OBJECTIVES

2.1 General Objective

Our objective is to propose an approach to support

and automate the identification and management of

different TD types through code comment analysis

by considering the developers’ point of view.

2.2 Complementary Objectives

 Perform a systematic mapping study with the

aim of investigating how research is being

conducted in the mining software repositories

field. This allowed us to identify the main

approaches with focus on comment analysis;

Farias, M., Colaço, M., Spínola, R. and Neto, M.

Identifying Technical Debt through Code Comment Analysis.

In Doctoral Consortium (DCEIS 2016), pages 9-14

 Analyze and categorize contextualized terms,

combinations, and patterns to understand how

they may be combined to support the

identification and management of different TD

types through the comments analysis;

 Create a structure that systematically allows

combining terms providing a large vocabulary to

support the TD identification;

 Develop an automated tool in order to quickly

analyze developer’s comments embedded in

source codes;

 Propose and plan a family of experiments. From

them, we intend to evaluate and evolve our

approach with the purpose of characterizing its

overall accuracy and factors affecting the

identification of TD through code comment

analysis.

 Improve our proposed model, vocabulary, and tool;

3 STATE OF THE ART

3.1 Code Comments

Comments are a generic type of task annotation

where developers insert documentation directly into

source code (Storey et al., 2008). These annotations

are richness of semantic information written in

natural language.

Despite there are different syntaxes and types of

comments according to the programming language,

they are divided into two categories: (i) inline

comments, which only permit the insertion of one

line of comment, and (ii) block comments, which

permit the insertion of several lines. Developers write

comments in a sublanguage of English using a limited

set of verbs and tenses, and personal pronouns are

almost not used (Davis and Bowen, 2001).

Code comments and the source code itself are an

important documentation to help the software

comprehension (Souza et al., 2006). These

descriptions may reveal important information, such

as the reason for adding new lines to source code,

knowing about the progress of a collective task, or

even why relevant changes were performed. Thus,

comments may be used to describe issues that may

require work in the future, notice emerging problems

and what decisions need to be taken about them

(Maalej and Happel, 2010) (Shokripour et al., 2013).

3.2 Code Comment Analysis

When the source code is well commented, we can

understand what a piece of code does, what issues it

has, and whether it needs to be fixed or improved,

without needing to analyze its implementation. In

general, comments are used by developers to

understand unfamiliar software because comments

are written in natural language (Freitas et al., 2012).

Comments provide an important set of information

which may help to understand software features, and

make easier software comprehension.

In fact, comments have been used to describe

issues that may require future work, emerging

problems and decisions taken about those problems

(Maalej and Happel, 2010). These descriptions

facilitate human readability and provide additional

information that summarizes the developer context.

In this sense, authors have conducted

experiments using code comments as data source in

several research works in order to discuss and

analyze their importance on the software

comprehension.

In (Storey et al.,, 2008) the authors described an

empirical study that explored how code comments

play a role in how developers deal with software

maintenance tasks, investigating how comments

may improve processes and tools that are used for

managing these tasks. In similar approach, (Maalej

and Happel, 2010) analyzed the purpose of work

descriptions and code comments aiming to discuss

how automated tools can support developers in

creating them.

Some research works from Mining Software

Repository (MSR) have focused on code comments.

Yang and Tan (Yang and Tan, 2012) proposed an

approach that analyses the word context in code

comments. The main idea is to discover semantically

related words. Many words that are semantically

related in software development process are not

semantically related in English. In this same sense,

Howard et al. (Howard et al., 2013) also presented

an approach to augment natural language thesauri

with code-related terms.

(Freitas et al., 2012) presents an approach to

locate problem domain concepts on comments, and

identify the relevant code chunks associated with

them. The authors also introduce Darius, a tool to

implement their proposal for Java programs. Darius

identifies and extracts inline, bock, and Javadoc

comments and provides some metrics. They

concluded that higher level source entities tend to

have comments oriented for problem domain

information, whereas comments of lower level

source entities tend to include more program domain

information.

In other work, Gupta et al. (Gupta et al., 2013)

suggested a Part-of-speech tagging of program

DCEIS 2016 - Doctoral Consortium on Enterprise Information Systems

identifiers to understand how a program element is

named. Considering identifiers and comments,

(Salviulo et al., 2014) performed an experiment with

students and young professional developers in order

to understand how they perceive comments and

identifier names.

3.3 Using Code Comments to Identify

More recently, code comments have been explored

with the purpose of identifying TD. Potdar and

Shihab (Potdar and Shihab, 2014) analyzed code

comments to identify text patterns and TD items. For

that, the authors used the srcML toolkit (Maletic et

al., 2002), a command line tool that parses source

code into XML file, to extract the comments. In this

step, the authors considered all types of comments.

This decision may bring a lot of unnecessary effort

because it considers comments that are not important

to TD scope, such as license and Auto-generated

comments. After the data extraction, the authors

identified comments that indicate TD.

They read more than 101K code comments, and

organized 62 text patterns that were used to quantify

how much TD exist in four different projects

(Eclipse, Chromium OS, ArgoUML, and Apache

http). Their findings show that 2.4 - 31.0% of the

files in a project contain self-admitted TD. In

addition, the most used text patterns were: (i) “is

there a problem” with 36 instances in the Eclipse,

(ii) “hack” with 17 instances in the ArgoUML, and

(iii) “fixme” with 20 instances in the Apache, and

761 instances in the Chrominum OS.

In another TD identification approach,

Maldonado and Shihab (Maldonado and Shihab,

2015) evolved the work of Potdar and Shihab

(Potdar and Shihab, 2014), proposing four simple

filtering heuristics to eliminate comments that are

not likely to contain technical debt. For that, they

read 33K code comments from five open source

projects (Apache Ant, Apache Jmeter, ArgoUML,

Columba, and JFreeChart). Their findings showed

that self-admitted technical debt can be classified

into five main types: design debt, defect debt,

documentation debt, requirement debt, and test debt.

According to the authors, the most common type of

self-admitted TD is design debt (between 42% and

84% of the classified comments).

In the same sense, Farias et al. (Farias et al.,

2015) proposed a model aiming to support the

detection of different types of debt through code

comment analysis.

These research works provide preliminary

indication that comments can be effectively used to

support TD identification. However, the factors that

may affect its accurate usage are still unknown.

4 METHODOLOGY

Research works in software engineering have widely

been conducted with focus on quantitative analysis.

In general, this type of study analyzes treatment of

variables, control groups, and statistical data

(Wohlin et al., 2012).

Differently from quantitative analysis, qualitative

one appears to be unusual in software engineering

approaches. Using this method is possible to achieve

aspects behind the problem under study, analyze

data, and suggest conclusions to which other

methods would be blind (Segal et al., 2005).

In order to include the combining of quantitative

and qualitative approaches as complementary

methods, we will use triangulation methodology to

analyze how code comment analysis supports the

TD identification.

Triangulation is a research strategy described as

a convergent methodology with multiple

operationalisms (Campbell and Fiske, 1959). The

main idea is to analyze evidences from different

sources, be collected using different methods, have

different forms, be analyzed using different

methods, or come from a different study altogether

(Shull et al., 2008). In this methodology, researchers

can improve conclusions on their judgments through

collecting and analysis of different data considering

the same phenomenon (Jick 1979).

4.1 Overview

Figure 1 presents an overview of the research. We

explore code comments using a systematic mapping

study, and a family of experiment in order to

propose methods and techniques to support the

identification and management of different TD

types. With the results of the studies, we expected

advance the set of knowledge on how improve the

TD identification process through code comment

analysis. We briefly describe each step following the

numbers of Figure 1.

To begin with, we performed a systematic

mapping study (1) in order to understand the mining

software repositories area and to identify its current

targets and gaps, regarding mainly source codes and

comments analysis (Farias et al., 2016). We

identified some important studies on usage of

comments for software comprehension, and some

Identifying Technical Debt through Code Comment Analysis

Figure 1: Overview of Research.

techniques and tools used to extract and analyze

comments.

Second, we developed a Contextualized

Vocabulary Model for identifying TD on code

comments (CVM-TD) (2). CVM-TD uses word

classes and code tags to support TD identification.

The model provides a structure that systematically

allows combining terms creating a large vocabulary

on TD (Farias et al., 2015).

Next, our work proposes a family of experiment

called FindTD composed by five experiments, two

exploratory studies, and three controlled

experiments. A family of experiments involves not

only replications, but variations among the

experiments (Basili et al., 1999). In this respect, we

intend to perform experimental variations in order to

evaluate and evolve our set of knowledge on the

proposed model and techniques to identify TD using

code comments.

In the first experiment (FindTD I), an exploratory

study (3) was performed to characterize the feasibility

of the proposed model to support the detection of TD

through code comments analysis. For that, we

developed a tool to extract comments from the software

code, the eXcomment. This tool extracts and filters

candidate comments from source code using the

contextualized vocabulary provided by the model.

Following, the promising initial outcomes

motivated us to further evaluate CVM-TD with other

data sources. Thus, we performed a controlled

experiment (FindTD II) (4). Therefore, in this

evaluation we extend Farias et al. (Farias et al.,

2015) with an additional quantitative study. We

analyzed the use of CVM-TD with the purpose of

characterizing its overall accuracy when classifying

candidate comments and factors that influence the

analysis of the comments to support the

identification of TD in terms of accuracy. For each

candidate comment listed in a form, the participants

chose "Yes" or "No", and their level of confidence

on their answer. They used an ordinal scale of one to

four to represent the confidence. Besides, for each

comment marked as yes, they should highlight the

pieces of text that was decisive for giving this

answer (set of comment patterns).

Our findings indicate that CVM-TD provided

promising results considering the accuracy values. We

observed that many comments had high agreement, and

almost 60% of comments filtered by terms that belong

to the vocabulary (candidate comments) proposed in

(Farias et al., 2015) were identified as good indicators

of TD (Farias et al., 2016).

Next, we designed FindTD III (6) from insights

of FinTD II, by changing the setup and controlling

other variables. Our main goal in this experiment is

to analyze the set of comment patterns identified and

classified in previous experiment. We intend to carry

out a qualitative analysis in order to improve the

model and the vocabulary, identifying the most

important patterns, and the relationship between

comment patterns and TD types. To do this, we will

perform a coding to group patterns into different

comment indicators.

Coding is a method that enables researchers to

organize and group similar data into categories or

themes, attaching labels or codes to different

segments - the beginning of themes. Information

from different sources can be easily sorted and

compared. A theme is an outcome of coding,

categorization, and analytic reflection, not

DCEIS 2016 - Doctoral Consortium on Enterprise Information Systems

something that is, in itself coded (Cruzes and Dyba,

2011) (Ellsberg and Heise, 2005).

This experiment will provide us inputs to

improve CVM-TD, resulting a new release of the

contextualized vocabulary. We also intend to

develop new features in eXcomment. This feature is

associated with the new vocabulary to quickly

support the interpretation of comments (7).

After this study, we planned to perform a controlled

experiment (8). In this experiment, besides evaluating

the new release of the contextualized vocabulary and

tool, we expect to compare the overall accuracy when

classifying candidate comments between two groups,

one using the tool to analyze comments and another

one without the tool.

In FindTD V (9), we expected to perform an

exploratory study in the software industry. In this

study, we purpose to compare patterns and TD items

identified into open source code and closed code

developed in an industrial environment.

The last one is the FindTD VI (10). We intend to

compare our approaches to different tools that use

metrics extracted from the source code to identify

TD items.

Our methodology might set some limitations on

what can be experimented. The first considers the

power of the proposed vocabulary. It is possible that

the set of terms and combinations used by our model

and vocabulary are simply too many to be studied.

An alternative would be to limit the studies to a very

specific context and software. Other risk involves

the effort to carry out all studies because of the

difficulty of performing experiments in this area.

5 EXPECTED OUTCOME

In the context of our empirical investigation, we are

interested in findings that help us to comprehend

how code comments analysis can support the

identification and management of different TD

types, considering the developers’ point of view.

We hope to develop a rich contextualized

vocabulary and a tool to support the TD

identification through comment analysis. We believe

this approach can improve methods of identifying

and classifying TD items, analyzing code comments.

6 STAGE OF THE RESEARCH

In accordance with proposed methodology described

in Section 4 and shown in Figure 1, we purpose a

systematic mapping study and a family of

experiment in order to discuss our goals. Figure 1 is

broken down into two parts. The tasks that have

already performed were organized on top of the

figure (part A), and the tasks that will be performed

in the future were organized on bottom of the figure

(part B).

In this sense, we have: (i) performed the

systematic mapping study, (ii) developed a

Contextualized Vocabulary Model, (iii) performed

the first exploratory study and its analysis, (iv)

performed the first controlled experiment and its

analysis, and (v) designed the FindTD III and carried

out the experiment. Currently, we are analyzing data

from this study, using qualitative methods.

REFERENCES

Alves, N.S.R. et al., 2016. Identification and Management

of Technical Debt: A Systematic Mapping Study.

Information and Software Technology, 70, pp.100–

121.

Alves, N.S.R. et al., 2014. Towards an Ontology of Terms

on Technical Debt. In Sixth International Workshop

on Managing Technical Debt (MTD). pp. 1–7.

Basili, V.R., Shull, F. and Lanubile, F., 1999. Building

knowledge through families of experiments. IEEE

Transactions on Software Engineering, 25(4), pp.456–

473.

Campbell, D.T. and Fiske, D.W., 1959. Convergent and

discriminant validation by the multitrait-multimethod

matrix. Psychological Bulletin, 56(2), pp.81–105.

Cruzes, D.S. and Dyba, T., 2011. Recommended Steps for

Thematic Synthesis in Software Engineering. 2011

International Symposium on Empirical Software

Engineering and Measurement, (7491), pp.275–284.

Davis, C.G. and Bowen, L.L., 2001. The language of

comments in computer software : A sublanguage of

English. , 166(00), pp.1731–1756.

Ellsberg, M. and Heise, L., 2005. Researching Violence

Against Women. A PRACTICAL GUIDE FOR

RESEARCHERS AND ACTIVISTS, Washington:

World Health.

Farias, M. et al., 2015. A Contextualized Vocabulary

Model for Identifying Technical Debt on Code

Comments. In Seventh International Workshop on

Managing Technical Debt. pp. 25–32.

Farias, M.A. de F., Novais, R., et al., 2016. A Systematic

Mapping Study on Mining Software Repositories. In

ACM SAC.

Farias, M.A. de F., Silva, A.B., et al., 2016. Investigating

the Use of a Contextualized Vocabulary in the

Identification of Technical Debt : A Controlled

Experiment. In 18Th International Conference on

Enterprise Information System - ICEIS (Accepted).

Freitas, J.L., Da Cruz, D. and Henriques, P.R., 2012. A

comment analysis approach for program

Identifying Technical Debt through Code Comment Analysis

comprehension. Proceedings of the 2012 IEEE 35th

Software Engineering Workshop, SEW 2012, pp.11–

20.

Guo, Y., Spínola, R.O. and Seaman, C., 2014. Exploring

the costs of technical debt management – a case study.

Empirical Software Engineering, 1, pp.1–24.

Gupta, S. et al., 2013. Part-of-speech tagging of program

identifiers for improved text-based software

engineering tools. IEEE International Conference on

Program Comprehension, pp.3–12.

Howard, M.J. et al., 2013. Automatically mining software-

based, semantically-similar words from comment-code

mappings. IEEE International Working Conference on

Mining Software Repositories, pp.377–386.

Izurieta, C. et al., 2012. Organizing the technical debt

landscape. 2012 3rd International Workshop on

Managing Technical Debt, MTD 2012 - Proceedings,

pp.23–26.

Jick, T.D., 1979. Mixing Qualitative and Quantitative

Methods : Triangulation in Action Mixing Qualitative

and Quantitative Methods : Triangulation in Action *.

Qualitative Methodology, 24(4), pp.602–611.

Li, Z. et al., 2014. A systematic mapping study on

technical debt. The Journal of Systems and Software,

101, pp.193–220.

Maalej, W. and Happel, H.-J., 2010. Can development

work describe itself? 7th IEEE Working Conference

on Mining Software Repositories (MSR), pp.191–200.

Maldonado, E.S. and Shihab, E., 2015. Detecting and

Quantifying Different Types of Self-Admitted

Technical Debt. In 7th International Workshop on

Managing Technical Debt. pp. 9–15.

Maletic, J.I., Collard, M.L. and Marcus, A., 2002. Source

Code Files as Structured Documents. In Proceedings.

10th International Workshop on. pp. 289–292.

Mendes, T.S. et al., 2015. VisMinerTD - An Open Source

Tool to Support the Monitoring of the Technical Debt

Evolution using Software Visualization. In 17th

International Conference on Enterprise Information

Systems.

Nugroho, A., Visser, J. and Kuipers, T., 2011. An

Empirical Model of Technical Debt and Interest

Software Improvement Group. Proceeding of the 2nd

working on Managing technical debt MTD 11, p.1.

Potdar, A. and Shihab, E., 2014. An Exploratory Study on

Self-Admitted Technical Debt. In IEEE International

Conference on Software Maintenance and Evolution.

pp. 91–100.

Salviulo, F. et al., 2014. Dealing with Identifiers and

Comments in Source Code Comprehension and

Maintenance : Results from an Ethnographically-

informed Study with Students and Professionals. In

Proceedings of the 18th international conference on

evaluation and assessment in software engineering.

ACM. p. 48.

Segal, J., Grinyer, A. and Sharp, H., 2005. The type of

evidence produced by empirical software engineers.

ACM SIGSOFT Software Engineering Notes, 30(4),

pp.1–4.

Shokripour, R. et al., 2013. Why So Complicated ? Simple

Term Filtering and Weighting for Location-Based Bug

Report Assignment Recommendation. , pp.2–11.

Shull, F., Singer, J. and Sjoberg, D., 2008. Guide to

Advanced Empirical Software Engineering, Springer.

Souza, S.C.B. et al., 2006. Which documentation for

software maintenance? Journal of the Brazilian

Computer Society, 12(3), pp.31–44.

Storey, M. et al., 2008. TODO or To Bug : Exploring How

Task Annotations Play a Role in the Work Practices of

Software Developers. In ICSE: International

Conference on Software Engineering. pp. 251–260.

Wohlin, C. et al., 2012. Experimentation in Software

Engineering,

Yang, J. and Tan, L., 2012. Inferring semantically related

words from software context. In 2012 9th IEEE

Working Conference on Mining Software Repositories

(MSR). Ieee, pp. 161–170.

Zazworka, N. et al., 2013. A case study on effectively

identifying technical debt. In Proceedings of the 17th

International Conference on Evaluation and

Assessment in Software Engineering - EASE ’13. New

York, New York, USA: ACM Press, pp. 42–47.

Zazworka, N. and Ackermann, C., 2010. CodeVizard: A

Tool to Aid the Analysis of Software Evolution.

Proceedings of the 2010 ACM-IEEE International

Symposium on Empirical Software Engineering and

Measurement, 2(4), pp.63:1–63:1.

DCEIS 2016 - Doctoral Consortium on Enterprise Information Systems