BUILDING MAINTENANCE CHARTS AND EARLY WARNING
ABOUT SCHEDULING PROBLEMS IN SOFTWARE PROJECTS
Sergiu Gordea
University Klagenfurt, Computer Science and Manufacturing
Universitätsstrasse 65-67, A-9020 Klagenfurt, Austria
Markus Zanker
University Klagenfurt, Computer Science and Manufacturing
Universitätsstrasse 65-67, A-9020 Klagenfurt, Austria
Keywords:
Software Engineering, Software Maintenance, Maintenance Efforts Classification, Statistical Process Control.
Abstract:
Imprecise effort estimation is a well-known problem of software project management that frequently leads to the setting of unrealistic deadlines. Estimations are even less precise when the development of new product releases is mixed with the maintenance of older versions of the system. Software engineering measurement should assess the development process and discover problems occurring in it. However, there is evidence indicating a low success rate of measurement programs, mainly because they are not able to extract knowledge and present it in a form that is easily understandable for developers and managers, and because they are not able to suggest corrective actions based on the collected metric data. In this work we propose an approach for classifying time efforts into maintenance categories, and we propose the usage of maintenance charts for controlling the development process and warning about scheduling problems. Identifying scheduling problems as soon as possible allows managers to plan effective corrective actions and still meet the planned release deadlines even if unpredicted development problems occur.
1 INTRODUCTION
Effort estimation is known to be one of the most chal-
lenging problems of software project management.
Recent studies show that only about 25% of software projects are successfully completed on time and within budget (Liu et al., 2003). Effort estimations are even less precise when maintenance activities on older system versions run in parallel with the development of new product releases. When making release plans,
project managers must take into account the effort required to implement new functionality for the next release, the effort required to correct old system defects and new defects discovered in productive systems, and the available human resources. In a field as complex and intangible as software engineering, many events can break these plans; in reality there are no ideal cases where each part of a project is completed exactly as scheduled. (This work is carried out with financial support from the EU, the Austrian Federal Government and the State of Carinthia in the Interreg IIIA project Software Cluster South Tyrol - Carinthia.) Being slightly ahead of or behind schedule is not a problem as long as the process is
under statistical control and within predicted risk lim-
its. One of the most important problems of software project management is that, without appropriate warning mechanisms, managers discover schedule overruns, wrong estimations and software quality problems too late to correct or minimize their effects (Florac and Carleton, 1999). In order
to deliver projects on time, within budget and with a high level of quality, project managers need to be warned early about the risks associated with a project that runs out of control (Liu et al., 2003).
Software Engineering Measurement (SEM) is a key practice in high-maturity organizations. The fourth Capability Maturity Model Integration (CMMI) level, also known as the quantitatively managed level, defines key practices for quality management and for process measurement and analysis (see http://www.sei.cmu.edu/cmmi/ for reference). Companies at this maturity level start to use quantitative measurement and statistical process control to improve the quality and increase the efficacy of their processes. Unfortunately, under relatively restricted budget conditions, small and medium software companies are not able to effectively introduce these practices into their development process. Software pro-
cess management is not a business goal in these companies, and they are not ready to pay the relatively high costs associated with software process measurement and analysis. Goethert and Hayes present a set of experiences from implementing measurement programs indicating that such programs have a success rate below 20% (Goethert and Hayes, 2001). Measurement programs usually fail because the collected metrics are found to be irrelevant, not well understood by key players, or expensive and cumbersome to collect; because no actions on the numbers are suggested; and because some of the collected metrics are perceived to be unfair, so developers protest against their usage (Brown and Goldenson, 2004; Goethert and Hayes, 2001).
In this paper we try to counteract the presented management and measurement problems by proposing an approach based on:
- the collection of time efforts and their classification into maintenance categories,
- the building of maintenance charts, and
- warning about development and scheduling problems.
The collection of time efforts and software metrics can be automated by employing tools like Prom (Sillitti et al., 2003) or HackyStat (Johnson et al., 2003). In section 3.2 we present three models used for classifying the maintenance efforts; in this way we present the results in an easily understandable form to managers and developers. In this paper we support the hypothesis that the maintenance charts and the warning mechanism presented in section 3.3 are valuable solutions for software process assessment, helping managers to easily interpret the evolution of maintenance efforts over time and to find the sources of scheduling problems.
2 RELATED WORK
Software measurement has been a research topic for many years and continues to be an open research field due to the continuous evolution of software technologies, paradigms and project management techniques. In the following we reference a selection of related work in the areas of software metrics, software maintenance, and artificial intelligence techniques applied in software engineering.
Software process improvement (SPI) is a hot topic in the software industry and is based on the collection of product and process metrics. In order to be effective, SPI must employ tools that automatically collect metrics at low cost and with high quality (manually collected data are error prone and influenced by human judgement). Hackystat (Johnson et al., 2003) and Prom (Sillitti et al., 2003) are so-called third-generation SPI tools that facilitate the collection of product (software) metrics and process metrics (time efforts spent in designing and developing software projects).
Because of the ”legacy crisis” described by Seacord et al. (Seacord et al., 2003), the measurement and estimation of software maintenance efforts gained special attention starting with the 1980s. Studies referenced by Seacord et al. show that most of the life cycle costs of information systems occur after the first software release. Other studies published in the 1980s showed that, on average, corrective efforts take about 20% of the total maintenance efforts, adaptive efforts take about 25%, and the largest part, about 50%, goes to the perfective category (Lientz and Swanson, 1980). Usually, preventive efforts are not greater than 5% of the total maintenance efforts (Seacord et al., 2003). More recent studies from environments involving newer technologies confirm the same distribution of maintenance efforts (Vliet, 2000), even in the case of web applications (Lee and Jefferson, 2005).
Generally, the information used in these reports is extracted from change logs, issue tracking systems and/or version control systems; it is manually collected and usually incomplete and error prone (Graves and Mockus, 1998; Kemerer and Slaughter, 1999; Zanker and Gordea, 2006). To our knowledge, the work presented in this paper is the first attempt to classify efforts into maintenance categories based on automatically collected time information.
In recent years, different artificial intelligence techniques have been employed to extract knowledge from metric data and to learn models that assess different software engineering tasks, such as the prediction and estimation of software size and quality, and of development and maintenance efforts and costs (Zhang and Tsai, 2003). Similar to our approach, Liu et al. present a warning system for early detection of scheduling and budgeting problems, as well as low quality risks, based on software metrics and rules extracted with a fuzzy inference engine (Liu et al., 2003). Different from Liu's work, we focus our attention on the evolution of development and maintenance efforts over time, and on reasoning over maintenance charts. An analysis of software maintenance data using Bayesian networks, decision trees and expert networks is presented in (Reformat and Wu, 2003).
Some software metrics are strongly correlated with each other; therefore, using all available metrics to infer a decision model does not necessarily improve the resulting model. On the contrary, there are cases where complex models based on large sets of variables provide lower prediction accuracy than alternative models based on smaller sets of variables (Thwin and Quah, 2005; Khoshgoftaar et al., 2003).
3 MONITORING AND CONTROLLING MAINTENANCE EFFORTS
In the development of almost all information systems there is high pressure to release the first working version of the system as soon as possible, making a compromise between time to market and the quality of the software product. Afterwards, the systems enter a maintenance process with enhancement/modernization cycles and periodic new version releases (Seacord et al., 2003). In many cases, when working on a new release, the activities related to the implementation of new functionality are mixed with those related to the correction of defects found in previous releases. While the first category of efforts is typically paid by the customer, the second type of costs is covered by maintenance fees. Under these assumptions it is very important to measure and control the distribution of the development efforts over the different maintenance activities.
3.1 Maintenance Categories
Taking into consideration the reasons for software changes, maintenance efforts were classified by Lientz and Swanson into four categories (Lientz and Swanson, 1980): perfective, corrective, preventive and adaptive.
Perfective maintenance (PeM) typically consists of activities related to the implementation of new system functionality, which usually take more time to complete than other development activities. When enhancing system functionality, new classes are added to the system, as well as new methods (in existing and/or new classes). Under these conditions the values of all metrics representing structural or complexity changes increase, and the share of time effort spent on these activities is relatively high.
Corrective maintenance (CM) deals mainly with the elimination of system defects (also called bugs in software development communities). It affects existing artifacts by changing the parts of the source code that cause system misbehavior, which typically means the correction or even re-implementation of existing algorithms. Usually, this kind of maintenance modifies the complexity and the size of existing artifacts without changing their structure too much, but in some cases the structure is also significantly affected. For example, there are cases when old pieces of code are deleted because they are not used anymore, or cases when the algorithms do not consider all possible combinations of the input variables. In the latter case it is necessary to treat new special cases by implementing new classes or methods. Depending on the severity and the nature of the corrected defects, the tasks associated with these maintenance activities may be completed in larger or smaller time intervals.
Preventive maintenance (PrM) gained special attention in the last 20 years, when the demand for high quality was constantly increasing. Preventive maintenance activities have the goal of improving the quality of the source code and correcting those parts of the code that are suspected to introduce future system defects. This category includes the so-called code reviews, as well as agile practices like the implementation of test cases and refactorings.
Preventive maintenance - implementation of test cases (PrM_T). Since agile practitioners emphasized test-driven development, many companies have started to adopt unit testing as an important component of their development process. It aims at verifying the software's correct functionality and at early identification of system defects. For implementing unit tests, Java developers extend the functionality of the JUnit library (see http://www.junit.org for reference) and implement project-specific test cases. Test cases can be identified in the source code based on naming conventions (test classes include the ”Test” prefix or suffix in their names), on the class inheritance tree (test cases are subclasses of JUnit's TestCase class), and on the physical location of the source files (test cases are kept apart from the project's source code, usually in folders exclusively dedicated to unit tests). Similar to perfective maintenance, this type of maintenance activity creates new artifacts and changes the structure of the source code. Since these artifacts are quite simple and small, the efforts invested in their creation are relatively low.
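As an illustration of the three detection heuristics just described, the following minimal sketch combines the naming-convention, inheritance and source-location checks; the class and method names are our own and only stand in for the detection logic, not for the actual implementation used by the measurement tools.

```java
// Hypothetical sketch: heuristic detection of JUnit test classes based on the
// naming convention, the inheritance tree and the physical source location.
import java.nio.file.Path;

public class TestClassDetector {

    /** Returns true if the given class looks like a unit test artifact. */
    public static boolean isTestClass(Class<?> clazz, Path sourceFile) {
        String name = clazz.getSimpleName();

        // 1. Naming convention: "Test" used as prefix or suffix of the class name.
        boolean byName = name.startsWith("Test") || name.endsWith("Test");

        // 2. Inheritance: any superclass named junit.framework.TestCase (JUnit 3.x).
        boolean byInheritance = false;
        for (Class<?> c = clazz.getSuperclass(); c != null; c = c.getSuperclass()) {
            if (c.getName().equals("junit.framework.TestCase")) {
                byInheritance = true;
                break;
            }
        }

        // 3. Physical location: sources kept in a folder dedicated to unit tests.
        boolean byLocation = sourceFile != null
                && sourceFile.toString().replace('\\', '/').contains("/test/");

        return byName || byInheritance || byLocation;
    }

    public static void main(String[] args) {
        System.out.println(isTestClass(String.class, null)); // false: not a test class
    }
}
```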
Preventive maintenance - refactoring (PrM_R). Refactorings are changes to the internal structure of source code that improve its modularity, readability and understandability without changing its observable behaviour (Fowler, 1999). Source code refactorings have the goal of reducing the amount of duplicated code and improving its reusability. Refactorings may occur at different levels of the project structure: method, class, package, architecture. The effect of these activities is an important change in the structure of the code and a reduction of its size and complexity. When the refactored methods are not reused (e.g. the refactoring is done to support future reuse, or just to simplify the algorithms), the overall size of the artifacts is preserved (no lines of code are added or deleted, they are just restructured). Because of the automatic support provided by development environments, simple refactorings require less effort in comparison with other development activities.
Adaptive maintenance. The efforts required to modify software systems so that they can work
in new environments (e.g. a new operating system, new hardware, new databases, etc.) are considered adaptive maintenance. We focus our research and experiments on systems developed in Java, which is a platform-independent programming language. In this context this maintenance category is expected to encompass insignificant amounts of effort, and it is outside the scope of this paper.
Source code comprehension (SCC). Source code comprehension is a software engineering and maintenance activity necessary to facilitate reuse, inspection, maintenance, reverse engineering, reengineering, migration, and extension of existing software systems (see www.program-comprehension.org). Typical of this activity is the fact that programmers spend time just viewing source code, without making any change to it. The efforts invested in these activities need to be redistributed over the other maintenance categories. A simple solution to this problem is to distribute these efforts according to the proportion of each maintenance category.
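The following sketch illustrates this proportional redistribution; the category keys and the example numbers are our own and only demonstrate the arithmetic described above.

```java
// Hypothetical sketch: redistributing source code comprehension (SCC) effort
// proportionally over the other maintenance categories.
import java.util.HashMap;
import java.util.Map;

public class SccRedistribution {

    public static Map<String, Double> redistribute(Map<String, Double> efforts) {
        double scc = efforts.getOrDefault("SCC", 0.0);
        double rest = efforts.entrySet().stream()
                .filter(e -> !e.getKey().equals("SCC"))
                .mapToDouble(Map.Entry::getValue).sum();

        Map<String, Double> result = new HashMap<>();
        for (Map.Entry<String, Double> e : efforts.entrySet()) {
            if (e.getKey().equals("SCC")) continue;
            // Each category receives a share of SCC proportional to its own effort.
            double share = rest > 0 ? scc * (e.getValue() / rest) : 0.0;
            result.put(e.getKey(), e.getValue() + share);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Double> efforts = new HashMap<>();
        efforts.put("PeM", 30.0);   // perfective effort, in hours
        efforts.put("CM", 10.0);    // corrective
        efforts.put("PrM", 10.0);   // preventive
        efforts.put("SCC", 5.0);    // comprehension effort to redistribute
        System.out.println(redistribute(efforts)); // {PeM=33.0, CM=11.0, PrM=11.0}
    }
}
```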
3.2 Classification Methodology
When searching for a robust classifier for maintenance efforts, we evaluated the performance of several models based on domain knowledge, induced decision rules and probabilistic models. The classification itself is done by analyzing the evolution over time of a set of software metrics: the Chidamber-Kemerer metrics, Halstead's metrics and McCabe's cyclomatic complexity. Two additional boolean variables, as well as the collected efforts themselves, complete the list of the classifiers' inputs. The boolean variables represent the results of tests indicating whether a given code fragment is part of a test class (TC), or whether it was created as a result of source code restructuring (OA). The OA test is based on the origin analysis concept proposed by Godfrey et al. (Godfrey and Zou, 2005) and on other approaches that aim at identifying structural changes in source code based on software metrics (Kontogiannis, 1997; Germain and Robillard, 2005). A detailed description of the other above-mentioned metrics can be found in (Fenton and Pfleeger, 1997).
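To make the classifier input concrete, the sketch below shows one way a single observation could be represented; the class name, field names and the effort threshold are our own illustration of the metric groups and flags listed above, not part of the tools mentioned in this paper.

```java
// Hypothetical sketch: one classifier input observation. STR and SIZE aggregate the
// structural and size/complexity metric changes; EFF, TC and OA follow the text above.
public class MaintenanceObservation {

    public final boolean str;    // structural metrics (methods, classes, DIT) increased
    public final boolean size;   // size/complexity metrics (LOC, cyclomatic complexity, ...) increased
    public final boolean eff;    // time effort above a chosen "high effort" threshold
    public final boolean tc;     // code fragment belongs to a test class
    public final boolean oa;     // origin analysis: fragment extracted from another method

    public MaintenanceObservation(double structuralDelta, double sizeDelta,
                                  double effortMinutes, boolean testClass, boolean originAnalysis) {
        // Assumed encoding: true (1) if the corresponding metrics increased, false (0) otherwise.
        this.str = structuralDelta > 0;
        this.size = sizeDelta > 0;
        this.eff = effortMinutes > 30.0;   // illustrative threshold for "high" effort
        this.tc = testClass;
        this.oa = originAnalysis;
    }
}
```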
3.2.1 Knowledge-based Approach
The knowledge-based approach captures the domain heuristics in a classification table (see Table 1) and transforms them into a decision rule representation (see Table 2). The given heuristics represent rules of thumb such as: if a high amount of effort is spent on heavily changing the structure of a code fragment without increasing its size and without reusing code from other parts of the product, then the effort should
be classified as corrective maintenance (compare the
row marked with an asterisk in Table 1). In fact
the heuristics formalize the discussion of the differ-
ent maintenance categories in the previous sections.
The metrics in the classification model are grouped into two classes. The STR group indicates structural changes and uses the following metrics: number of methods, number of classes and depth of inheritance tree. Furthermore, the size and complexity metrics are grouped in the variable SIZE, including: lines of code, response for a class, fan-out, cyclomatic complexity, as well as Halstead's volume. EFF stands for the time effort associated with a given activity, while OA and TC signify the decision variables on origin analysis and test classes.
Table 1: Classification Table.

STR  SIZE  EFF       TC  Category
0    0     1         0   SCC
0    0     0         1   PrM_T
0    1     0         0   CM
0    1     0         1   PrM_T
0    1     1         0   CM
0    1     1         1   PrM_T
1    0     0         0   PrM_R
1    0     0         1   PrM_T
1    0     1, OA=0   0   CM*
1    0     1, OA=0   1   PrM_T
1    0     1, OA=1   0   PrM_R
1    0     1, OA=1   1   PrM_T
1    1     1         0   PeM
1    1     1         1   PrM_T
For TC, the value 1 signifies that the maintenance activity currently analyzed is related to the implementation or modification of test methods. OA equal to 1 indicates that, within the course of the given activity, code fragments were extracted from the body of other methods. Small values of time effort (EFF) are marked with 0, while higher ones are marked with 1. Changes in the source code that increase its size (SIZE) or its structural complexity (STR) are indicated with the value 1 in the corresponding columns, while the value 0 means no change or a decrease of the related metric values. We inferred the following classification rules by using the Matlab statistical toolbox (see http://www.mathworks.com for reference) (Zanker and Gordea, 2006):
Table 2: Classification rules.

PrM_T = TC
PeM   = ¬TC ∧ STR ∧ SIZE
SCC   = ¬TC ∧ ¬STR ∧ ¬SIZE
PrM_R = ¬TC ∧ STR ∧ ¬SIZE ∧ ¬EFF
CM    = (¬TC ∧ ¬STR ∧ SIZE) ∨ (¬TC ∧ STR ∧ ¬SIZE ∧ EFF ∧ ¬OA)

When analyzing the extracted categorization rules, we can observe that all efforts related to refactoring or correction of test classes are classified as PrM_T instead of CM or PrM_R. This definition is consistent with the developers' view, which considers only modifications of source code implementing system behavior as perfective or corrective maintenance. A more detailed discussion of the expert heuristics based on concrete source code examples is presented in (Zanker and Gordea, 2006).
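To make the rule base concrete, the sketch below encodes the rules of Table 2 directly, using the boolean encoding described above; the class and method names are our own, and this is only an illustration, not the inference code generated by the Matlab toolbox.

```java
// Hypothetical sketch: the expert classification rules of Table 2 encoded directly.
// Inputs follow the boolean encoding used in the paper: STR, SIZE, EFF, TC, OA.
public class ExpertClassifier {

    public enum Category { PEM, CM, PRM_T, PRM_R, SCC }

    public static Category classify(boolean str, boolean size, boolean eff,
                                    boolean tc, boolean oa) {
        if (tc) {
            return Category.PRM_T;             // PrM_T = TC
        }
        if (str && size) {
            return Category.PEM;               // PeM = ¬TC ∧ STR ∧ SIZE
        }
        if (!str && !size) {
            return Category.SCC;               // SCC = ¬TC ∧ ¬STR ∧ ¬SIZE
        }
        if (str && !size && !eff) {
            return Category.PRM_R;             // PrM_R = ¬TC ∧ STR ∧ ¬SIZE ∧ ¬EFF
        }
        if ((!str && size) || (str && !size && eff && !oa)) {
            return Category.CM;                // CM: the two disjuncts of Table 2
        }
        return Category.PRM_R;                 // remaining case (STR ∧ ¬SIZE ∧ EFF ∧ OA), see Table 1
    }
}
```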
3.2.2 Machine Learning Approaches
Machine learning algorithms are widely used for ex-
tracting knowledge out of empirically collected data
sets. The most popular algorithms are based on deci-
sion trees or decision rules as well as on probabilistic
models or neural networks.
Decision rules. Decision trees, decision tables and decision rules are related knowledge representation technologies. Decision trees are classification schemes that consist of a set of subsequent boolean tests ending in leaves that indicate the item's category. Every path in the tree starting at the root and ending at one of the leaves can be expressed in the form of an ”IF (condition) THEN category” rule, where the condition is a conjunction of the tests along the path; this is in fact the decision table representation of the decision tree. The description of a class can then be expressed as the disjunction of all rules in the decision table identifying the given category, a representation generally known as disjunctive normal form, or decision rule. Basically, there are two approaches for learning decision rules from a given data set. The top-down approach is also used for learning decision trees and consists of an algorithm that recursively splits the data set until all subsets contain elements belonging to only one category. The bottom-up rule induction approach is a two-step algorithm: in the initial phase a decision table is constructed by collecting all individual instances from the data set; the second step builds generalized rules in a more compact form by heuristically searching for the single best rule for each class that covers all of its cases. A good comparison of available algorithms for learning decision trees and decision rules can be found in (Apte and Weiss, 1997).
Bayesian networks. Bayesian networks, also known under the names of causal or belief networks, are directed acyclic graphs (DAGs) used to represent probability distributions in a graphical manner. Each node in a Bayesian network represents a probability variable and has an associated probability distribution table used to compute class probabilities for any given instance (see Figure 1). An edge between two nodes of the network represents the direct influence of the variable of the parent node on the variable of the successor node. If there is no edge between two nodes of the network, their variables are considered to be conditionally independent (P(A|B) = P(A)).
Figure 1: Sample Bayesian Network.
The computation of class probabilities is based on Bayes' theorem:

P(B|A) = P(A, B) / P(A) = P(A|B) P(B) / P(A)    (1)

where P(A) and P(B) are the probabilities of events A and B respectively, and P(B|A) and P(A|B) are the conditional probabilities of event B given A and of event A given B, respectively.
Given the fact that Bayesian networks are acyclic graphs, their nodes can be ordered such that, for each node, all of its ancestors get a smaller index. In this case, considering that each node is conditionally independent of its non-descendants given its parents, the chain rule of probability theory can be written as (Witten and Frank, 2000):

P(a_1, a_2, a_3, ..., a_n) = ∏_{i=1}^{n} P(a_i | a_{i-1}, ..., a_1)    (2)

where the a_i are the network nodes.
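As a toy illustration of how Bayes' theorem (Equation 1) assigns class probabilities from such a model, the sketch below uses a naive independence assumption between two of the boolean input flags; all prior and likelihood values are invented for illustration only and are not values learned from our data set.

```java
// Hypothetical sketch: class probabilities via Bayes' theorem under a naive
// independence assumption between the boolean features STR and SIZE.
import java.util.LinkedHashMap;
import java.util.Map;

public class NaiveBayesSketch {

    public static void main(String[] args) {
        // Prior probabilities P(category); made-up values.
        Map<String, Double> prior = new LinkedHashMap<>();
        prior.put("PeM", 0.50);
        prior.put("CM", 0.30);
        prior.put("PrM", 0.20);

        // Likelihoods {P(STR=1 | category), P(SIZE=1 | category)}; made-up values.
        Map<String, double[]> likelihood = new LinkedHashMap<>();
        likelihood.put("PeM", new double[]{0.9, 0.9});
        likelihood.put("CM",  new double[]{0.2, 0.7});
        likelihood.put("PrM", new double[]{0.6, 0.2});

        boolean str = true, size = false;   // observed evidence for one activity

        // Unnormalized posterior: P(category) * P(STR | category) * P(SIZE | category).
        Map<String, Double> posterior = new LinkedHashMap<>();
        double evidence = 0.0;
        for (String cat : prior.keySet()) {
            double[] l = likelihood.get(cat);
            double pStr = str ? l[0] : 1.0 - l[0];
            double pSize = size ? l[1] : 1.0 - l[1];
            double p = prior.get(cat) * pStr * pSize;
            posterior.put(cat, p);
            evidence += p;
        }
        // Normalize by the evidence so the posteriors sum to one (Bayes' theorem).
        for (Map.Entry<String, Double> e : posterior.entrySet()) {
            System.out.printf("P(%s | evidence) = %.3f%n", e.getKey(), e.getValue() / evidence);
        }
    }
}
```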
Learning and selecting the best Bayesian classifier from labeled data sets is a challenging problem. Many different approaches have been proposed, most of them exploiting particularities of the Bayesian network and optimizing the learned models for particular probability distributions. A general and robust algorithm based on the minimum description length (MDL) principle is presented by Lam and Bacchus in (Lam and Bacchus, 1994).
3.3 Maintenance Charts
Building maintenance charts. The development efforts are not the only indicator of problems occurring in software projects, but all of these problems will be reflected in the maintenance charts, generating instabilities or out-of-control situations. Statistical process control theory defines well-established algorithms for building different types of control charts (range, average, etc.). The control limits are computed based on previously collected data and using the concept of a three-sigma allowed variance. Such charts can be built only after a certain amount of empirical data has been collected, and they are static models that need to be changed in different phases of the development process. Therefore we propose a model for building maintenance charts based on initial effort estimations, which define the center line (CL) of each chart. The upper control limit (UCL) and the lower control limit (LCL) are computed using the risk interval taken into consideration in project planning.
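A minimal sketch of this chart construction is given below; the planned effort of 40 person-hours and the 10% risk interval are assumed example values, not figures prescribed by the approach.

```java
// Hypothetical sketch: deriving the center line and control limits of a maintenance
// chart from the planned effort and the risk interval used in project planning.
public class MaintenanceChartLimits {

    public final double centerLine;        // CL: planned (estimated) effort per period
    public final double upperControlLimit; // UCL = CL * (1 + risk)
    public final double lowerControlLimit; // LCL = CL * (1 - risk), never below zero

    public MaintenanceChartLimits(double plannedEffort, double riskInterval) {
        this.centerLine = plannedEffort;
        this.upperControlLimit = plannedEffort * (1.0 + riskInterval);
        this.lowerControlLimit = Math.max(0.0, plannedEffort * (1.0 - riskInterval));
    }

    public static void main(String[] args) {
        // Example: 40 person-hours of perfective work planned per week,
        // with a 10% risk interval assumed at planning time.
        MaintenanceChartLimits chart = new MaintenanceChartLimits(40.0, 0.10);
        System.out.printf("CL=%.1f  UCL=%.1f  LCL=%.1f%n",
                chart.centerLine, chart.upperControlLimit, chart.lowerControlLimit);
    }
}
```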
In Figure 2 we present an example of a maintenance chart that monitors and controls the evolution over time of perfective, corrective and preventive efforts. The maintenance efforts are not uniformly distributed over the whole development period of a new release. In the initial phase (Phase I) important amounts of effort are allocated to designing new modules and to correcting defects of the last release (corrective maintenance), activities that are usually accompanied by refactorings and unit testing (preventive maintenance). In this phase, feature implementation activities postponed from previous releases are carried out, too. In the second phase (Phase II) most of the effort is allocated to implementing new functionality in the system (perfective maintenance), while the last period before the release (Phase III) is reserved for testing and correcting the defects found. In the case of experienced development teams these tasks are associated with unit testing and refactorings. Given this distribution of the maintenance efforts over time, the maintenance charts are created as a combination of normal (simple average and range control charts) and moving average charts. An average distribution of the development efforts over the maintenance categories of about 65% perfective maintenance, 25% corrective maintenance and 10% preventive maintenance indicates a healthy development process.
Warning about development and scheduling prob-
lems. Four tests that are effective in detecting lo-
cal unusual patterns in control charts are presented in
(Florac and Carleton, 1999). These tests analyze the
distribution of successive points in the control charts
over the three sigma interval around the center line.
They are used to identify whether the process runs out of control (the variables exceed the control limits) or whether the system loses its stability or calibration (the variables do not vary randomly around the center line).

Figure 2: Maintenance charts.

We adopted two of these tests, which together with trend analysis are able to uncover process instabilities and warn about impending scheduling problems. The first test checks for the existence of four or more consecutive points on the same side of the center line. A positive result of this test shows process instability and warns that the process may soon run out of control (see situation 1 in Figure 2).
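The sketch below illustrates these two checks on a series of chart points; the window of four points follows the run test just described, while the class and method names, as well as the example values, are our own.

```java
// Hypothetical sketch: the two adopted control-chart tests - a run of four or more
// consecutive points on one side of the center line, and a point outside the limits.
public class ChartWarnings {

    /** True if any point falls above the UCL or below the LCL (out of control). */
    public static boolean outOfControl(double[] points, double ucl, double lcl) {
        for (double p : points) {
            if (p > ucl || p < lcl) return true;
        }
        return false;
    }

    /** True if four or more consecutive points lie on the same side of the center line. */
    public static boolean unstableRun(double[] points, double centerLine) {
        int run = 0;
        int lastSide = 0;                  // -1 below, +1 above the center line
        for (double p : points) {
            int side = Double.compare(p, centerLine);
            if (side != 0 && side == lastSide) {
                run++;
            } else {
                run = (side != 0) ? 1 : 0; // restart the run when the side changes
            }
            lastSide = side;
            if (run >= 4) return true;     // process instability warning
        }
        return false;
    }

    public static void main(String[] args) {
        double[] weeklyCorrectiveEffort = {8, 9, 9.5, 8.5, 12, 16};
        System.out.println(unstableRun(weeklyCorrectiveEffort, 10.0));      // true: four points below CL
        System.out.println(outOfControl(weeklyCorrectiveEffort, 15.0, 5.0)); // true: 16 exceeds the UCL
    }
}
```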
The second test identifies the cases when the process is out of control, as in situation 2 of Figure 2, where the first point that exceeds the control limits is found. At first sight, less effort invested in preventive actions is not an indicator of schedule overruns, since the planned functionality is still implemented in the system. However, in this situation the managers must be aware that the most recently implemented source code was not sufficiently tested and its quality was not verified. In other words, this source code may be buggy and software quality problems may occur in the near future.
In the third case (situation 3) the process runs completely out of control. The trend analysis shows a constant increase of corrective and a decrease of perfective maintenance efforts. Because of the deteriorating quality of the source code implemented in the recent period, it is harder to implement new functionality and more defects need to be corrected. In order to make the release at the planned date it is absolutely mandatory to correct the schedule. To bring the project back on track, the manager may decide to postpone the implementation of some system features to the next release and to reallocate these resources to improving the quality of the source code and correcting more system defects.

Figure 3: Classification accuracy.
4 EVALUATION OF CLASSIFICATION MODELS
The purpose of our evaluation was to compare the classification performance of the presented techniques. For the empirical evaluation we collected time efforts and software metrics from a student project over one calendar month. During the evaluation period the students were implementing their graduation project, which had a size of several tens of thousands of lines of code. At the end of each day the students were asked to manually classify the collected efforts into the corresponding maintenance categories. This information was collected into a database consisting of 2155 events. Each event registered the entity that was edited, the date, the time effort, as well as the maintenance classification manually assigned by the developers.
Because the developers annotated the experimental data set daily, we assume the manual classification to be correct. We then evaluate the accuracy of the expert's set of heuristics and of the learned classification models on this data set.
We compare the expert heuristics (expert) with the two machine learning techniques (Bayesian networks and induced decision rules). The classification accuracy was determined by cross-validation on 50% of the data set. Using the developers' classification as the relevance set, the classification performance of each algorithm was evaluated using the precision and recall metrics, which are the standard evaluation metrics used in information retrieval. Precision is defined as the ratio of the number of relevant records retrieved to the total number of relevant and irrelevant records retrieved. In our case, given a maintenance category X, precision measures the ratio of events identically categorized by the classification algorithm and the software developers as belonging to category X to the total number of records assigned by the classification algorithm to category X. Similarly, recall is the ratio of events identically categorized by the classification algorithm and the software developers as belonging to category X to the total number of records classified by the developers into category X. These two measures are inversely related, and the best classification algorithms are those that achieve high values for both metrics.
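For clarity, the sketch below computes these two measures for one maintenance category from two label sequences; the category codes and label values are illustrative only.

```java
// Hypothetical sketch: precision and recall for one maintenance category, comparing
// the classifier's labels against the developers' manual labels for the same events.
public class PrecisionRecall {

    public static double[] evaluate(String[] predicted, String[] manual, String category) {
        int truePositives = 0, predictedAsCategory = 0, actuallyCategory = 0;
        for (int i = 0; i < predicted.length; i++) {
            boolean p = predicted[i].equals(category);
            boolean m = manual[i].equals(category);
            if (p) predictedAsCategory++;          // records the classifier put into X
            if (m) actuallyCategory++;             // records the developers put into X
            if (p && m) truePositives++;           // identically categorized events
        }
        double precision = predictedAsCategory == 0 ? 0 : (double) truePositives / predictedAsCategory;
        double recall = actuallyCategory == 0 ? 0 : (double) truePositives / actuallyCategory;
        return new double[]{precision, recall};
    }

    public static void main(String[] args) {
        String[] predicted = {"CM", "PeM", "CM", "PrM_T", "CM"};
        String[] manual    = {"CM", "CM",  "CM", "PrM_T", "PeM"};
        double[] pr = evaluate(predicted, manual, "CM");
        System.out.printf("precision=%.2f recall=%.2f%n", pr[0], pr[1]); // 0.67 and 0.67
    }
}
```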
As can be seen in Figure 3, the expert model and the learned decision rules provide the best results. Because their classification rule for this category relies on a single variable (TC), the PrM_T efforts are identified with 100% accuracy by these two algorithms. In contrast, the probabilistic model identifies PrM_T with high precision but misses some of these events (recall < 1). Perfective maintenance can also be predicted with good precision (> 0.87) by all three models, but the expert heuristics are the only algorithm that also achieves a high recall for this category. All algorithms have problems correctly predicting the source code comprehension activities (about 60-70% precision), and the machine learning models also have problems classifying corrective maintenance. With a precision of around 85% and a recall of about 77%, the expert model classifies corrective maintenance efforts with reasonable accuracy. In conclusion, the expert model provides the highest prediction accuracy and is able to correctly classify about 83% of all events; the prediction accuracy remains stable over time. Using absolute effort numbers, about 86% of the total effort has been correctly classified.
5 CONCLUSIONS
Being able to deliver product releases at the planned deadlines is extremely important in the software industry, especially for companies that work under con-
tract. Monitoring the progress and keeping the development process under control ensures the success of a project. However, there are many sources of development and scheduling problems in software projects. In this paper we presented an approach for warning about development and scheduling problems based on maintenance charts. Three types of tests inspired by statistical process control theory are used to identify events indicating instabilities or processes that drift out of statistical control. An experiment evaluating the performance of different models used for classifying efforts into maintenance categories was presented. For this experiment we used an empirical data set collected from the development of a student project. The evaluation showed that a classifier based on expert heuristics outperformed the machine learning algorithms due to its higher stability against false leads and noise. Future work will focus on the implementation of the presented concepts for assessing the management of commercial projects, from which further experience can be acquired.
REFERENCES
Apte, C. and Weiss, S. (1997). Data mining with decision
trees and decision rules. Future Gener. Comput. Syst.,
13(2-3):197–210.
Brown, M. and Goldenson, D. (2004). Measurement anal-
ysis: What can and does go wrong? In METRICS’04
Proceedings, pages 131–138.
Florac, W. A. and Carleton, A. D. (1999). Measuring the
software process: statistical process control for soft-
ware process improvement. Addison-Wesley Long-
man Publishing Co., Inc., Boston, MA, USA.
Fowler, M. (1999). Refactoring - Improving the Design of
Existing Code. Addison-Wesley Publishing Company.
Germain, E. and Robillard, P. N. (2005). Activity patterns of pair programming. The Journal of Systems and Software, pages 17–27.
Godfrey, M. and Zou, L. (2005). Using origin analysis to
detect merging and splitting of source code entities.
IEEE Transactions on Software Engineering, 31(2).
Goethert, W. and Hayes, W. (2001). Experiences in im-
plementing measurement programs. Technical Report
2001-TN-026, Carnegie Mellon University/Software
Engineering Institute (SEI).
Graves, T. L. and Mockus, A. (1998). Inferring change ef-
fort from configuration management data. In Metrics
98: Fifth International Symposium on Software Met-
rics, pages 267–273, Bethesda, Maryland.
Witten, I. H. and Frank, E. (2000). Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. Morgan Kaufmann, USA.
Johnson, P., Kou, H., Agustin, J., Chan, C., Moore, C., Miglani, J., Zhen, S., and Doane, W. (2003). Beyond the personal software process: Metrics collection and analysis for the differently disciplined. In ICSE ’03 proceedings.
Kemerer, C. F. and Slaughter, S. (1999). An empirical ap-
proach to studying software evolution. IEEE Transac-
tions on Software Engineering, 25(4):493–509.
Khoshgoftaar, T. M., Nguyen, L., Gao, K., and Rajeevalochanam, J. (2003). Application of an attribute selection method to CBR-based software quality classification. In ICTAI 2003 proceedings, pages 47–52.
Kontogiannis, K. (1997). Evaluation experiments on the de-
tection of programming patterns using software met-
rics. In WCRE ’97 proceedings.
Lam, W. and Bacchus, F. (1994). Learning Bayesian belief networks: An approach based on the MDL principle.
Lee, M.-G. and Jefferson, T. L. (2005). An empirical study
of software maintenance of a web-based java applica-
tion. In 21st IEEE International Conference on Soft-
ware Maintenance Proceedings (ICSM’05).
Lientz, B. P. and Swanson, E. B. (1980). Software Mainte-
nance Management. Addison-Wesley Longman Pub-
lishing Co., Inc., Boston, MA, USA.
Liu, X. F., Kane, G., and Bambroo, M. (2003). An intel-
ligent early warning system for software quality im-
provement and project management. In ICTAI 2003
proceedings, pages 32–38.
Fenton, N. E. and Pfleeger, S. L. (1997). Software Metrics: A Rigorous and Practical Approach (second edition). PWS Publishing Company.
Reformat, M. and Wu, V. (2003). Analysis of software
maintenance data using multi-technique approach. In
ICTAI 2003 proceedings, pages 53–60.
Seacord, R. C., Plakosh, D., and Lewis, G. A. (2003). Modernizing Legacy Systems: Software Technologies, Engineering Processes, and Business Practices. Addison-Wesley Professional.
Sillitti, A., Janes, A., Succi, G., and Vernazza, T. (2003).
Collecting, integrating and analyzing software metrics
and personal software process data. In EUROMICRO
2003, pages 336–342.
Thwin, M. M. T. and Quah, T.-S. (2005). Application of
neural networks for software quality prediction using
object-oriented metrics. Journal of Systems and Soft-
ware, 76(2):147–156.
Vliet, H. V. (2000). Software engineering: principles and
practice. John Wiley.
Zanker, M. and Gordea, S. (2006). Measuring, monitoring and controlling software maintenance efforts. In TIME 2006: International Symposium on Temporal Representation and Reasoning, pages 103–110.
Zhang, D. and Tsai, J. J. P. (2003). Machine learning
and software engineering. Software Quality Control,
11(2):87–119.