SEMI-AUTOMATIC ASSIGNMENT OF WORK ITEMS
Jonas Helming, Holger Arndt, Zardosht Hodaie, Maximilian Koegel and Nitesh Narayan
Institut für Informatik, Technische Universität München, Garching, Germany
Keywords: Machine Learning, Task Assignment, Bug Report, UNICASE, Unified Model, UJMP.
Abstract: Many software development projects maintain repositories managing work items such as bug reports or
tasks. In open-source projects, these repositories are accessible to end-users or clients, allowing them to
enter new work items. These artifacts have to be further triaged. The most important step is the initial
assignment of a work item to a responsible developer. As a consequence, a number of approaches exist to
semi-automatically assign bug reports, e.g. using methods from machine learning. We compare different
approaches to assigning new work items to developers, mining textual content as well as structural information.
Furthermore, we propose a novel model-based approach which also considers relations from work items to
the system specification for the assignment. The approaches are applied to different types of work items,
including bug reports and tasks. To evaluate our approaches we mine the model repositories of three different
projects. We also include history data to determine how well the approaches perform in different project states.
1 INTRODUCTION
Many software development projects make use of
repositories managing different types of work items.
This includes bug tracking systems like Bugzilla,
task repositories like Jira, and integrated solutions
such as Jazz or the Team Foundation Server. A
commonality of all these repositories is the
possibility to assign a certain work item to a
responsible person or team (Anvik 2006).
It is a trend in current software development to open
these repositories to other groups besides the project
management, allowing them to enter new work
items. These groups could be end-users of the
system, clients or the developers themselves. This
possibility of feedback helps to identify relevant
features and improves quality by allowing more
bugs to be identified (Raymond 1999). But this
advantage comes with significant cost (Anvik et al.
2006), because every new work item has to be
triaged. That means it has to be decided whether the
work item is important or maybe a duplicate and
further, to whom it should be assigned. As a part of
the triage process it would be beneficial to support
the assignment of work items by automatically
selecting a developer with experience in the area
of the work item. This developer is probably a good
candidate to work on the work item, or, if the
developer will not complete the work item himself,
he probably has the experience to further triage the
work item and reassign it. There are several
approaches, which semi-automatically assign work
items (mostly bug reports) to developers. They are
based on mining existing work items of a repository.
We will present an overview of existing approaches
in section 2.1.
In this paper we compare different existing
techniques of machine learning and propose a new
model-based approach to semi-automatically assign
work items. All approaches are applied to a unified
model, implemented in a tool called UNICASE
(Bruegge et al. 2008). The unified model is a
repository for all different types of work items.
Existing approaches usually focus on one type of
work item, for example bug reports. The use of a
unified model enables us to apply and evaluate our
approach with different types of work items,
including bug reports, feature requests, issues and
tasks. We will describe UNICASE in more detail in
section 3.
UNICASE not only contains different types of
work items, but also artifacts from the system
specification, i.e. the system model (Helming et al.
2009). Work items can be linked to these artifacts
from the system specification as illustrated in Figure
1. For example a task or a bug report can be linked
to a related functional requirement. These links
provide additional information about the context of a
work item, which can be mined for semi-automatic
assignment, as we will show in section 4. Our new
approach for semi-automatic task assignment, called
model-based approach, processes this information.
The results of this approach can be transferred to
other systems such as bug trackers where bug
reports can be linked to affected components.
We found that existing approaches are usually
evaluated in a certain project state (state-based),
which means that a snapshot of the project is taken
at a certain time and all work items have a fixed
state. Then the assigned work items are classified by
the approach to be evaluated and the results are
compared with the actual assignee at that project
state. We use this type of evaluation in a first step.
However, state-based evaluation has two
shortcomings: (1) The approach usually gets more
information than it would have had at the time a
certain work item was triaged. For example,
additional information could have been attached to a
work item which was not available for the initial triage.
(2) No conclusions can be drawn about how different
approaches perform in different states of a project, for
example depending on the number of work items or
on personnel fluctuations. Therefore we also evaluated
our methods "history-based", which means that we
mine all states of the project history and make
automatic assignment proposals at the exact instant
when a new work item was created. We claim that
this type of evaluation is more realistic than using
only one later state where possibly more
information is available. We evaluate our approach
by mining data from three different projects, which
use UNICASE as a repository for their work items
and system model. To evaluate which approach
works best in our context as well as for a
comparison of the proposed model-based approach
we apply different machine learning techniques to
assign work items automatically. These include very
simple methods such as nearest neighbor, but also
more advanced methods such as support vector
machines or naive Bayes.
The paper is organized as follows: Section 2
summarizes related work in the field of automated
task assignment as well as in the field of
classification of software engineering artifacts.
Section 3 introduces the prerequisites, i.e. the
underlying model of work items and UNICASE, the
tool this model is implemented in. Section 4 and 5
describe the model-based and the different machine
learning approaches we applied in our evaluation.
Section 6 presents the results of our evaluation on
the three projects, in both a state-based and a
history-based mode. In section 7 we conclude and
discuss our results.
2 RELATED WORK
In this section we give an overview of relevant
existing approaches. In section 2.1 we describe
approaches, which semi-automatically assign
different types of work items. In section 2.2 we
describe approaches, which classify software
engineering artifacts using methods from machine
learning and which are therefore also relevant for
our approach.
2.1 Task Assignment
In our approach we refer to task assignment as the
problem of classifying work items to the right
developer. Determining developer expertise is the
basis for the first part of our approach. In our case
this is done by mining structured project history data
saved within the UNICASE repository.
Most of the approaches for determining expertise
rely on analyzing the code base of a software project
mostly with the help of version control systems.
(Mockus & Herbsleb 2002) treat every change from
a source code repository as an experience atom and
try to determine expertise of developers by counting
related changes made in particular parts of source
code. (Schuler & Zimmermann 2008) introduce the
concept of usage expertise, which describes
expertise in the sense of using source code, e.g. a
specific API. Based on an empirical study, (Fritz et
al. 2007) showed that these expertise measures
acquired from source code analysis effectively
represent the parts of the code base that the
programmer has knowledge of. (Sindhgatta 2008)
uses linguistic information found in source code
elements such as identifiers and comments to
determine the domain expertise of developers.
Other task classification approaches use information
retrieval techniques such as text categorization to
find similar tasks. (Canfora & Cerulo 2005)
demonstrate how information retrieval on
software repositories can be used to create an index
of developers for the assignment of change requests.
(Anvik 2006) investigates applying different
machine learning algorithms to an open bug
repository and compares the precision of the resulting
task assignments. (Anvik et al. 2006) apply SVM text
categorization on an open bug repository for
classifying new bug reports. They achieve high
precision on the Eclipse and Firefox development
projects and found their approach promising for
further research. (Čubranić 2004) employs text
categorization using a naive Bayes classifier to
automatically assign bug reports to developers,
correctly predicting 30% of the assignments on a
collection of 15,859 bug reports from a large open-
source project. (Yingbo et al. 2007)
apply a machine learning algorithm to workflow
event log of a workflow system to learn the different
activities of each actor and to suggest an appropriate
actor to assign new tasks to.
2.2 Artifact Classification
Machine learning provides a number of
classification methods, which can be used to
categorize different items and which can also be
applied to software artifacts. Each item is
characterized by a number of attributes, such as
name, description or due date, which have to be
converted into numerical values to be useable for
machine learning algorithms. These algorithms
require a set of labeled training data, i.e. items for
which the desired class is known (in our case the
developer to whom an item has been assigned). The
labeled examples are used to train a classifier, which
is why this method is called “supervised learning”.
After the training phase, new items can be classified
automatically, which can serve as a recommendation
for task assignment. A similar method has been
employed by (Čubranić 2004) who used a naive
Bayes classifier to assign bug reports to developers.
In contrast to their work, our approach is not limited
to bug reports, but can rather handle different types
of work items. Moreover, we evaluate and compare
different classifiers. (Bruegge et al. 2009) have also
taken a unified approach, using a modular
recurrent neural network to classify the status and
activity of work items.
3 PREREQUISITES
We implemented and evaluated our approach for
semi-automated task assignment in a unified model
provided by the tool UNICASE. In this section we
will describe the artifact types we consider for our
approach. Furthermore we describe the features of
these artifacts, which will form the input for the
different approaches. UNICASE provides a
repository, which can handle arbitrary types of
software engineering artifacts. These artifacts can
either be part of the system model, i.e. the
requirements model and the system specification, or
the project model, i.e. artifacts from project
management such as work items or developers
(Helming et al. 2009).
Figure 1: Excerpt from the unified model of UNICASE
(UML class diagram).
Figure 1 shows the relevant artifacts for our
approach. The most important part is the association
between work item and developer. This association
expresses that a work item is assigned to a certain
developer and is therefore the association we want
to set semi-automatically. Work items in UNICASE
can be issues, tasks or bug reports. As we apply our
approach to the generalization work item, it is not
limited to one of these subtypes, as existing
approaches are. As we proposed in previous work (J.
Helming et al. 2009), work items in UNICASE can
be linked to the related Functional Requirements
modeled by the association isObjectOf. This
expresses that the represented work of the work item
is necessary to fulfill the requirement. This
association, if already present, adds additional
context information to a work item. Functional
requirements are structured in a hierarchy, modeled
by the Refines association. We navigate this hierarchy
in our model-based approach to find the most
experienced developer, described in section 4. As a
first step in this approach, we have to determine all
related functional requirements of the currently
inspected work item. As a consequence this
approach only works for work items, which are
linked to functional requirements.
While the model-based approach of semi-automated
task assignment only relies on model links in
UNICASE, the machine learning approaches mainly
rely on the content of the artifacts. All content is
stored in attributes. The following table provides an
overview of the relevant features we used to
evaluate the different approaches:
Feature       Meaning
Name          A short and unique name for the represented work item.
Description   A detailed description of the work item.
ObjectOf      The object of the work item, usually a Functional Requirement.
We will show in the evaluation section which
features had a significant impact on the accuracy of
the approach.
UNICASE provides an operation-based versioning
for all artifacts (Koegel 2008). This means all past
project states can be restored. Furthermore, we can
retrieve a list of operations for each state, for
example when a project manager assigned a work
item to a certain developer. We will use this
versioning system in the second part of our
evaluation to exactly recreate a project state where a
work item was created. The goal is to evaluate
whether our approach would have chosen the same
developer for an assignment as the project manager
did. This evaluation method provides a more
realistic result than evaluating the approaches only
on the latest project state. With this method both
approaches, machine learning and model-based, can
only mine the information, which was present at the
time of the required assignment recommendation.
4 MODEL-BASED APPROACH
For the model-based assignment of work items we
use the structural information available in the unified
model of UNICASE. In UNICASE every functional
requirement can have a set of linked work items.
These are work items that need to be completed in
order to fulfill this requirement.
The main idea of our model-based approach is to
find the relevant part of the system for the input
work item. In a second step we extract a set of
existing work items which deal with this part
of the system. Based on this set we then select a
potential assignee for the given input work item. We will
describe how this set is created using an example in
Figure 2.
The input work item W is linked to the functional
requirement B. To create the relevant set of work
items (RelevantWorkItems(W)) we first add all
work items, which are linked to functional
requirement B (none in this example). Furthermore
we add all work items linked to the refined
functional requirement (A) and all work items linked
to the refining requirements (C). In the example the
set would consist of the work items 1 and 2.
Furthermore, we recursively collect all work items
from the refiningRequirements of A, which are
neighbors of functional requirement B in the
hierarchy (not shown in the example).
Using the set RelevantWorkItems(W) we determine
the expertise of each developer D regarding W,
denoted Expertise_W(D). We define Expertise_W(D) as the
number of relevant work items this developer has
already completed. After determining Expertise_W(D)
for all developers, the one with the highest expertise
value is suggested as the appropriate assignee of the
work item W.
Figure 2: Example for the model-based approach (UML
object diagram).
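The following is a minimal Java sketch of this computation. The class and attribute names (WorkItem, FunctionalRequirement, refiningRequirements, and so on) are simplified, hypothetical stand-ins for the UNICASE model rather than its actual API, and being the assignee of a work item is used here as a proxy for having completed it.

```java
import java.util.*;

// Hypothetical, simplified stand-ins for the UNICASE model elements.
class Developer {
    final String name;
    Developer(String name) { this.name = name; }
}

class WorkItem {
    Developer assignee;                      // null if not yet assigned
    FunctionalRequirement objectOf;          // null if the work item is not linked
}

class FunctionalRequirement {
    FunctionalRequirement refinedRequirement;                             // parent in the hierarchy (A)
    List<FunctionalRequirement> refiningRequirements = new ArrayList<>(); // children
    List<WorkItem> workItems = new ArrayList<>();                         // linked via isObjectOf
}

public class ModelBasedAssigner {

    // RelevantWorkItems(W): work items of the linked requirement B, of its
    // refining requirements, of the refined requirement A, and (recursively)
    // of the neighbours of B below A.
    static Set<WorkItem> relevantWorkItems(WorkItem w) {
        Set<WorkItem> relevant = new HashSet<>();
        FunctionalRequirement b = w.objectOf;
        if (b == null) return relevant;                            // approach not applicable
        relevant.addAll(b.workItems);
        for (FunctionalRequirement c : b.refiningRequirements)
            relevant.addAll(c.workItems);
        FunctionalRequirement a = b.refinedRequirement;
        if (a != null) {
            relevant.addAll(a.workItems);
            for (FunctionalRequirement neighbour : a.refiningRequirements)
                if (neighbour != b) collectRecursively(neighbour, relevant);
        }
        relevant.remove(w);
        return relevant;
    }

    static void collectRecursively(FunctionalRequirement req, Set<WorkItem> into) {
        into.addAll(req.workItems);
        for (FunctionalRequirement child : req.refiningRequirements)
            collectRecursively(child, into);
    }

    // Expertise_W(D): number of relevant work items assigned to developer D
    // (used as a proxy for completed work); the highest value wins.
    static Developer suggestAssignee(WorkItem w) {
        Map<Developer, Integer> expertise = new HashMap<>();
        for (WorkItem item : relevantWorkItems(w))
            if (item.assignee != null)
                expertise.merge(item.assignee, 1, Integer::sum);
        return expertise.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey)
                .orElse(null);                                     // no recommendation possible
    }
}
```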
5 MACHINE LEARNING
APPROACHES
We have used the Universal Java Matrix Package
(UJMP) (Arndt et al. 2009) to convert data from
UNICASE into a format suitable for machine
learning algorithms. This matrix library can process
numerical as well as textual data and can be easily
integrated into other projects. All work items are
aggregated into a two-dimensional matrix, where
each row represents a single work item and the
columns contain the attributes (name, description,
ObjectOf association). Punctuation and stop words
are removed and all strings are converted to
lowercase characters. After that, the data is
converted into a document-term matrix, where each
row still represents a work item, while the columns
contain information about the occurrence of terms in
this work item. There are as many columns as
different words in the whole text corpus of all work
items. For every term, the number of occurrences in
this work item is counted. This matrix is normalized
using tf-idf (term frequency / inverse document
frequency):

$$tf_{i,j} = \frac{n_{i,j}}{\sum_k n_{k,j}}$$

where $n_{i,j}$ is the number of occurrences of the term
$t_i$ in document $d_j$, and the denominator is the sum of
the occurrences of all terms in document $d_j$.
The inverse document frequency is a measure of the
general importance of a term:

$$idf_i = \log \frac{|D|}{|\{d : t_i \in d\}|}$$

where $|D|$ is the total number of documents in the
corpus and the denominator counts the documents in
which the term $t_i$ appears. A deeper introduction to
text categorization can be found in (Sebastiani 2002).
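As a rough illustration of this preprocessing step, the following self-contained Java sketch builds a tf-idf weighted document-term matrix from already tokenized work-item texts. It deliberately uses plain arrays instead of the UJMP/JDMP matrix classes, whose exact APIs are not reproduced here.

```java
import java.util.*;

public class TfIdfExample {

    /** Builds a tf-idf weighted document-term matrix from tokenized documents. */
    static double[][] tfIdf(List<List<String>> documents) {
        // Build the vocabulary: one column per distinct term in the whole corpus.
        Map<String, Integer> vocabulary = new LinkedHashMap<>();
        for (List<String> doc : documents)
            for (String term : doc)
                vocabulary.putIfAbsent(term, vocabulary.size());

        int numDocs = documents.size();
        int numTerms = vocabulary.size();
        double[][] counts = new double[numDocs][numTerms];
        int[] docFrequency = new int[numTerms];

        for (int d = 0; d < numDocs; d++) {
            for (String term : documents.get(d))
                counts[d][vocabulary.get(term)]++;                 // n_{i,j}
            for (int t = 0; t < numTerms; t++)
                if (counts[d][t] > 0) docFrequency[t]++;           // |{d : t_i in d}|
        }

        double[][] tfidf = new double[numDocs][numTerms];
        for (int d = 0; d < numDocs; d++) {
            double docLength = Arrays.stream(counts[d]).sum();     // sum_k n_{k,j}
            for (int t = 0; t < numTerms; t++) {
                double tf = docLength == 0 ? 0 : counts[d][t] / docLength;
                double idf = Math.log((double) numDocs / docFrequency[t]);
                tfidf[d][t] = tf * idf;
            }
        }
        return tfidf;
    }

    public static void main(String[] args) {
        // Each work item: name and description, already lowercased and stripped of stop words.
        List<List<String>> workItems = List.of(
                List.of("fix", "login", "dialog", "crash"),
                List.of("add", "login", "logout", "button"),
                List.of("update", "user", "manual"));
        System.out.println(Arrays.deepToString(tfIdf(workItems)));
    }
}
```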
We have not used further preprocessing such as
stemming or latent semantic indexing (LSI) as our
initial experiments suggested that it had only a
minor effect on performance compared to the
selection of the algorithm or the features. We have used the
tf-idf matrix as input data for the Java Data Mining
Package (JDMP) (Arndt 2009), which provides a
common interface to numerous machine learning
algorithms from various libraries. This allowed us
to compare the following methods:
Constant Classifier
The work items are not assigned to all developers on
an equal basis. One developer may have worked on
many more work items than another one. By just
predicting the developer with the most work items it
is possible to make many correct assignments.
Therefore we use this classifier as a baseline, as it
does not consider the input features.
Nearest Neighbor Classifier
This classifier is one of the simplest classifiers in
machine learning. It uses normalized Euclidean
distance to locate the item within the training set
which is closest to the given work item, and predicts
the same class as the labeled example. We use the
implementation IB1 from Weka (Witten & Frank
2002). We did not use k-nearest neighbors, which
usually performs much better, because we found the
runtime of this algorithm to be too long for practical
application in our scenario.
Decision Trees
Decision trees are also very simple classifiers, which
break down the classification problem into a set of
simple if-then decisions which lead to the final
prediction. Since one decision tree alone is not a
very good predictor, it is a common practice to
combine a number of decision trees with ensemble
methods such as boosting (Freund & Schapire
1997). We use the implementation
RandomCommittee from Weka.
Support Vector Machine (SVM)
The support vector machine (SVM) calculates a
separating hyperplane between data points from
different classes and tries to maximize the margin
between them. We use the implementation from
LIBLINEAR (Fan et al. 2008), which works
extremely fast on large sparse data sets and is
therefore well suited for our task.
Naïve Bayes
This classifier is based on Bayes' theorem in
probability theory. It assumes that all features are
independent which is not necessarily the case for a
document-term matrix. However, it scales very well
to large data sets and usually yields good results
even if the independence assumption is violated. We
use the implementation NaiveBayesMultinomial
from Weka (Witten & Frank 2002) but also
considered the implementation in MALLET, which
showed lower classification accuracy (therefore we
only report results from Weka).
Neural Networks
Neural networks can learn non-linear mappings
between input and output data, which can be used to
classify items into different classes (for an
introduction to neural networks see e.g. (Haykin
2008)). We have tried different implementations but
found that the training time was an order of
magnitude longer than for the other approaches
considered here. Therefore we were unable to
include neural networks in our evaluation.
For the state-based evaluation, we trained these
classifiers using a cross-validation scheme: the data
was split randomly into ten subsets. Nine of
these sets were selected to train the classifier and
one to assess its performance. After that, another set
was selected for prediction, and the training was
performed using the remaining nine sets. This
procedure was carried out for each of the ten sets in turn
and the whole process was repeated ten times (10 times 10-fold
cross validation). For the history-based evaluation,
the classifiers were trained on the data available at a
certain project state to predict the assignee for a
newly created work item. After the actual
assignment through the project leader, the classifiers
were re-trained and the next prediction could be
made.
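A sketch of how such a history-based training and evaluation loop could look is given below. Classifier, ProjectState, WorkItemData and AssignmentEvent are hypothetical stand-ins for the UNICASE analyzer framework and the JDMP classifiers, not their real interfaces.

```java
import java.util.List;

// Hypothetical interfaces standing in for the UNICASE analyzer framework and JDMP classifiers.
interface Classifier {
    void train(ProjectState state);                  // (re)train on all assignments known so far
    String predictAssignee(WorkItemData newItem);    // recommend a developer
}
interface ProjectState { }
record WorkItemData(String name, String description, String objectOf) { }
record AssignmentEvent(ProjectState stateBeforeAssignment, WorkItemData workItem, String actualAssignee) { }

public class HistoryBasedEvaluation {

    /** Replays every assignment in the project history and measures aggregated accuracy. */
    static double evaluate(Classifier classifier, List<AssignmentEvent> history) {
        int correct = 0;
        for (AssignmentEvent event : history) {
            // Recreate the project state exactly as it was before the assignment was made
            // and train only on the information available at that moment.
            classifier.train(event.stateBeforeAssignment());
            String predicted = classifier.predictAssignee(event.workItem());
            if (predicted != null && predicted.equals(event.actualAssignee()))
                correct++;
            // The loop then moves on to the next event, so the classifier is
            // effectively re-trained with one more labeled example each time.
        }
        return history.isEmpty() ? 0.0 : (double) correct / history.size();
    }
}
```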
Depending on the approach, runtime for the
evaluation of one classifier on one project ranged
from a couple of minutes for LIBLINEAR SVM to
almost two days for the nearest neighbor classifier.
Although a thorough comparison of all machine
learning methods would certainly have been
interesting, we did not include a full evaluation on
all projects and performed feature selection using
LIBLINEAR, which was the fastest method of all.
We argue that an algorithm for automatic task
assignment has to deliver good accuracy, but
at the same time the necessary performance in terms
of computing time to be usable in a productive
environment. Therefore we also discarded the
classifiers nearest neighbor and random committee
from the complete evaluation and report their results only
for the UNICASE project.
6 EVALUATION
In this section we evaluate and compare the different
approaches of semi-automated task assignment. We
evaluated the approaches using three different
projects. All projects have used UNICASE to
manage their work items as well as their system
documentation. In section 6.1 we introduce the three
projects and their specific characteristics. In section
6.2 we evaluate the approaches "state-based". This
means we took the last available project state and
tried to classify all assignments post-mortem. This
evaluation technique was also used in approaches
such as (Canfora & Cerulo 2005). Based on the results of the
state-based evaluation we selected the best-working
configurations and approaches and evaluated them
history-based. We stepped through the operation-
based history of the evaluation projects to the instant
before an assignment was done. This state is not
necessarily a revision from the history but can be a
state in between two revisions. This is why we had
to rely on the operation-based versioning of
UNICASE for this purpose. On the given state we
tried to predict this specific assignment post-mortem
and compared the result with the assignment, which
was actually done by the user. We claim this
evaluation to be more realistic than the state-based
as it measures the accuracy of the approach as if it
had been used in practice during the project.
Furthermore it shows how the approaches perform in
different states of the project depending on the
different size of existing data. As a general measure
to assess performance we used accuracy, i.e. the
number of work items for which the correct developer
was predicted, divided by the total number of classified work items. This
measure has the advantage of being very intuitive
and easily comparable between different approaches
and data sets. Other common measures such as
precision or sensitivity are strongly dependent on the
number of classes (number of developers) and their
distribution and therefore would make it more
difficult to interpret the results for our three projects.
6.1 Evaluation Projects
We have used three different projects as datasets for
our evaluation. As a first dataset we used the
repository of the UNICASE project itself, which has
been hosted on UNICASE for nearly one year. The
second project, DOLLI 2, was a large student project
with an industrial partner and 26 participants over six
months. The goal of DOLLI was the development of
innovative solutions for facility management. The
third dataset stems from an industrial application of
UNICASE for the development of the browser game
"Kings Tale" by Beople GmbH, where UNICASE
has been used for over six months now. The following
table shows the number of participants and relevant
work items per project.
Table 1: Developers and work items per project.

                      UNICASE   DOLLI   Kings Tale
Developers            39        26      6
Assigned work items   1191      411     256
Linked work items     290       203     97
6.2 State-based Evaluation
For the state-based evaluation we used the last
existing project state. Based on this state we try to
classify all existing work items and compare the
result with the actually assigned person. In a first
step (section 6.2.1) we evaluate the machine learning
approaches. In a second step we evaluate the model-
based approach.
6.2.1 Machine Learning Approaches
We have chosen different combinations of features
as input and applied the machine
learning approaches described in section 5 as well as
the model-based approach described in section 4.
Our goal was to determine the approaches,
configurations and feature sets which lead to the
best results and to re-evaluate those in the history-
based evaluation (section 6.3). We started by
comparing different feature sets. As we expected the
name of a work item to contain the most relevant
information, we started the evaluation with this
feature only. In a second and third run, we added the
attribute description and the association ObjectOf.
The size of the tf-idf matrix varied depending on the
project and the number of selected features, e.g. for
the UNICASE project, from 1,408 columns with
only name considered to 4,950 columns with all
possible features.
Table 2 shows the results of different feature sets
for the support vector machine. In all evaluation
projects, the addition of the features description and
ObjectOf increased accuracy. The combination of all
three attributes leads to the best results. As a
conclusion we will use the complete feature set for
further evaluation and comparison with other
approaches.
Table 2: Different sets of features as input data (SVM accuracy).

Feature set                       UNICASE         DOLLI           Kings Tale
Name                              36.5% (±0.7%)   26.5% (±0.7%)   38.9% (±1.4%)
Name and description              37.1% (±1.0%)   26.9% (±1.0%)   40.7% (±0.9%)
Name, description and ObjectOf    38.0% (±0.5%)   28.9% (±0.7%)   43.4% (±1.7%)
In the next step we applied the described machine-
learning approaches using the best working feature
set as input (Name, description and ObjectOf). As a
baseline we started with a constant classifier. This
classifier always suggests for assignment the developer
who already has the most work items assigned.
As can be seen in Table 3, we can confirm the
finding of (Anvik et al. 2006) that SVM yields
very good results. Random Committee performed
quite badly in terms of accuracy and performance, so
we did not evaluate it further on all projects. The
only competitive algorithm in terms of accuracy was
Naïve Bayes, which was however worse on the
Kings Tale project. As there was no significant
difference between SVM and Naïve Bayes we chose
SVM for further history-based evaluation due to the
much better performance.
6.2.2 Model-based Approach
In the second step of the state-based evaluation we
applied the model-based approach to the same data,
which yields surprisingly good results (see Table
4). The first row shows the accuracy of
recommendations in cases where the model-based approach
could be applied. The approach is only applicable to
work items which are linked to functional
requirements. The number of work items the
approach could be applied to is listed in Table 1. It is
worth mentioning that when we also consider the
second guess of the model-based approach on
linked work items only, we achieve accuracies of 96.2%
for the UNICASE, 78.7% for the DOLLI and 94.7% for
the Kings Tale project. For a fair overall comparison
with the machine learning approaches, which are
able to classify every work item, we calculate the
accuracy for all work items, including those without
links, which consequently could not be predicted.
Table 4 shows that the accuracy when classifying all work
items is even worse than that of the constant classifier.
Therefore the model-based approach is only
applicable for linked work items or in combination
with other classifiers.
Table 3: Different machine learning approaches, state-based.

                      UNICASE         DOLLI           Kings Tale
Constant              19.7%           9.0%            37.4%
SVM (LibLinear)       38.0% (±0.5%)   28.9% (±0.7%)   43.4% (±1.7%)
Naïve Bayes           39.1% (±0.7%)   29.7% (±0.9%)   37.8% (±1.7%)
Random Committee      23.2% (±0.2%)   -               -
Nearest Neighbor      6.9% (±0.1%)    -               -
Table 4: Model-based approach.

                      UNICASE   DOLLI   Kings Tale
Linked work items     82.6%     58.1%   78.4%
All work items        19.9%     20.7%   29.3%
We have shown that the model-based approach can
classify linked work items based on the ObjectOf
reference. The approach thus basically mines
which developer has worked on which related parts
of the system in the past (see section 4). One could
claim that the machine learning approaches could
also classify based on this information. Therefore we
applied the SVM only on linked work items with all
features and also only using the ObjectOf feature.
The results (see Table 5) show that linked work
items are better classified than non-linked. But even
a restriction to only the feature ObjectOf did not lead
to results as good as the model-based approach.
We therefore conclude that the model-based approach
should be used whenever it is applicable and that all
other elements should be classified with the SVM.
Table 5: Classification of linked work items.

                      UNICASE   DOLLI   Kings Tale
Constant              29.3%     18.3%   40.3%
SVM, all features     53.9%     33.4%   50.2%
SVM, only ObjectOf    49.7%     23.8%   49.2%
6.3 History-based Evaluation
In the second part of our evaluation we wanted to
simulate the actual use case of assignment. The
problem with the state-based evaluation is that the
system has more information at hand than it
would have had at the time a work item was
assigned. Consequently we simulated the actual
assignment situation. For this purpose we used the
operation-based versioning of UNICASE in
combination with an analyzer framework provided
by UNICASE. This enables us to iterate over project
states through time and to recreate exactly the state
before a single assignment was done. Note that this
state does not necessarily (and usually does not)
correspond to a revision in the versioning history, but is
rather an intermediate state between two revisions.
By using the operation-based versioning of
UNICASE we are able to recover these intermediate
states and to apply our approaches on exactly that
state. For the machine learning approach (SVM) we
trained the specific approach based on that state. For
the model-based approach we used the state to
calculate the assignment recommendation. Then, we
compared the result of the recommendation with the
assignment, which was actually chosen by the user.
For the history-based evaluation we selected the two
best working approaches from the state-based
evaluation, SVM and the model-based approach. We
applied the model-based approach only on linked
work items.
We applied SVM and the model-based approach
on the UNICASE and the DOLLI project. The
Kings Tale project did not capture operation-based
history data and was therefore not part of the
history-based evaluation. As expected the results for
all approaches are worse than in the state-based
evaluation (see Table 6). Still all applied approaches
are better than the baseline, the constant classifier.
An exception is the model-based approach applied
on the DOLLI project, which shows slightly better
results in the history-based evaluation. We believe
the reason for this is that the requirements model,
i.e. the functional requirements, and the related work
items were added continuously over the project
runtime. Therefore at the states when the actual
assignment was done, the model-based approach
could calculate its recommendation based on a
smaller, but more precise set of artifacts.
Furthermore we can observe that the results for the
UNICASE project deviate more strongly from the state-based
evaluation than those for the DOLLI project. A
possible explanation for this is the higher personnel
fluctuation in the UNICASE project. This
fluctuation requires the approaches to predict
assignments for developers with a sparse history in
the project, which is much more difficult. In
the state-based evaluation the fluctuation is hidden,
because the approaches can use all work items of the
specific developer no matter when he joined the
project.
Table 6: History-based evaluation (aggregated accuracy). UC = UNICASE.

              UC history   UC state   DOLLI history   DOLLI state
Constant      22%          19.7%      7%              9.0%
SVM           29%          38.0%      27%             28.9%
Model-based   75%          82.6%      61%             58.1%
Figures 3 and 4 show the accuracy over time for the
UNICASE project for the SVM and the model-based
approach, respectively. All presented charts show
two lines. The first line (black) shows the aggregated
accuracy over time. The second line (dotted black)
shows the aggregated accuracy for the last 50 (DOLLI)
and 100 (UNICASE) revisions and therefore reveals
short-term trends. In the selected time frame, both
approaches do not fluctuate significantly. This
shows that both approaches could be applied in a
continuously running project, where developers join and leave
the project.

Figure 3: SVM – UNICASE.
In contrast to the continuously running UNICASE project, we
investigated the DOLLI project from beginning
to end (Figures 5 and 6), including project start-up
and shutdown activities. We observe that SVM lacks
accuracy at the beginning, where new developers
start to work on the project. For an efficient
classification the SVM approach has to process a
reasonable set of work items per developer.
Therefore a high accuracy is only reached towards the end
of the project. A closer look at the accuracy of the
model-based approach shows that it decreases at the
end of the project. Starting from around revision 430
there has been a process change in the project as
well as a reorganization of the functional
requirements. This clearly affects the results of the
model-based approach as it relies on functional
requirements and their hierarchy. In contrast to the
model-based approach, SVM seems to be quite
stable against this type of change.
Figure 4: Model-based – UNICASE.
Figure 5: SVM – DOLLI.
Figure 6: Model-based – DOLLI.
7 CONCLUSIONS
We applied machine learning techniques as well as a
novel model-based approach to semi-automatically
assign different types of work items. We evaluated
the different approaches on three existing projects.
We could confirm the results from previous authors
that the support vector machine (SVM) is an
efficient solution to this classification task. The
naïve Bayes classifier can lead to similar results, but
the implementation we have used showed a worse
performance in terms of computing time. The
model-based approach is not applicable to all work
items as it relies on structural information, which is
not always available. However it showed the best
results of all approaches whenever it was applicable.
The model-based approach relies on links from
work items to functional requirements and is
therefore not directly applicable in other scenarios
than UNICASE, where these links do not exist.
However, we believe that it can be transferred to
other systems where similar information is provided.
Bug trackers often allow linking bug reports to
related components. Components on the other hand
have relations to each other, just like the functional
requirements in our context. An obvious
shortcoming of the model-based approach is that it
requires an initial triage, namely linking the work item
to the affected part of the system, no matter which model
is used. On the one hand, we believe that it is easier
for users to triage a work item by the affected part of
the system than to assign it directly, especially if they
do not know the internal structure of a project. On the
other hand, if a project decides to maintain both links
to related system parts and links to assignees, the
model-based approach can help with the creation of the latter.
In the second part of our evaluation, we tried to
simulate the use case in a realistic assignment
scenario. Therefore we applied the two best working
approaches over the project history and predicted
every assignment at exactly the state at which it was
originally made. As a consequence all approaches
can process less information than in the first part of
the evaluation, which was based on the last project
state. As expected the history-based evaluation leads
to lower accuracies for all approaches. The model-
based approach is less affected by this scenario than
the SVM. A possible reason for this is that the
model-based approach does not depend so much on
the size of the existing data as on its quality.
This assumption is underlined by the behavior of the
model-based approach during massive changes in
the model, which led to lower results. In contrast,
the SVM was not as sensitive to changes in the
model, but more sensitive to fluctuations in the project
staffing.
We conclude that the best solution would be a
hybrid approach, i.e. a combination of the model-
based approach and SVM. This would lead to high
accuracy for linked work items, but would also be able
to deal with unlinked items.
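A minimal sketch of such a hybrid dispatcher is shown below, assuming hypothetical Recommender wrappers around the model-based approach and the SVM classifier; it is an illustration of the proposed combination, not an implementation from the evaluated system.

```java
// Hypothetical recommender interface; the concrete implementations would wrap
// the model-based approach (section 4) and the SVM classifier (section 5).
interface Recommender {
    /** Returns a developer name, or null if no recommendation is possible. */
    String recommend(Object workItem);
}

public class HybridAssigner implements Recommender {
    private final Recommender modelBased;
    private final Recommender svm;

    public HybridAssigner(Recommender modelBased, Recommender svm) {
        this.modelBased = modelBased;
        this.svm = svm;
    }

    @Override
    public String recommend(Object workItem) {
        // Prefer the model-based recommendation: it was the most accurate
        // whenever the work item is linked to a functional requirement.
        String suggestion = modelBased.recommend(workItem);
        // Fall back to the SVM for unlinked work items, which the
        // model-based approach cannot classify.
        return suggestion != null ? suggestion : svm.recommend(workItem);
    }
}
```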
REFERENCES
Čubranić, D., 2004. Automatic bug triage using text
categorization. In SEKE 2004: Proceedings of the
Sixteenth International Conference on Software
Engineering & Knowledge Engineering. S. 92–97.
Anvik, J., 2006. Automating bug report assignment. In
Proceedings of the 28th international conference on
Software engineering. S. 940.
Anvik, J., Hiew, L. & Murphy, G.C., 2006. Who should
fix this bug? In Proceedings of the 28th international
conference on Software engineering. Shanghai,
China: ACM, S. 361-370. Available at: http://
portal.acm.org/citation.cfm?id=1134285.1134336
Arndt, H., Bundschus, M. & Naegele, A., 2009. Towards a
next-generation matrix library for Java. In COMPSAC:
International Computer Software and Applications
Conference.
Bruegge, B. u. a., 2009. Classification of tasks using
machine learning. In Proceedings of the 5th
International Conference on Predictor Models in
Software Engineering.
Bruegge, B. u. a., 2008. Unicase – an Ecosystem for
Unified Software Engineering Research Tools. In
Workshop Distributed Software Development -
Methods and Tools for Risk Management. Third IEEE
International Conference on Global Software
Engineering, ICGSE. Bangalore, India, S. 12-17.
Available at: http://www.outshore.de/Portals/0/
Outshore/ICGSE_2008_Workshop_Proceedings.pdf.
Canfora, G. & Cerulo, L., 2005. How software repositories can
help in resolving a new change request. In STEP 2005,
99.
Fan, R.E. u. a., 2008. LIBLINEAR: A library for large
linear classification. The Journal of Machine Learning
Research, 9, 1871–1874.
Freund, Y. & Schapire, R.E., 1997. A decision-theoretic
generalization of on-line learning and an application to
boosting. Journal of computer and system sciences,
55(1), 119–139.
Fritz, T., Murphy, G.C. & Hill, E., 2007. Does a
programmer's activity indicate knowledge of code? In
Proceedings of the the 6th joint meeting of the
European software engineering conference and the
ACM SIGSOFT symposium on The foundations of
software engineering. S. 350.
Haykin, S., 2008. Neural networks: a comprehensive
foundation, Prentice Hall.
Helming, J. u. a., 2009. Integrating System Modeling with
Project Management–a Case Study. In International
Computer Software and Applications Conference,
COMPSAC 2009. COMPSAC 2009.
Arndt, H., 2009. The Java Data Mining Package – A Data
Processing Library for Java.
Koegel, M., 2008. Towards software configuration
management for unified models. In Proceedings of the
2008 international workshop on Comparison and
versioning of software models. S. 19–24.
Mockus, A. & Herbsleb, J.D., 2002. Expertise browser: a
quantitative approach to identifying expertise. In
Proceedings of the 24th International Conference on
Software Engineering
. S. 503–512.
Raymond, E., 1999. The cathedral and the bazaar.
Knowledge, Technology & Policy, 12(3), 23–49.
Schuler, D. & Zimmermann, T., 2008. Mining usage
expertise from version archives. In Proceedings of the
2008 international working conference on Mining
software repositories. S. 121–124.
Sebastiani, F., 2002. Machine learning in automated text
categorization. ACM computing surveys (CSUR),
34(1), 1–47.
Sindhgatta, R., 2008. Identifying domain expertise of
developers from source code. In Proceeding of the
14th ACM SIGKDD international conference on
Knowledge discovery and data mining. S. 981–989.
Witten, I.H. & Frank, E., 2002. Data mining: practical
machine learning tools and techniques with Java
implementations. ACM SIGMOD Record, 31(1), 76–
77.
Yingbo, L., Jianmin, W. & Jiaguang, S., 2007. A machine
learning approach to semi-automating workflow staff
assignment. In Proceedings of the 2007 ACM
symposium on Applied computing. S. 345.