Automated Selection of Software Refactorings that Improve Performance

Nikolai Moesus

, Matthias Scholze

, Sebastian Schlesinger

and Paula Herber

QMETHODS – Business & IT Consulting GmbH, Berlin, Germany

Software and Embedded Systems Engineering, Technische Universit

at Berlin, Germany

Keywords:

Software Refactoring, Performance, Static Analysis, Proﬁling.

Abstract:

Performance is a critical property of a program. While there exist refactorings that have the potential to

signiﬁcantly increase the performance of a program, it is hard to decide which refactorings effectively yield

improvements. In this paper, we present a novel approach for the automated detection and selection of refacto-

rings that are promising candidates to improve performance. Our key idea is to provide a heuristics that utilizes

software properties determined by both static code analyses and dynamic software analyses to compile a list

of concrete refactorings sorted by their assessed potential to improve performance. The expected performance

improvement of a concrete refactoring depends on two factors: the execution frequency of the respective piece

of code, and the effectiveness of the refactoring itself. To assess the latter, namely the general effectiveness of

a given set of refactorings, we have implemented a set of micro benchmarks and measured the effect of each

refactoring on computation time and memory consumption. We demonstrate the practical applicability of our

overall approach with experimental results.

1 INTRODUCTION

Performance issues are a common reason why soft-

ware needs refactoring. However, due to the vari-

ety of possible causes for performance issues, ﬁn-

ding an appropriate refactoring is a complicated task.

To tackle this problem, antipattern detection as well

as measurement-based performance engineering have

been proposed. Antipattern detection is a static ana-

lysis that aims at detecting code ﬂaws and violations

against good practice (Louridas, 2006). While an-

tipattern detection often succeeds in detecting criti-

cal code sections that cause performance bottlenecks,

it tends to yield a great number of proposed refac-

torings of which only a very small fraction can be

considered relevant for performance. To resolve all

proposed issues is therefore neither efﬁcient nor fea-

sible. Additionally, a bad performing piece of code

that gets only rarely called usually is not the cause

of severe performance problems. Measurement-based

performance engineering relies on dynamic analysis

techniques that are applied when the program un-

der development is running (Woodside et al., 2007).

Those analyses generate huge amounts of heterogene-

ous data like response times, function call durations,

stack traces, memory footprints or hardware counters.

To ﬁnd the important chunks of information that help

solving performance issues takes time and also requi-

res skill and experience. Hence, manually searching

for an appropriate refactoring is an expensive task.

In this paper, we present a novel approach for the

automated selection of refactorings that are promising

candidates to improve performance. The key idea of

our approach is to use both static and dynamic ana-

lyses and combine the results to generate a heuris-

tics that determines those refactorings that are most

promising with respect to performance. Our major

contributions are twofold: First, to quantify the ex-

pected effect of a refactoring, we present a novel ra-

ting function that incorporates the analysis data from

both static and dynamic analysis, and thus enables us

to heuristically assess the effectiveness of concrete re-

factorings. The output is a list of proposed refacto-

rings sorted by their potential to yield a strong posi-

tive effect on performance. The reasoning behind our

heuristics is to assess which portions of source code

get executed frequently, such that a refactoring there

pays off more than anywhere else. We combine this

with a factor that provides an estimate for the gene-

ral effectiveness of a given refactoring, independent

of its position in the code. Second, we present an

evaluation of the general effectiveness of a given set

of refactorings that are generally assumed to improve

performance, independent of their use in a concrete

Moesus, N., Scholze, M., Schlesinger, S. and Herber, P.

Automated Selection of Software Refactorings that Improve Performance.

DOI: 10.5220/0006837900330044

In Proceedings of the 13th International Conference on Software Technologies (ICSOFT 2018), pages 33-44

ISBN: 978-989-758-320-9

program. To achieve this, we have implemented mi-

cro benchmarks and measured their effectiveness with

respect to the execution time and memory consump-

tion. We use the results from our micro benchmarks

together with static and dynamic code properties in

our ranking function to provide a heuristics that au-

tomatically assesses the expected effect of a concrete

refactoring in a given program.

To demonstrate the practical applicability of our

approach, we present the results from two experi-

ments. In the ﬁrst experiment, we have intentionally

manipulated a given example program such that it

contains typical antipatterns and investigate the im-

pact on performance and how high the rule violations

are rated by our heuristics. In the second experiment,

we apply our rating function to a given program, im-

plement the top rated refactorings and examine the

performance improvement.

The rest of this paper is structured as follows: In

Section 2, we introduce the preliminaries that are ne-

cessary to understand the remainder of this paper. In

Section 3, we discuss related work. In Section 4, we

present our approach for the automated selection of

refactorings. In Section 5, we present our micro ben-

chmark results as well as our case studies and experi-

mental results. We conclude in Section 6.

2 BACKGROUND

In this section, we introduce the preliminaries that are

necessary to understand the remainder of this paper,

namely software antipatterns and refactorings.

2.1 Software Antipatterns

Software design patterns describe good solutions to

recurring problems in an abstract and reusable way.

A software antipattern is very similar, only that it des-

cribes a bad, unfavored solution (Long, 2001). The

motivation to write those down is to prevent their use

and to provide appropriate refactorings into better de-

signs. An example for an antipattern described in

(Smith and Williams, 2002) is The Ramp, where tasks

have an increasing execution time due to a growing

list that has to be searched but is never cleaned up.

Performance antipatterns are the class of patterns

that lead to bad performance. As an example for a

performance antipattern in the programming language

Java, consider Listing 1. In this example, strings

are concatenated with the ’+’ operator. The reason

why using the + operator as shown is considered an

antipattern is based on the internal implementation

in Java. The + operator is natively overloaded for

String, although in Java operator overloading in ge-

neral is not possible. However, this piece of syntactic

sugar brings along a disadvantage concerning perfor-

mance. Because objects of String are immutable,

the additional characters cannot simply be appended.

Instead, internally a Java StringBuilder object is al-

located, concatenates the strings in its char buffer and

returns the new immutable string. If such procedure

is repeated in a loop as shown in Listing 1, each itera-

tion allocates and dismisses a StringBuilder and an

intermediate string. It is veiled from the programmer

that there lies an inefﬁciency in the simple + syntax.

Listing 1: Example Performance Antipattern.

1 S t r i n g [ ] p = { ” T hese ” , ” a r e ” , ”

s e p a r a t e ” , ” p a r t s ” } ;

2 S t r i n g s t r = p [ 0 ] ;

3 f o r ( i n t i = 1 ; i < p . l e n g t h ; ++ i )

{

4 s t r = s t r + ” ” + p [ i ] ;

5 }

2.2 Software Refactorings

A software refactoring is a change in source code that

keeps the external behavior of a software unaffected

but yet improves the internal design or other non-

functional properties (Fowler and Beck, 1999). Ex-

amples for refactorings are splitting up large classes

into multiple units, increasing encapsulation of clas-

ses or replacing inefﬁcient operations. Refactoring is

a structured process with speciﬁed steps and a deﬁned

goal. Due to the structured procedure, refactorings are

an elegant way of improving software.

A refactoring may be a large scale operation that

affects several units and takes much effort to fully im-

plement. In this paper, we focus on micro refacto-

rings (Owen, 2016), which affect only a few lines of

code and are realizable in a short time or even au-

tomatically. Concerning performance, micro refacto-

rings can have noticeable beneﬁts, especially in often

called functions or inside frequently executed loops.

Therefore, micro refactorings have the potential of an

excellent cost-beneﬁt ratio.

The proposed performance refactoring for the ex-

ample in Listing 1 is to allocate only one String-

Builder outside the loop and use it instead of the +

operator, such that no temporary objects accumulate.

3 RELATED WORK

In (Tsantalis et al., 2006; Washizaki et al., 2009),

the authors present approaches to automatically detect

ICSOFT 2018 - 13th International Conference on Software Technologies

software design patterns based on static information

extracted from Java bytecode. However, they focus

on the detection of classical design patterns (Gamma

et al., 1995) and are not concerned with their effect on

performance. Similarly, in (Wierda et al., 2007), the

authors examine class relations in a formal concept

analysis to detect repeating patterns without the need

of prior pattern knowledge. The analysis is expen-

sive, though, and is not feasible for a large code base.

Additionally, the impact of the detected patterns on

performance is not considered.

According to the survey on design pattern de-

tection presented in (Rasool and Streitfdert, 2011),

the majority of published approaches combines struc-

tural and behavioral analyses of the software. Alt-

hough behavioral analyses are not necessarily imple-

mented as dynamic analyses, for example, the appro-

ach introduced in (Heuzeroth et al., 2003) uses static

and dynamic analyses similar to how we use them:

The static analysis provides a set of design pattern

candidates, which is narrowed down in a dynamic

analysis. For each design pattern they prepare a set

of rules concerning the interaction of classes and dis-

card every candidate that violates any of the rules. In

(Wendehals, 2003), the authors search both abstract

syntax graphs and call graphs for design patterns and

rate each candidate. The combination of both ratings

helps to determine actual design patterns. Although

both approaches combine static and dynamic analy-

ses, they again only detect patterns and are not con-

cerned with their effect on performance.

In (Bernardi et al., 2013), the authors detect design

patterns in a graph representation according to a meta

model. They develop a domain speciﬁc language that

allows precise deﬁnitions of patterns with inheritance

between them to ease the creation of variants. They

achieve a high detection precision but again, they are

not concerned with the effect on performance.

An approach to detect performance antipatterns

and suggest refactorings is proposed in (Arcelli et al.,

2012; Arcelli et al., 2015). However, they work on

software architectural models, while we focus on im-

plementations.

In (Cortellessa et al., 2010), the authors achieve a

rating of performance antipatterns based on their so

called guiltiness. The algorithm requires a complete

set of antipatterns and the set of performance requi-

rements for the system as input. Each antipattern and

requirement is associated to one or more system enti-

ties, e.g. a processor. Depending on to what extend a

requirement is not fulﬁlled the associated system en-

tities spread the guilt among all their associated an-

tipatterns while taking into account the antipattern’s

estimated impact on the respective system entity. Alt-

hough this approach succeeds in selecting the most

effective performance antipatterns, it requires an ex-

pensive modeling step to capture the component mo-

del.

In (Djoudi et al., 2005), the authors propose as-

sembly code optimization by means of static antipat-

tern detection and dynamic value proﬁling. They use

a knowledge database for assembly antipatterns that

have shown bad performance in micro benchmarks

and attempt to ﬁnd those with a static analyzer. The

dynamic analysis beneﬁts from very low instrumenta-

tion cost on the assembly level and captures data like

cache miss rate. Although this approach is closely re-

lated to ours in many ways, e.g. the focus on micro

refactorings, it utilizes the dynamic analysis as inde-

pendent addition instead of combining its yield with

the results from static analyses, and it does not target

a high-level programming language, which is often

preferrable for software evolution and maintenance.

In (Luo et al., 2017), the authors present a ma-

chine learning system that examines execution traces

of the software under test and calculates new input va-

lues for the next execution that are most promising to

uncover a performance bottleneck. Finally, an analy-

sis of the captured execution traces is carried out and

a ranked list with presumed performance bottlenecks

is compiled. Even though in our approach we utilize

very different techniques, the result, namely an orde-

red list of speciﬁc performance issues, is similar. Ho-

wever, they do not automatically propose a solution

to the detected performance issue.

In (Fontana and Zanoni, 2017), the authors use su-

pervised learning to train a model that ﬁnds antipat-

terns and rates their severity. This relieves them from

the necessity to formalize the antipatterns in order to

perform the detection. However, they rely on external

detection algorithms to support the generation of trai-

ning data, which is tedious work. Additionally, they

do not focus on performance and therefore propose

no measure to determine the impact of antipatterns on

performance.

To the best of our knowledge, no existing appro-

ach enables the automatic selection of refactorings

that are most promising to increase performance.

4 AUTOMATED SELECTION OF

REFACTORINGS

Static code analyses for antipattern detection issue too

many alleged defects in a not prioritized fashion, ren-

dering the information hard to work with efﬁciently.

Dynamic software analyses, on the other hand, yield

a lot of heterogeneous data which is not easy to inter-

Automated Selection of Software Refactorings that Improve Performance

Performance

Measures

Antipattern

Detection

Rules

Software

Runtime

Environment,

Input Data

Rating

Function

Dynamic

Analysis

Antipatterns

1. ...

2. ...

3. ...

...

Static

Analysis

Antipatterns

Figure 1: Automated Selection Approach.

pret and will not directly lead to a refactoring propo-

sition in the source code. Apparently, both techniques

have their individual disadvantages.

To overcome these problems, we propose an ap-

proach for the automated selection of refactorings that

utilizes software properties determined by both sta-

tic code analyses and dynamic software analyses. By

combining the best out of both worlds into one heuris-

tics, we compile a list of concrete refactorings sorted

by their assessed potential to improve performance.

Our key idea is that with the help of dynamically

retrieved runtime information we rank statically de-

tected antipatterns by their importance regarding the

expected impact on performance. By connecting run-

time data with speciﬁc antipatterns in the code, we

derive a precise recommendation which refactorings

are most promising to improve the performance.

Figure 1 shows our overall approach. In the top

left, a static analysis takes the software source code

and a set of antipattern detection rules as input to pro-

duce an unordered list of antipatterns, e.g. the undesi-

red use of the ’+’ operator. In the bottom left, a dyna-

mic analysis examines the software while it is execu-

ted in its runtime environment consuming some input

data. Various performance measures are the output of

this process, e.g. the execution time. In the ﬁnal step,

we introduce a rating function, which uses the dyna-

mic performance measures together with a factor that

measures the general effectiveness of a given refac-

toring to assign a severity value to each statically de-

tected antipattern. As a result, we get an ordered list

where the top entries represent the antipatterns along

with their proposed refactorings that have the highest

potential of improving performance. Thus, there is

no more need to manually handle neither the perfor-

mance measures nor the huge amount of antipatterns.

Instead, it is possible to deal with the most promising

refactorings and defer the revision of the others.

4.1 Assumptions and Requirements

Our approach is applicable to all kinds of program-

ming languages for which the corresponding static

and dynamic analyses are available. There are, howe-

ver, differences in how complicated it is to obtain run-

time information and how many sophisticated tools

there are for a certain technology. Thus, for practi-

cal reasons, we decide to tailor our approach to the

widely used programming language Java.

Our approach relies on ﬁnding antipatterns with

a static analysis and therefore is limited to what can

be found this way. To be able to analyze even large

code bases we are restricted to detection tools that

are very fast. Characteristic for antipatterns found by

those tools is that they are relatively simple and of-

ten concern only a single operation that is empirically

known to be less efﬁcient than some other operation,

e.g. the + operator that concatenates strings. Signi-

ﬁcant savings are expected especially if antipatterns

occur in loops or frequently called functions. Note

that more complex causes for performance issues, like

memory leaks, inefﬁcient or unnecessarily large da-

tabase requests or too frequent remote service calls,

are hardly detectable by a static analyses in a fail-safe

fashion. Expensive techniques, e.g. symbolic exe-

cution, would be required, but they still cover only a

small part of the considered domain. Additionally, a

high rate of false positives must be prevented because

the acceptance of a tool and the conﬁdence in its well-

functioning would diminish rapidly.

Considering this, we expect our approach to work

best for applications where the same source code

is executed very often and thus the achievable per-

formance improvement of refactoring antipatterns is

high. This decisive criterion is assumed to hold for

large business applications, e.g. server software or

micro services that get thousands of similar request

a second. Nevertheless, our approach works for other

software as well, just with smaller performance gains.

Note that for the static antipattern detection, we

require access to the source code, while for the dy-

namic analysis a runtime environment, and realistic

input data must be available.

4.2 Rating Criteria

We aim at ordering the detected performance antipat-

terns according to their severity, i.e., their potential

of improving performance. To achieve this, we deter-

mine some rating criteria. Those may be either static

properties, e.g., the location in the source code where

an antipattern is detected together with its loop depth,

or properties that can be obtained through dynamic

ICSOFT 2018 - 13th International Conference on Software Technologies

analysis, e.g., the execution time and frequency of the

surrounding method.

Static Properties

The core of our static analyis is the static antipattern

detection, which provides a list of antipatterns toget-

her with their location in the code. As an additional

static property we use the loop depth at the correspon-

ding program location as a rating criterion, i.e. within

how many layers of loops a speciﬁc antipattern is nes-

ted. We choose this property because source code

within loops has the potential of being executed very

often. If an antipattern represents an inefﬁciency, the

many repeated executions give it a higher impact on

performance and therefore it is more important to re-

factor.

Note that we do not use the actual amount of exe-

cutions of a loop or the nesting depth of a given anti-

pattern. To determine the actual amount of executions

of a loop for arbitrary inputs is an undecidable pro-

blem, thus we cannot use this information as rating

criterion. How deep an antipattern is nested in arbi-

trary control ﬂow structures, i.e., the nesting depth, is

easy to determine. One could argue that source code

within, e.g., an if-statement is executed less often.

However, we have no evidence that control ﬂow struc-

tures other than loops form a reliable correlation that

can be used as basis for a rating.

Runtime Properties

An important runtime property is the total execution

time of a method. It is deﬁned as the sum of all execu-

tion times of a method in a given program run. The-

refore, it gives an impression on how much time the

program spends in a speciﬁc method. The time spent

in subroutines is counted towards the respective su-

broutine but not the calling method. Thereby, we get a

correlation between the time spent and a very limited

number of code lines. A high total execution time in-

dicates that either some very expensive operations are

performed or the number of executions must be high,

e.g. due to a loop. In the second case an antipattern in

this method has a higher impact on performance.

Another interesting property is the call count, i.e.

how often a method is called during runtime. Using

the same reasoning as above, we consider antipatterns

in frequently called methods to have a higher impact

on performance.

The third runtime property is the memory con-

sumption of a method. To capture the memory con-

sumption of a given method in Java, we use the sus-

pension count. As Java is a memory-managed lan-

guage, the garbage collector suspends the currently

executed method from time to time. The suspension

count tells how often the garbage collector suspended

a certain method to perform a collection. We choose

this property as indicator for high memory consump-

tion with the reasoning that if a method suffers sus-

pensions disproportionately often, it probably alloca-

tes a lot of memory. The claimed correlation is ba-

sed on the assumption that a garbage collection ta-

kes place whenever all memory is used up, which sta-

tistically happens more often in allocation intensive

methods. Although different implementations of gar-

bage collectors behave very differently in many ways,

the assumption that more suspensions by the garbage

collector indicate a higher memory consumption pre-

sumably holds. Note that for many other languages

there exist proﬁling tools like Google’s gperftools for

C or the Memory Proﬁler for Python, which report

the memory consumption of each method in a given

program. Our rating function can easily be adapted

to include these measures instead of the suspension

count.

A runtime property that we leave out is the in-

crease of execution time under increasing load. If

a method takes signiﬁcantly more time just because

the system is under load, this indicates that the met-

hod contains some operation that impairs the perfor-

mance. Often, the problem is about waiting time that

is spent e.g. for synchronization between multiple

threads (Grabner, 2009). We still do not consider this

property for two reasons. First, none of our antipat-

terns causes waiting times. Second, measuring the

increase of execution time under increasing load in

a completely automated fashion is very complicated,

e.g., because a dedicated testing environment for the

measured software is required. For the same reasons

we do not utilize synchronization and waiting times.

Neither do we consider the API breakdown, because

the information which component takes the most time

is not detailed enough to form a connection with spe-

ciﬁc antipattern occurrences.

Antipattern Properties

As a further important rating criterion, we use the pro-

perties of the antipattern itself. We expect some anti-

patterns to bring high performance gains through re-

factoring while others yield only small improvements.

To assess the general effectiveness of a given set of re-

factorings, we have implemented a micro benchmark

for each class of antipattern and its refactored coun-

terpart in a before-afterwards fashion (cf. Section 5).

In doing so, we evaluate the effectiveness of each re-

factoring and thus can derive meaningful weights for

our rating function.

Note that we have the choice to utilize either the

Automated Selection of Software Refactorings that Improve Performance

relative improvement after the refactoring or the ab-

solute improvement. As we are mostly interested in a

positive effect on the performance of the whole soft-

ware it makes sense to consider the absolute gain. The

relative improvement is only of limited meaning be-

cause an operation that takes quasi no time has few sa-

ving potential even if it can be made faster by a factor

of 50. Therefore, we select the absolute effectiveness

of refactorings as rating criterion.

4.3 Rating Function

Our ﬁnal goal is to provide a heuristics for the prio-

ritization and selection of antipatterns regarding their

negative impact on performance for a given program.

To achieve this, we present a novel rating function that

can be used as a heuristics to estimate the severity of

a detected antipattern in terms of the expected effect

of refactoring the antipattern on performance. Our ra-

ting function is based on the various criteria discussed

above and forms the heart of innovation in our appro-

ach as it actually combines the statically and dynami-

cally obtained data.

We deﬁne our rating function, which determines

the expected effectiveness of refactoring a given anti-

pattern AP, as follows:

severity = exec · (calls + b · loop) · f

t,AP

+ (β · susp + 1) · (calls + b · loop) · f

m,AP

where exec is the total execution time, susp the sus-

pension count, calls the call count, loop the loop

depth, f

t,AP

an antipattern time factor and f

m,AP

antipattern memory factor. The antipattern time and

memory factors f

t,AP

and f

m,AP

capture the general ef-

fectiveness of refactoring the antipattern AP. We have

determined these factors using micro benchmarks.

The idea behind this is as follows: If the execution

time of a single piece of code only consisting of a gi-

ven antipattern can be reduced by a factor of 10 using

the proposed refactoring for this antipattern, we as-

sess the general effectiveness of this refactorings to

have a time factor of f

t,AP

= 10. If, for the same expe-

riment, the memory consumption is reduced to 50%,

the memory factor of this refactoring is f

m,AP

= 2.

We describe our micro benchmarks to determine the

time and memory factors for a given set of antipat-

terns in Section 5 and present the resulting factors in

Table 2. To ﬁne-tune our rating function, we intro-

duce the weighting factors b and β, where b weights

the relative relevance of the loop depth compared to

the call count, and β the relative relevance of memory

consumption compared to the execution time.

The ratio behind our rating function is to identify

antipatterns that are at locations in the source code

that are executed very often. Refactorings at those

locations have a larger potential of improving perfor-

mance than elsewhere and should receive a higher ra-

ting. If, e.g., the execution time of a method is high, it

is probable that this method is either called very often

or contains a loop with many runs. If an antipattern

is located in such a method with a high call count, we

assume that it is executed often. Consequently, in this

case the term exec · calls becomes large and leads to

a higher rating. If on the other hand the call count

is low but the antipattern lives within a loop, we li-

kewise assume many executions. This time the term

exec · loop becomes large and again leads to a hig-

her rating. When merged together under the premise

that either of the two cases should result in a higher

rating, we get the term exec · (calls + b · loop) in the

rating function. The weighting factor b can be used

to normalize the loop depth with respect to the call

count (the loop depth is typically between 0 and 4,

while the call count has much larger numbers), and to

express a domain- or application-speciﬁc relevance of

loop depth and call count. In applications or domains

where the loop depth is not expected to signiﬁcantly

inﬂuence the performance, b can be reduced, and vice

versa. The antipattern time factor f

t,AP

gives an esti-

mate for the general impact of the antipattern on exe-

cution time and is therefore multiplied. Altogether,

this yields the ﬁrst summand of the rating function.

The derivation of the second summand is very si-

milar. For methods that get frequently suspended for

garbage collection we assume a higher memory con-

sumption. This leads to the term susp · (calls + b ·

loop). We again multiply with the antipattern me-

mory factor f

m,AP

to take the general impact of the

antipattern on memory consumption into account. A

particularity of the suspension count is that for most

methods it simply is zero, since overall garbage col-

lection kicks in relatively seldom. Because we do not

want to zero out the whole impact on memory, we add

the constant one and get (susp + 1).

Finally, we put together the two terms, each repre-

senting an independent indication of a high impact on

performance. Since execution times can easily grow

large while the suspension count keeps low, the weig-

hting factor β should be used to align the magnitude.

In addition, the software developer can increase β to

search for refactorings that are promising to reduce

the memory consumption, and decrease β to focus on

refactorings that are promising to reduce the overall

execution time. Note that in (β · susp + 1) the con-

stant one is not scaled by β. The reasoning is that in

case of zero suspensions the summand should have

little inﬂuence and only break the tie between other-

wise similarly rated antipatterns. If scaled, the second

ICSOFT 2018 - 13th International Conference on Software Technologies

summand could grow signiﬁcantly large, although the

suspension count is zero and there actually is no evi-

dence for a high memory consumption.

4.4 Weight Determination

To normalize the loop depth with respect to the call

count and the memory suspensions with respect to

the execution time, we determine initial values of the

weighting factors b and β. As mentioned above, they

can be adjusted to ﬁne-tune the ranking function to

certain domains or to a desired performance goal.

The weighting factor b deﬁnes the relation bet-

ween call count and loop depth. To scale the loop

depth range of 0 to 4 such that it matches the magni-

tude of call counts common for the current antipattern

selection process, we choose

b =

∑

methods

calls

with N

as the number of methods. In other words, we

choose b such that it equals the average call count of a

method. In our experiments, the call count average is

a multiple of the median. Thus, the loop depth has an

adequate effect on the antipatterns rating but the ex-

treme call counts still surpass the loop depth in effect.

Note that b has to be calculated once in an antipattern

selection process.

The weighting factor β deﬁnes the relation bet-

ween execution time and suspension count:

β = α ·

∑

methods

exec

∑

methods

susp

In other words, β equals the total execution time di-

vided by the total suspension count. The idea is to

calculate the average of how much time corresponds

to one suspension and scale the suspension count ac-

cordingly. The factor α can be used to reduce the sus-

pension counts inﬂuence because it is suspected to be

less reliable and accurate than the execution time, as

the numbers generally are very low and a proper sta-

tistical distribution sets in very late. In our experi-

ments, we use α = 0.2. The weighting factor β has to

be calculated once per antipattern selection process.

5 EVALUATION

We have implemented our approach in Java. We per-

form the static code analysis with PMD (Copeland

and Le Vourch, 2017), which uses detection rules to

ﬁnd patterns in the source code. Compared to ot-

her tools like Checkstyle (Burn, 2017) and FindBugs

(Pugh and Hovemeyer, 2017), PMD has more rules

for performance antipatterns and thus is best suited

for our approach. For the recording of runtime pro-

perties, we use the monitoring tool Dynatrace App-

Mon (Dynatrace, 2017). It is widely used in practice

and provides all the dynamic properties we need.

In this section, we ﬁrst present our micro bench-

marks and experimental evaluation of the general ef-

fectiveness of a given set of refactorings independent

of a concrete program. Then, we demonstrate two ex-

periments we have conducted in order to evaluate our

approach for the automated selection of refactorings

that are promising to have a high impact on the per-

formance of a given program. As a case study, we

use STATE, a SystemC to Timed Automata Transfor-

mation Engine written in Java and developed at TU

Berlin (Herber et al., 2015; Herber et al., 2008). Alt-

hough this is not the class of software our approach is

designed for and thus the performance gain is small,

the obtained ﬁndings demonstrate the practical appli-

cability of our approach. Furthermore, we show some

example output and give a ﬁrst impression of the po-

tential of our approach.

We carried out all experiments on an Intel(R)

Core(TM) i7-2620M CPU @ 2.7 GHZ, 2 Core with

8 GB RAM running the Microsoft Windows 10 Pro

operating system. We use the Oracle JVM version 8.

5.1 Micro Benchmarks

We have implemented micro benchmarks to deter-

mine the general effectiveness of a given set of anti-

patterns independent of a concrete program. We me-

asure the effectiveness in terms of time and memory

factors f

t,AP

and f

m,AP

, which represent the relative

severity of the various antipatterns. Since we are inte-

rested in the relative effectiveness of performance re-

factorings, only the relation between the performance

of code containing the antipattern and code containg

the refactored version is important and the absolute

results do not matter.

Challenges of Java Micro Benchmarks

Micro benchmarks are not easy to design, especially

in a language like Java. There are some general pit-

falls and some that stem from how Java and its virtual

machine work (Goetz, 2005).

The ﬁrst thing to go wrong is that something com-

pletely different is measured than what was intended.

A naive example is a benchmark to measure some

arithmetical operation that writes each result to the

console. What impacts the performance in such a set-

ting is mostly the output and not the actual arithme-

Automated Selection of Software Refactorings that Improve Performance

tics. To avoid this, we put only the absolutely neces-

sary operations into the measurement code.

The accuracy of the CPU clock is by far not high

enough to capture times in the magnitude of CPU cy-

cles. Thus, we nest the operations we want to measure

into a simple, repeating loop. The repetition count

must be high enough to reach overall computation ti-

mes where the accuracy is sufﬁciently good.

A Java speciﬁc pitfall is the just-in-time compi-

ler (JIT compiler), which automatically compiles fre-

quently executed portions of the program while lea-

ving the rest for interpretation as usual. This can cor-

rupt a measurement because half of the executions are

interpreted and the other half compiled. We encounter

this challenge with a so called warm-up. This means

that we run the benchmark code 20000 times before

the actual measurement is started. This guarantees

that the JIT compiler translates the benchmark code

and we measure only the compiled version.

Another general difﬁculty is the compiler optimi-

zation in benchmarks. As the executed code does not

fulﬁll any contentual purpose the compiler may ﬁnd

out and optimize it away, rendering the whole bench-

mark useless. To solve this problem, we always return

a number that in some way contains values involved

in the measured code. Like this, we avoid optimiza-

tion with a negligible overhead.

The garbage collection in Java is another mecha-

nism we take into account with our micro benchmark

design. If, for example, a garbage collection is perfor-

med during a benchmark the execution time increases

signiﬁcantly. Hence, before each measurement, we

demand a garbage collection to happen to achieve si-

milar starting conditions. Actually, the JVM cannot

be forced to carry out a garbage collection but accor-

ding to our experiments it always obeys. Thus, if a

garbage collection takes place, it is because so much

memory was consumed.

When taking a measurement, the result is subject

to deviations. Therefore, we repeat the measurement

several times. In a series there probably are outliers,

e.g., due to some irregular background process on

the machine that executes the benchmarks. For this

reason we discard the extreme values and take the

average over the remaining as ﬁnal outcome.

Micro Benchmark Implementation

We have implemented the micro benchmarks as an

Apache Tomcat servlet (Foundation, 2017). This

allows a user friendly control in the web browser

through a simple HTML interface. To cover all PMD

performance rules, we have implemented 20 bench-

mark pairs, each consisting of one benchmark for the

antipattern and one for its refactored version. The

core of each benchmark is a speciﬁc function that

executes an antipattern or its refactored counterpart

in a loop with a certain repetition count N, in our

case N = 100000. Around this benchmark function

the measurement process is built up. One benchmark

measurement consists of two parts of which the ﬁrst

measures memory consumption and the second mea-

sures execution time.

For the memory consumption, we take the diffe-

rence in heap size of the JVM before and after the

benchmark function. Because in Java garbage col-

lection can happen at any time, it has to be considered

for the measurement, otherwise the alleged memory

consumption even may become negative. To solve

this problem, we use a callback function, which is

triggered by each garbage collection run and which

records how much space got cleared. We use this to

calculate the memory consumption as follows:

memConsumption =

(heapA f ter +

∑

GCRuns

collected) − heapBe f ore

We repeated this process 50 times before taking the

average, which we consider the true memory con-

sumption. First experiments showed that the callback

function does not reliably execute timely before the

measurement in which it was triggered is over. There-

fore, after every measurement we schedule a wait of

100ms to catch late coming garbage collections and

assign the numbers to the appropriate measurement.

For the execution time measurement, we take the

difference in the system time before and after the ben-

chmark function. To achieve a reliable value we cal-

culate an average over 50 measurements, where we

discard the lowest and highest four values beforehand.

In order to accomplish an even more reliable result

we calculate such an average for ten different cases,

where in each case the repetition count is modiﬁed

according to repCount = i · N with i = 1, 2, ..., 10. In

doing so, we get a series of supposedly equidistant

execution time averages. We calculate the distance

between every two successive results, which repre-

sents the increase in time for another N runs. Over

these distances we take the average and ﬁnally consi-

der it the true execution time of N runs of the bench-

marked antipattern or its refactored counterpart. The

deviation of the minimal distance and the maximal

distance from the average indicate the quality of the

measurement, with a low deviation conﬁrming the

outcome.

Results and Interpretation

The results from our micro benchmarks are shown in

Table 2. Note that benchmarks marked with * were

ICSOFT 2018 - 13th International Conference on Software Technologies

executed with N = 1000 due to long execution ti-

mes. The table comprises short, descriptive names

for the benchmarks and the corresponding average ti-

mes and the average memory consumptions as descri-

bed above. The time factor describes by what factor

the execution time changed in the refactored version

compared to the antipattern, with a high value indica-

ting that the refactoring is effective. The time saving

describes the absolute gain in execution time achie-

ved by the refactoring. Analogously, the values are

calculated for memory consumption.

The time factors are in a range of 0.8 to 21497.92,

i.e., some refactorings even have a negative impact

while others are incredibly effective. The time sa-

vings are in a range of -0.16 to 4275.11 milliseconds

per 100k repetitions and it is notable that a high factor

does not necessarily appear together with a high ab-

solute saving. Regarding memory consumption, the

factor range is 0.21 to 10086.67 and the savings range

is -0.09 to 198.17 MB per 100k repetitions. For those

pairs where the refactored version consumes no me-

mory the factor becomes NaN.

The results ﬁt well with the expectations we had

based on the antipattern description. For example, we

now have evidence that performing arithmetics with

the short type takes additional execution time due to

the internal type casts but saves memory. In conclu-

sion, we are very conﬁdent that our effort to design

good benchmarks payed off by providing useful re-

sults that we can use in the rating function to distin-

guish between more or less severe antipatterns.

The measurements taken with the micro bench-

marks are not only good for determining the antipat-

tern factors used in the rating function. They have a

value in themselves because the effectiveness of re-

factoring the antipatterns gets quantiﬁed in a relative

and an absolute way. The results show that there are

several refactorings that reduce execution time or me-

mory consumption by large factors. At the same time,

we get an overview of how much can be saved through

refactoring this kind of antipatterns. While for some

the savings are close to zero, for others they are mul-

tiple seconds per 100k calls. Those values suggest

that in general our approach of refactoring this kind

of statically detectable antipatterns has some effect,

as long as not every occurrence is considered but only

systematically selected ones. The results from our mi-

cro benchmarks also provide a valuable insight to the

general effectiveness of performance refactorings and

can be used by further research on antipatterns and

performance refactorings.

Table 1: Experiments with STATE.

Average Absolute Relative

Time Diff Impact

Original 2232.47 ms - -

Injected 2242.03 ms +9.56 ms 0.428 %

Refactored 2218.07 ms -14.4 ms 0.645 %

5.2 Experimental Evaluation

We have evaluated our approach with the software

STATE (Herber et al., 2015; Herber et al., 2008) in

version STATE-2.1. It is licensed as open source

under the GNU General Public License version 3

and consists of approx. 30,000 lines of code in

285 classes. We chose one of the shipped exam-

ples from STATE to do our measurements, namely

transport.

Experiment 1: Antipattern Injection. In our ﬁrst

experiment, we have injected some antipatterns in the

source code of STATE, measured their impact on per-

formance and evaluated how they get rated by our

ranking tool. To achieve this, we have duplicated the

STATE source code. Then, in one copy we have ma-

nipulated two methods by replacing all occurrences

of StringBuilder with the less efﬁcient + operator.

In this process, we have altered about 60 lines of code

and replaced in total 49 calls to append. The rest of

the source code remains unchanged.

Our expectation is that the manipulated copy runs

slower, i.e. the measured execution times are increa-

sed. We base this expectation on the benchmark re-

sults where the string concatenation antipattern sho-

wed strong impact on performance. Another expecta-

tion is that the introduced antipatterns get ranked high

in a follow up analysis of the manipulated copy, be-

cause they were injected into a prominent method and

have large antipattern factors.

The upper two rows of Table 1 show the execu-

tion times of the original STATE version compared

to the worsened version where we have injected an-

tipatterns. The average execution time of the ori-

ginal STATE software is 2232.47 ms. The average

execution time of the worsened version with antipat-

terns injected is 2242.03 ms, resulting in an absolute

difference of 9.56 ms. Thus, the refactoring of the

injected antipatterns, i.e. the restoration of the ori-

ginal state, achieves a performance improvement of

0.428%. The subsequent rating tool analysis reveals

that the injected antipatterns are found by our tool.

The 24 occurrences appear among the 26 top rated

antipatterns.

According to our expectation, the STATE version

with antipatterns injected shows worse performance

Automated Selection of Software Refactorings that Improve Performance

than the original. The execution time difference of

about half a percent is small, but we have to keep in

mind that STATE has very different characteristics to

a large-scale server software or micro service, where

our approach is supposed to exploit its full potential.

Considering this, half a percent is already a good re-

sult, especially in relation to the very low effort it ta-

kes to implement some simple, local refactorings.

Experiment 2: Performance Refactorings. In our

second experiment, we have performed a preliminary

analysis of STATE with the rating tool and subse-

quently refactored the top rated antipatterns. After-

wards, we measured if the performance was improved

by the refactorings.With our approach, we get a list of

detected antipatterns sorted according to their assig-

ned ratings. Figure 2 shows the ﬁrst ﬁve lines of the

output ﬁle slightly shortened. The large number in

square brackets is the rating assigned to the antipat-

tern. The other information helps to comprehend the

rating and to ﬁnd the antipattern in the source code.

In our experiment, we have implemented the pro-

posed refactorings for 17 of the top rated 19 anti-

patterns. Two antipatterns remain untreated. One is

an unavoidable object instantiation inside a loop and

the other would require a StringBuilder to prepend

text, which it is not intended for. Apart from the 17

refactorings the source code remains unchanged.

Our expectation is that the refactored version runs

faster than the original, i.e. the measured execution

times are reduced. Although the analyzed software

is not in our target domain of large-scale server soft-

ware, this experiment shows exactly how our appro-

ach is meant to be used in practice.

The last row of Table 1 shows the results for our

second experiment. The average execution time of the

refactored version of STATE is 2218.07 ms. Compa-

red to the original version, this results in a difference

of 14.40 ms. Thus, the refactoring of the 17 top rated

antipatterns achieves an overall performance impro-

vement of 0.645%.

Overall, we can see our expectation satisﬁed,

since the refactored version effectively executes fas-

ter than the original STATE. Again, slightly more

than half a percent is a small performance impro-

vement but the same argumentation as above holds

and we still consider the result a success. It shows

that the rating tool succeeds in proposing refactorings

that improve performance and suggests that its ap-

plication on a server software or micro service can

yield great performance gains with a small refacto-

ring effort. Note that a PMD analysis of the STATE

source code already restricted to the performance an-

tipatterns yields 843 issues. Of those, only 339 recei-

ved a rating greater than zero and promise a positive

effect on performance through refactoring at all. Note

also that using solely Dynatrace AppMon to capture

runtime properties leaves one with lots of information

without any concrete instruction on what to do.

6 CONCLUSION

In this paper, we have proposed a novel approach to

combine static and dynamic software analyses to au-

tomatically select refactorings that improve the per-

formance of a given program. Our major contribu-

tions are twofold: First, we have presented a a ra-

ting function for antipatterns, which assesses their re-

spective potential to improve performance through re-

factoring based on both static and dynamic properties.

Second, we have implemented micro benchmarks that

assess the general effectiveness of a given set of per-

formance antipatterns independent of a speciﬁc pro-

gram. Our benchmarks clearly show that the antipat-

terns actually have an effect on performance, although

the effects vary. Due to the mostly small savings, in

the majority of cases a refactoring is only reasona-

ble if the antipattern is executed frequently, e.g. in a

loop or frequently called method. This illustrates the

importance of a feasible approach to select the most

effective refactorings in a given program.

We have implemented our approach for the auto-

mated selection of refactorings that are most promi-

sing to improve the performance of a given program

using PMD (Copeland and Le Vourch, 2017) for sta-

tic code analyses and Dynatrace AppMon (Dynatrace,

2017) for dynamic software analysis to capture per-

formance measures. The result is a list of recommen-

ded refactorings ordered by effectiveness.

We have demonstrated the practical applicability

of our approach with a sample software that consists

of 30,000 lines of code. Although our approach works

best for large scale server software, it still yields some

improvement for our much smaller case study from a

totally different domain. We therefore assess the po-

tential in a large scale server software as high, especi-

ally due to the good cost-beneﬁt ratio.

Our approach enables us to select only the most

important antipatterns out of the huge amount of an-

tipatterns that are typically provided by static antipat-

tern detection tools. At the same time, it drastically

reduces the cost of interpreting data delivered by dy-

namic analyses. Due to the automated interpretation

and the precisely recommended refactorings, little ex-

pertise is required.

In future work, we plan to carry out a ﬁeld expe-

riment in which our approach is used to improve the

ICSOFT 2018 - 13th International Conference on Software Technologies

[ 3 2 8 0 5 2 9 ] exe c = 9 7 . 0 0 | s u s p = 0 | c a l l s =13595 | lo o p =0 A n t i p a t t e r n ’ A pp e nd C h a ra c t e rW i th C h a r ’ i n

met hod ’ t o S t r i n g ’ a t l i n e 288 i n f i l e L o c a t i o n . j a v a

[ 2 5 9 2 0 5 7 ] exe c = 1 67 4 . 5 1 | s us p = 3 | c a l l s = 5 | l o o p =1 A n t i p a t t e r n ’ A v o i d I n s t a n t i a t i n g O b j e c t s I n L o o p s

’ i n method ’ p a r s e P a r a l l e l ’ a t l i n e 123 i n f i l e UppaalXMLManager . j a v a

[ 1 5 4 8 6 4 7 ] e x ec = 1 5 0 4 . 2 3 | s u s p = 0 | c a l l s = 5 | l o o p =1 A n t i p a t t e r n ’ A p p e nd C h a ra c t e rW i th C h a r ’ i n

met hod ’ embed ’ a t l i n e 103 i n f i l e P a r a llel U p paal X M L Embe d d e r . j a v a

[ 1 2 8 0 6 1 1 ] e x ec = 5 8 . 5 8 | s u s p = 0 | c a l l s = 8770 | l o op =0 A n t i p a t t e r n ’ A pp e n d Ch a r a ct e rW i t h Ch a r ’ i n

met hod ’ t o S t r i n g ’ a t l i n e 221 i n f i l e T r a n s i t i o n . j a v a

[ 1 2 8 0 6 1 1 ] e x ec = 5 8 . 5 8 | s u s p = 0 | c a l l s = 8770 | l o op =0 A n t i p a t t e r n ’ A pp e n d Ch a r a ct e rW i t h Ch a r ’ i n

met hod ’ t o S t r i n g ’ a t l i n e 222 i n f i l e T r a n s i t i o n . j a v a

Figure 2: Extract of Rating Tool Results for STATE.

Table 2: Micro Benchmarks for Antipatterns.

Micro Benchmark Avg Avg Time Time Mem Mem

Time Mem Factor Saving Factor Saving

[ms] [kB] [1] [ms/100k] [1] [MB/100k]

StringBuilder using equals(””) 1.30 4133

30.62 1.24 24.52 3.96

StringBuilder using length() == 0 0.04 140

Concatenate 10 strings with plus operator 74.42 246622

2.09 40.70 5.06 198.17

Concatenate 10 strings with StringBuilder 37.21 48900

Multiple append in multiple statements 9.22 60202

0.99 -0.05 1.00 -0.09

Multiple append in only one statement 9.28 60215

Instantiate Boolean object 0.05 140

1.25 0.01 NaN 0.14

Reference pooled Boolean 0.04 0

Arithmetics with short 0.05 17

1.25 0.01 0.21 -0.07

Arithmetics with integer 0.04 82

Instantiate object with ﬁnal member * 23.45 553

1.01 22.95 1.02 0.01

Instantiate object with static ﬁnal member * 23.18 544

Call toArray with empty array 0.86 6471

0.84 -0.16 1.00 0.00

Call toArray with correctly sized array 1.02 6471

Create many small objects 0.44 2740

2.10 0.23 3.42 1.94

Create separate data arrays 0.21 802

Check ﬁrst char with startsWith 0.04 0

0.80 -0.01 NaN 0.00

Check ﬁrst char with charAt(0) 0.05 0

Copy array iteratively into List * 50.30 40539

10962.82 4275.11 1313.69 37.92

Wrap array with asList 0.39 2623

Copy array iteratively into array 10.02 15

1.02 0.16 1.07 0.00

Copy array with copyarray 9.86 14

Convert to string with + ”” 4.12 12732

1.15 0.53 2.31 7.22

Convert to string with toString 3.59 5510

Instantiate with explicitly initialized member * 27.47 570

1.08 182.75 1.01 0.01

Instantiate with implicitly initialized member * 25.32 565

Throw an exception * 30.35 1068

21497.92 2579.63 10086.67 1.06

Set ﬂag and check if it’s set 0.12 9

Create string with new 0.57 1876

14.25 0.53 NaN 1.88

Create pooled string with quotes 0.04 0

Check string equality casting both upper case 12.70 20872

2.76 8.10 NaN 20.87

Check string equality ignoring case 4.60 0

Append character with double quotes 4.17 2366

2.47 2.48 1.45 0.74

Append character with single quotes 1.69 1629

Search character with double quotes 0.93 0

1.02 0.02 NaN 0.00

Search character with single quotes 0.91 0

Check if string is empty with trim 1.00 2408

25.00 0.96 NaN 2.41

Check is string is empty with loop 0.04 0

Initialize StringBuilder too short 4.46 33252

1.17 0.64 1.18 5.09

Initialize StringBuilder sufﬁciently large 3.82 28163

performance of a large scale server software. Furt-

hermore, we plan to investigate more complex refac-

torings. For this, we plan to include static analyses

that detect performance bottlenecks, for example, via

symbolic execution.

Automated Selection of Software Refactorings that Improve Performance

REFERENCES

Arcelli, D., Berardinelli, L., and Trubiani, C. (2015). Per-

formance antipattern detection through fuml model li-

brary. In Proceedings of the 2015 Workshop on Chal-

lenges in Performance Methods for Software Develop-

ment, pages 23–28. ACM.

Arcelli, D., Cortellessa, V., and Trubiani, C. (2012). Anti-

pattern-based model refactoring for software perfor-

mance improvement. In Quality of Software Architec-

tures, pages 33–42. ACM.

Bernardi, M. L., Cimitile, M., and Di Lucca, G. A. (2013).

A model-driven graph-matching approach for design

pattern detection. In Working Conf. on Reverse Engi-

neering, pages 172–181. IEEE.

Burn, O. (2017). Checkstyle. http://checkstyle.

sourceforge.net/index.html.

Copeland, T. and Le Vourch, X. (2017). PMD. https://pmd.

github.io/.

Cortellessa, V., Martens, A., Reussner, R., and Trubiani, C.

(2010). A process to effectively identify guilty per-

formance antipatterns. In International Conference

on Fundamental Approaches to Software Engineer-

ing, pages 368–382. Springer.

Djoudi, L., Barthou, D., Carribault, P., Lemuet, C., Acqua-

viva, J.-T., and Jalby, W. (2005). Exploring applica-

tion performance: a new tool for a static/dynamic ap-

proach. In Proceedings of the 6th LACSI Symposium.

Los Alamos Computer Science Institute.

Dynatrace (2017). Dynatrace appmon. https://

www.dynatrace.com/.

Fontana, F. A. and Zanoni, M. (2017). Code smell seve-

rity classiﬁcation using machine learning techniques.

Knowledge-Based Systems, 128:43–58.

Foundation, A. S. (2017). Apache tomcat. http://tomcat.

apache.org/.

Fowler, M. and Beck, K. (1999). Refactoring: improving

the design of existing code. Addison-Wesley Professi-

onal.

Gamma, E., Helm, R., Johnson, R., and Vlissides, J.

(1995). Design Patterns: Elements of Reusable

Object-oriented Software. Addison-Wesley Longman

Publishing Co., Inc., Boston, MA, USA.

Goetz, B. (2005). Anatomy of a ﬂawed microbenchmark.

https://www.ibm.com/developerworks/java/library/

j-jtp02225/.

Grabner, A. (2009). Performance analysis: How to

identify synchronization issues under load? https://

www.dynatrace.com/blog/performance-analysis-

how-to-identify-synchronization-issues-under-load/.

Herber, P., Fellmuth, J., and Glesner, S. (2008). Mo-

del Checking SystemC Designs Using Timed Auto-

mata. In International Conference on Hardware/Soft-

ware Codesign and Integrated System Synthesis (CO-

DES+ISSS), pages 131–136. ACM press.

Herber, P., Pockrandt, M., and Glesner, S. (2015). STATE

– A SystemC to Timed Automata Transformation En-

gine. In International Conferen on Embedded Soft-

ware and Systems (ICESS), pages 1074–1077. IEEE.

Heuzeroth, D., Holl, T., Hogstrom, G., and Lowe, W.

(2003). Automatic design pattern detection. In Pro-

gram Comprehension, 2003. 11th IEEE International

Workshop on, pages 94–103. IEEE.

Long, J. (2001). Software reuse antipatterns. ACM SIGS-

OFT Software Engineering Notes, 26(4):68–76.

Louridas, P. (2006). Static code analysis. IEEE Software,

23(4):58–61.

Luo, Q., Nair, A., Grechanik, M., and Poshyvanyk, D.

(2017). Forepost: Finding performance problems au-

tomatically with feedback-directed learning software

testing. Empirical Software Engineering, 22(1):6–56.

Owen, K. (2016). Improve the smell of your code with mi-

crorefactorings. https://www.sitepoint.com/improve-

the-smell-of-your-code-with-microrefactorings/.

Pugh, B. and Hovemeyer, D. (2017). Findbugs. http://

ﬁndbugs.sourceforge.net/.

Rasool, G. and Streitfdert, D. (2011). A survey on design

pattern recovery techniques. IJCSI International Jour-

nal of Computer Science Issues, 8(2):251–260.

Smith, C. U. and Williams, L. G. (2002). New software per-

formance antipatterns: More ways to shoot yourself in

the foot. In Int. CMG Conf., pages 667–674.

Tsantalis, N., Chatzigeorgiou, A., Stephanides, G., and Hal-

kidis, S. T. (2006). Design pattern detection using si-

milarity scoring. IEEE transactions on software engi-

neering, 32(11).

Washizaki, H., Fukaya, K., Kubo, A., and Fukazawa, Y.

(2009). Detecting design patterns using source code

of before applying design patterns. In Intl. Conf. on

Computer and Information Science, pages 933–938.

IEEE.

Wendehals, L. (2003). Improving design pattern instance

recognition by dynamic analysis. In Proc. of the ICSE

2003 Workshop on Dynamic Analysis (WODA), Port-

land, USA, pages 29–32.

Wierda, A., Dortmans, E., and Somers, L. J. (2007). De-

tecting patterns in object-oriented source code - a case

study. In ICSOFT (SE), pages 13–24. INSTICC Press.

Woodside, M., Franks, G., and Petriu, D. C. (2007). The

future of software performance engineering. In Future

of Software Engineering, 2007. FOSE’07, pages 171–

187. IEEE.

ICSOFT 2018 - 13th International Conference on Software Technologies