Automatic Algorithmic Complexity Determination Using Dynamic
Program Analysis
Istvan Gergely Czibula, Zsuzsanna Oneț-Marian and Robert-Francisc Vida
Department of Computer Science, Babeș-Bolyai University, M. Kogălniceanu Street, Cluj-Napoca, Romania
Keywords:
Algorithmic Complexity, Dynamic Program Analysis.
Abstract:
Algorithmic complexity is an important concept in computer science concerned with the efficiency of algorithms. Understanding and improving the performance of a software system is a major concern throughout the lifetime of the system, especially in its maintenance and evolution phases. Identifying performance-related issues before they actually affect the deployed system is desirable, and it becomes possible if developers know the algorithmic complexity of the methods in the software system. In many software projects such complexity information is missing, so it is hard for a developer to reason about the performance of the system for different input data sizes. The goal of this paper is to propose a novel method for automatically determining algorithmic complexity based on runtime measurements. We evaluate the proposed approach on synthetic data and on actual runtime measurements of several algorithms in order to assess its potential and weaknesses.
1 INTRODUCTION
The performance of a software application is one of the most important aspects of any real-life software. After the functional requirements are satisfied, software developers try to predict and improve the performance of the software in order to meet user expectations. Performance-related activities include modifying the software in order to reduce the amount of internal storage used by the application, to increase the execution speed by replacing algorithms or components, and to improve the system's reliability and robustness (Chapin et al., 2001).
Simulation, profiling and measurements are performed in order to assess the performance of the system during maintenance (McCall et al., 1985), but measurements performed only on a developer machine can be misleading and may not provide sufficient insight into the performance of the deployed system under possibly different real-life data loads. Profiling is a valuable tool but, as argued in (Kernighan and Van Wyk, 1998), no benchmark result should ever be taken at face value.
The analysis of algorithms, introduced in (Knuth, 1998), is concerned with the study of the efficiency of algorithms. Using algorithm analysis, one can compare several algorithms for the same problem based on the efficiency profile of each algorithm, or reason about the performance characteristics of a given algorithm for increasing sizes of the input data. In essence, studying efficiency means predicting the resources needed for executing a given algorithm for various inputs.
1.1 Motivation
While in the case of library functions, especially standard library functions, complexity guarantees for the exposed methods exist, such information is generally omitted from developer code and documentation. The main reason for this is the difficulty software developers face in deducing such information: analyzing even a simple algorithm may require a good understanding of combinatorics, probability theory and algebraic dexterity (Cormen et al., 2001). Automated tools built on the theoretical model presented in this paper can overcome this difficulty.
Knowledge about algorithmic complexity can
complement existing software engineering practices
for evaluating and improving the efficiency of a soft-
ware system. The main advantage of knowing the
complexity of a method is that it gives an insight into
the performance of an operation for large input data
sizes. Profiling and other measurement-based techniques cannot predict the performance characteristics of a method under data loads other than the one under which
the measurements were performed.
Knowledge about algorithmic complexity is also
beneficial for mitigating security issues in software
systems. There is a well-documented class of low-bandwidth denial-of-service (DoS) attacks that exploit algorithmic deficiencies in software systems (Crosby and Wallach, 2003). The first line of defence against such attacks is to properly identify the algorithmic complexity of the operations within the software system.
The main contribution of this paper is to propose and experimentally evaluate a deterministic approach for finding the asymptotic algorithmic complexity of a method. To the best of our knowledge, there is no other approach in the literature that automatically determines method-level algorithmic complexity based on runtime data analysis.
2 RELATED WORK
In this section we will present a short overview of
some existing approaches from the literature related
to algorithmic complexity.
The first approaches, for example (Le Métayer, 1988), (Wegbreit, 1975) and (Rosendahl, 1989), were based on source code analysis and the computation of mathematical expressions describing the exact number of steps performed by the algorithm. While such methods can compute the exact complexity bound of an algorithm, they were defined for functional programming languages and recursive functions.
More recent approaches can be applied to other programming languages as well, but many of them focus only on determining the complexity of specific parts of the source code (usually loops). One such approach is the hybrid (composed of static and dynamic analysis) method presented in (Demontiê et al., 2015) and the modular approach presented in (Brockschmidt et al., 2014).
Goldsmith et al. introduced in (Goldsmith et al.,
2007) a model that explains the performance of a pro-
gram as a feature of the input data. The focus is on
presenting the performance profile (empirical compu-
tational complexity) of the entire program and not the
identification of algorithmic complexity at the method
level.
Benchmark is a library, written in C++, that sup-
ports benchmarking C++ code (Benchmark, 2016).
Though it is not its main functionality, it also sup-
ports the empirical computation of asymptotic com-
plexities. A comparison of our approach with the results provided by Benchmark is presented in Section 5.
While they do not determine the complexity of the source code directly, several approaches try to find performance bugs (pieces of code that function correctly, but where functionality-preserving changes can lead to substantial performance improvements), for example (Luo et al., 2017), (Olivo et al., 2015) and (Chen et al., 2018).
An approach using evolutionary search techniques for generating input data that trigger worst-case complexity is presented in (Crosby and Wallach, 2003). Such approaches are complementary to ours, which assumes that testing data already exists.
3 METHODOLOGY
The most used representation for algorithm analysis is the one proposed in (Knuth, 1998): the asymptotic representation, based on the Big O notation, a convenient way to deal with approximations introduced by Paul Bachmann in (Bachmann, 1894).

Definition 1. O(f(n)) denotes the set of functions g(n) for which there exist positive constants C and n0 such that |g(n)| ≤ C·f(n) for all n ≥ n0.
If we denote by T(n) the number of steps performed by a given algorithm (n is the size of the input data), then the problem of identifying the algorithmic complexity becomes finding a function f(n) such that T(n) ∈ O(f(n)). The basic idea is to find a function f(n) that provides an asymptotic upper bound for the number of steps performed by the algorithm.

When analyzing algorithmic complexity, we do not differentiate between functions like f(n) = 2n² + 7 and f(n) = 8n² + 2n + 1: the complexity in both cases will be T(n) ∈ O(n²). The result of algorithm analysis is an approximation indicating the order of growth of the running time with respect to the input data size (Cormen et al., 2001). While the complexity function can be any function, there is a set of well-known functions widely used in the literature to communicate complexity guarantees. In this paper we use the functions from Table 1, but the set can be extended without losing the applicability of the proposed method. The considered functions represent complexity classes that appear frequently in real-life software systems (Weiss, 2012).
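For instance, taking C = 11 and n0 = 1 in Definition 1 is enough to place the second function above in O(n²); a short worked bound:

```latex
% Bounding each lower-order term by n^2 for n >= 1:
8n^2 + 2n + 1 \;\le\; 8n^2 + 2n^2 + n^2 \;=\; 11n^2 \quad \text{for all } n \ge 1,
\qquad\text{hence } 8n^2 + 2n + 1 \in O(n^2) \text{ with } C = 11,\ n_0 = 1.
```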
In conclusion, the problem of identifying algorith-
mic complexity becomes selecting the most appropri-
ate function f (n) from a predefined set, such that f (n)
best describes the order of growth of the running time
of the analyzed method for increasing input data sizes.
In this paper we introduce an automated approach for identifying the algorithmic complexity of a method
in a software system based on runtime measurements.
The basic idea is to measure the execution time of
the analyzed method for different input sizes and use
these measurements to identify the best fitting com-
plexity function. The proposed approach consists of
three steps described in the following sections.
3.1 Step 1 - Data Collection
Our approach requires multiple executions of the ana-
lyzed method with different input data and the record-
ing of the elapsed time for each execution. For this we
instrument the analyzed methods and implement code
that executes the target methods for different inputs.
Using Java profilers we were able to slightly modify
the entry and exit points of the targeted methods in
order to extract the information we needed, such as
execution time and input parameters. It is important
to note that although these changes impacted the execution slightly, they did not alter the original flow of
instructions in any way. This approach was chosen
since the profiler we used was not only lightweight,
adding little overhead to the original application, but
it was also highly reusable, being compatible with any
method we needed to analyze.
The result of the measurement step for a given method mtd in a software system is a list of pairs M_mtd = {(t_i, n_i)}, i = 1..m, where m is the number of measurements performed for the method mtd, n_i is the size of the input data for the i-th measurement and t_i is the elapsed time for running the analyzed method on an input of size n_i.
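Our measurements were collected through Java profiler instrumentation; purely as an illustration of the structure of M_mtd, a minimal Python stand-in (the target function, input generator and sizes below are hypothetical placeholders) could look like this:

```python
import time
from typing import Callable, List, Tuple

def collect_measurements(target: Callable, make_input: Callable[[int], object],
                         sizes: List[int]) -> List[Tuple[float, int]]:
    """Execute `target` once per input size and record (t_i, n_i) pairs."""
    measurements = []
    for n in sizes:
        data = make_input(n)                    # build an input of size n
        start = time.perf_counter()
        target(data)                            # run the analyzed method
        elapsed = time.perf_counter() - start
        measurements.append((elapsed, n))       # one measurement pair of M_mtd
    return measurements

# Hypothetical usage: time a sort on reversed lists of three sizes.
# M_mtd = collect_measurements(sorted, lambda n: list(range(n, 0, -1)),
#                              [1_000, 10_000, 100_000])
```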
3.2 Step 2 - Data Fitting
Given a set of measurements M_mtd = {(t_i, n_i)}, i = 1..m, and a function f(n) from Table 1, we want to identify the coefficients c1, c2 for which the function best fits the actual measured data.

For example, for f(n) = c1·n·log2(n) + c2 we try to find c1, c2 such that t_i ≈ c1·n_i·log2(n_i) + c2 for every measurement pair. For this purpose we use non-linear least squares data fitting (Hansen et al., 2012), where we need to solve:

    minimize over c1, c2:   E(c1, c2) = Σ_{i=1..m} ( t_i − (c1·n_i·log2(n_i) + c2) )²
For determining the best values for the coefficients c1 and c2, we used the scipy.optimize library from Python, more precisely the curve_fit function from this library (Scipy, 2019). Implementations of the curve fitting algorithm are available in Matlab and in other programming languages and libraries as well.
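A minimal sketch of such a fit for the candidate F6, f(n) = c1·n·log2(n) + c2 (the measurement values below are made up for illustration, in nanoseconds, and the bounds argument anticipates the coefficient bounds discussed next):

```python
import numpy as np
from scipy.optimize import curve_fit

def n_log_n(n, c1, c2):
    """Candidate model F6: c1 * n * log2(n) + c2."""
    return c1 * n * np.log2(n) + c2

# (n_i, t_i) pairs from M_mtd; the values below are illustrative only.
sizes = np.array([1e2, 1e3, 5e3, 1e4])
times = np.array([1.4e4, 2.1e5, 1.2e6, 2.7e6])

# Fit c1, c2 with the box constraints [b_l, b_u] described below
# (b_u = 6143.856 is the F6 bound for n_max = 10^4 from Table 2).
b_l, b_u = 0.1, 6143.856
(c1, c2), _ = curve_fit(n_log_n, sizes, times, bounds=([b_l, b_l], [b_u, b_u]))
print(c1, c2)
```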
In the definition of the Big O notation, the only requirement regarding the constant C is that it should be positive; no upper limit is set, since the inequality from the definition should hold for all values of n ≥ n0. However, in our approach the value of n (i.e., the size of the input data) is finite, it has a maximum value, which means that the values of the constants c1, c2 should also be restricted. Consider the function f(n) = c1·log2(n) + c2 and suppose that the maximum value of n (the maximum input size in the set M_mtd) is 100000; log2(100000) is approximately 17. By allowing the value of c1 to be 17, we would effectively transform the function into log2²(n).
In order to avoid the problem presented above, we introduce an algorithm to automatically identify upper bounds for the coefficients. We restrict the values of c1 and c2 to an interval [b_l, b_u]. The value of the lower bound, b_l, is set to 0.1 for every function. The value of the upper bound, b_u, is computed separately for every function f(n) from Table 1 and for every set of measurements M_mtd. For computing the bounds, we search for the maximum value of n in M_mtd, denoted by n_max. We also consider the functions from Table 1 with c1 = 1 and c2 = 0, ordered increasingly by their value for n_max. For every function f(n), we consider the previous function in this ordering, f_prev(n) (the constant function f(n) = 1 is added as well, so that every function has a predecessor). The value of the upper bound is computed as b_u = p·(f(n_max) − f_prev(n_max)), where p is a value between 0 and 1, denoting the fraction of the difference we want to consider. In all the experiments presented below, the value p = 0.05 was used. If the value of b_u is less than 0.1 (the lower bound), we set b_u to 0.2.

Table 2 contains the bounds computed for the functions from Table 1 for n_max equal to 10⁴. Since we have not presented actual data sets yet, this value was chosen just to provide an example of the bounds. Table 2 gives only the upper bounds for the functions.
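A minimal sketch of this bound computation (the candidate list mirrors Table 1 with c1 = 1, c2 = 0 and a prepended constant function; p = 0.05 as in our experiments):

```python
import math

# Candidate functions from Table 1 (with c1 = 1, c2 = 0), preceded by the
# constant function so that every candidate has a predecessor.
CANDIDATES = [
    ("const", lambda n: 1.0),
    ("F1", lambda n: math.log2(math.log2(n))),
    ("F2", lambda n: math.sqrt(math.log2(n))),
    ("F3", lambda n: math.log2(n)),
    ("F4", lambda n: math.log2(n) ** 2),
    ("F5", lambda n: n),
    ("F6", lambda n: n * math.log2(n)),
    ("F7", lambda n: n ** 2),
    ("F8", lambda n: n ** 2 * math.log2(n)),
    ("F9", lambda n: n ** 3),
    ("F10", lambda n: n ** 4),
]

def upper_bounds(n_max: float, p: float = 0.05, b_l: float = 0.1) -> dict:
    """Compute b_u = p * (f(n_max) - f_prev(n_max)) for every candidate."""
    ordered = sorted(CANDIDATES, key=lambda c: c[1](n_max))  # increasing at n_max
    bounds = {}
    for (_, f_prev), (name, f) in zip(ordered, ordered[1:]):
        b_u = p * (f(n_max) - f_prev(n_max))
        bounds[name] = b_u if b_u >= b_l else 0.2   # fall back when below b_l
    return bounds

# upper_bounds(1e4) reproduces the values in Table 2 (e.g. F5 -> ~491.17).
```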
3.3 Step 3 - Select the Best Matching
Complexity Function
For a set of measurements M_mtd = {(t_i, n_i)}, i = 1..m, and for every function from Table 1, we computed at Step 2 the parameters c1, c2 that represent the best fit for the given function.
The aim of this step is to choose one single function that is the most accurate description of the measurement data and consequently identify the complexity class that the analyzed method belongs to. In order to determine the best matching function, we compute the RMSE (root mean squared error) for every function F from Table 1, considering the values for c1 and c2 from Step 2, and we pick as the complexity class best describing our data the function F with the minimum RMSE.
Table 1: Complexity classes considered for the experimental evaluation.

Name   Function                        Complexity class
F1     c1·log2(log2(n)) + c2           log2(log2(n))
F2     c1·√(log2(n)) + c2              √(log2(n))
F3     c1·log2(n) + c2                 log2(n)
F4     c1·log2²(n) + c2                log2²(n)
F5     c1·n + c2                       n
F6     c1·n·log2(n) + c2               n·log2(n)
F7     c1·n² + c2                      n²
F8     c1·n²·log2(n) + c2              n²·log2(n)
F9     c1·n³ + c2                      n³
F10    c1·n⁴ + c2                      n⁴
Table 2: Examples of bounds for a data set with the maximum input data size 10⁴.

Func.  Upper bound      Func.  Upper bound
F1     0.2              F6     6143.856
F2     0.132            F7     4.99·10⁶
F3     0.478            F8     6.14·10⁷
F4     8.164            F9     4.99·10¹⁰
F5     491.172          F10    5·10¹⁴
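Putting the three steps together, the selection itself is a simple minimum search; a minimal sketch, assuming the fitted models from Step 2 are available as callables:

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error between measured and predicted times."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def best_complexity(sizes, times, fitted_models):
    """fitted_models maps a class name (e.g. 'n*log2(n)') to a callable
    n -> predicted time obtained in Step 2; the name with the smallest
    RMSE is reported as the complexity class of the analyzed method."""
    sizes = np.asarray(sizes, dtype=float)
    errors = {name: rmse(times, model(sizes))
              for name, model in fitted_models.items()}
    return min(errors, key=errors.get)
```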
4 COMPUTATIONAL
EXPERIMENTS
In order to assess the effectiveness of the proposed approach, we performed a series of experiments on synthetic data and on actual measurements of different algorithms. The experiments were chosen based on the literature review: existing approaches use similar test systems. In the first experiment we illustrate the potential of our approach using synthetic measurement data. The second experiment is performed on a small code base with various sorting methods; similar experiments were performed in the literature, for example in (Wegbreit, 1975), in order to evaluate the effectiveness of the proposed approaches. The scope of the last experiment is to illustrate the potential of the proposed approach beyond identifying the runtime complexity of a method.
Table 3: Hidden functions used to generate data sets.

Name   Function f(n)                           Complexity class
HF1    5n² + 20                                n²
HF2    3n² + 7n + 20                           n²
HF3    log2(n) + 7                             log2(n)
HF4    log2(n) + √(log2(n)) + log2(log2(n))    log2(n)
HF5    4·log2²(n) + 11·log2(n) + 25            log2²(n)
HF6    n²·log2(n) + 12n²                       n²·log2(n)
HF7    9n + 15·√n + 7                          n
HF8    25n³ + 2n² + 500n + 4                   n³
HF9    n² + 500·n·log2(n)                      n²
HF10   20·n·log2(n) + 100n + 3                 n·log2(n)
4.1 Synthetic Data
The aim of the first experiment is to illustrate the proposed method, to verify its potential and to evaluate possible limitations. We generate multiple sets of synthetic measurement data based on different mathematical functions. The assumption is that the number of steps performed by a given method can be described by a function. The set of functions used for these experiments and the complexity classes they belong to are presented in Table 3. We used 10 different functions in order to simulate multiple possible running scenarios. From now on we will call these functions hidden functions, based on the idea that, while they describe the number of steps performed by an algorithm, in general they are not known.
We generate a separate data set for every hidden function. Each data set contains 50 pairs of values [n, f(n)], where n ranges from 100 to 10 million, evenly distributed in this interval. Intuitively, every function corresponds to a method in a software system and every generated pair corresponds to a measurement (execute the method and collect the input data size and the elapsed time).
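As an illustration, a data set for one hidden function (HF1 is used here) can be generated as follows; the 50 sizes are evenly spaced between 100 and 10 million, as described above:

```python
import numpy as np

def hf1(n):
    """Hidden function HF1 from Table 3: 5n^2 + 20."""
    return 5 * n ** 2 + 20

# 50 input sizes evenly distributed between 100 and 10 million.
sizes = np.linspace(100, 10_000_000, 50)

# Pairs [n, f(n)], playing the role of (input size, measured steps).
dataset = [(n, hf1(n)) for n in sizes]
```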
Using the generated data, we try to predict the runtime complexity of every simulated method: the input of every experiment is one data set and the output is a function representing the complexity of the associated method.
The first step is to compute the coefficients for ev-
ery considered complexity class from Table 1 based
on the points generated for the hidden functions from
Table 3. For this we used the method described in
Section 3.2.
The next step is to compute the root mean squared
error (RMSE) between the data and every considered
complexity function.
The last step is to select the complexity function that best describes the analyzed data set: basically, we pick the one with the smallest RMSE.

We performed the experiment for every data set, and our approach was able to correctly identify the complexity class for every considered hidden function from Table 3.
4.2 Sorting Methods Case Studies
The aim of this case study is to classify algorithms into different complexity classes based on measurement data. We measured (using the methodology from Section 3.1) the running time of various sorting methods in order to automatically determine their complexity. We chose this experiment because other approaches in the literature related to algorithmic complexity use similar case studies.

The analyzed project includes various sorting methods (insertion sort, selection sort, merge sort, quicksort, bubble sort and heapsort) and code that executes these methods for different lists of numbers.
4.2.1 Collecting Measurements
The code invokes each sorting method for arrays sorted in ascending order, in descending order and with elements in random order, varying the length of the array for each invocation. We have chosen these types of input array orderings because, for many sorting algorithms, they are connected to the best, worst and average case runtime complexity of the algorithm.

We generated the input arrays of numbers in ascending, descending and random order. As presented above, we considered 6 different sorting algorithms, but for quicksort we considered two different implementations: one in which the first element of the array is always chosen as the pivot (this version will be called quicksort first), and one where the middle element is always chosen as the pivot (quicksort middle). The expected correct results for all sorting algorithms and the 3 input array orderings are presented in Table 4, where asc. means ascending, desc. means descending and rnd. means random.
We ran the experiments on two regular laptops with different hardware specifications. A first group of data sets was created on one laptop, denoted by L1: for each of the above mentioned 7 sorting algorithms (considering both versions of quicksort) and for each of the three input array orderings (ascending, descending and random) we created four data sets. This gives a total of 84 L1 data sets.
A second group of data sets was created on a second laptop, denoted by L2: for each of the sorting algorithms (for quicksort a single version was considered, quicksort first) and for each of the input array orderings, we created one data set. This gives a total of 18 L2 data sets.

Table 4: Correct complexity classes for the considered algorithms and input array orderings.

Algorithm         Asc.         Desc.        Rnd.
insertion sort    n            n²           n²
selection sort    n²           n²           n²
merge sort        n·log2(n)    n·log2(n)    n·log2(n)
quicksort first   n²           n²           n·log2(n)
quicksort mid.    n·log2(n)    n·log2(n)    n·log2(n)
bubble sort       n            n²           n²
heapsort          n·log2(n)    n·log2(n)    n·log2(n)
In order to test the generality of our approach, we also created data sets containing measurements for C++ code. For these data sets, the measurements were performed using the Benchmark library (Benchmark, 2016). For each of the 7 sorting algorithms and for each of the three input array orderings we created two data sets, a short one and a long one (the exact number of points is given below). For this group we have a total of 42 data sets, and we call them the BM data sets.
Each of the 144 data sets contains between 31 and 37 measurement pairs representing the size of the array and the run-time in nanoseconds. The set of 37 different array sizes is S = {s ∈ N | s = i·10^j, s ≤ 2·10⁶, i ∈ {1, ..., 9}, j ∈ {2, ..., 6}}. The exact number of measurements for every sorting algorithm and group is presented in Table 5. The data sets with fewer than 37 points do not contain measurements for the largest sizes of this set, because running the corresponding algorithms would have taken too much time (for example, in the case of bubble sort in group L1, the last data point is for an array of length 500000).
Table 5: Number of measurements for the data sets.

Group   Sorting method      Number of points
L1      insertion sort      37
        selection sort      37
        merge sort          37
        quicksort first     32
        quicksort middle    32
        bubble sort         32
        heapsort            37
L2      all data sets       36
BM      short data sets     31
        long data sets      37
4.2.2 Determining the Complexity for Sorting
Data
We performed an experiment similar to the one presented in Section 4.1: for every data set we determined the complexity class that best matches the points from the data set, considering the 10 complexity classes from Table 1.

Out of the 144 data sets, our algorithm returned the correct complexity class (i.e., the one from Table 4) for 137 data sets, an accuracy of 95%.

Considering these results, we can conclude that our approach can identify the complexity class from runtime measurement data with good accuracy.
4.2.3 Data Set Size Reduction
The next step of this experiment was to verify whether the obtained accuracy is maintained when we reduce the number of recorded measurement samples. From every data set we randomly removed points: first we removed one point at random and ran our approach to determine the complexity class of the remaining points. Then, starting again from the original data set, we removed two points and determined the complexity class of the remaining points, and so on, until only 5 points remained. We repeated each experiment 100 times and counted how many times the correct result was obtained; this gives the accuracy for the given data set.
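A sketch of this reduction experiment for a single data set; determine_complexity is a hypothetical stand-in for the full pipeline of Section 3:

```python
import random

def reduction_accuracy(points, expected_class, determine_complexity,
                       repetitions=100):
    """For k = 1 .. len(points) - 5 removed points, estimate how often the
    correct complexity class is still found (accuracy per remaining size)."""
    accuracy = {}
    for k in range(1, len(points) - 4):                       # leave at least 5 points
        hits = 0
        for _ in range(repetitions):
            reduced = random.sample(points, len(points) - k)  # drop k random points
            if determine_complexity(reduced) == expected_class:
                hits += 1
        accuracy[len(points) - k] = hits / repetitions
    return accuracy
```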
For every sorting algorithm and input array ordering (ascending, descending, random), we computed the average accuracy over the 7 corresponding data sets (6 in the case of quicksort middle); the results for two of the algorithms are presented in Figure 1. The accuracy is averaged over data sets with the same number of points left.
Due to lack of space, Figure 1 does not contain all sorting algorithms. We did not include the results for quicksort middle, heapsort, selection sort and merge sort, because for these algorithms the accuracy is almost 100% in every case. We did not include bubble sort either, as its behaviour is similar to that of insertion sort.
The two sorting algorithms from Figure 1 show lower accuracy values for some array orderings. In the case of insertion sort, the descending and random arrays have almost perfect accuracy (above 98% in every case), but for ascending arrays the accuracy is between 53% and 83%. The reason for these decreased values is that for the two ascending BM data sets our approach did not find the correct complexity class even for the initial data sets, so the accuracy is 0 for these two data sets when we start removing points.
Quicksort first is another sorting algorithm for which our approach could not find the correct complexity class in every case, and this is visible in Figure 1 as well. For quicksort first, an incorrect result was returned for the ascending and descending arrays from the L2 group, and for these two data sets the accuracy is constantly 0 when we remove elements.
As expected, the quality and quantity of the mea-
surements influence the accuracy of our approach, but
the results are still promising and further work will be
done in order to identify bottlenecks.
4.3 Map Case Study
The aim of this case study is to show the potential of the proposed approach to identify non-obvious performance-related problems in real-life software systems.

The Map (Dictionary) is a widely used data type described as an unsorted collection of key-value elements in which the keys are unique. For this discussion we refer to the HashMap implementation available in the Java programming language.
Given an implementation similar to the one from Figure 2, an average software developer would assume an algorithmic runtime complexity of n for targetMethod, where n is the number of keys in the Map. For most use cases this assumption is true. However, we managed to write specially crafted code for which, based on runtime measurements, our approach returned the complexity classes n² and n·log2(n), which are worse than developers' intuition suggests.
While the results may look surprising or erroneous, they are in fact correct. In order to get a complexity class of n², we need to make sure that every key has the same hash code, so that all keys are added to the same bucket. In this way, finding one key has a complexity of n.
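The case study used specially crafted Java keys with identical hash codes; the same degradation can be illustrated with a Python analogue (not the code from Figure 2) in which every key reports the same hash:

```python
class CollidingKey:
    """All instances share one hash code, so they land in the same bucket."""
    def __init__(self, value):
        self.value = value
    def __hash__(self):
        return 42                      # constant hash: worst case for the table
    def __eq__(self, other):
        return isinstance(other, CollidingKey) and self.value == other.value

# Inserting or looking up m such keys degrades from O(1) to O(m) per operation,
# so filling the map becomes quadratic overall.
table = {CollidingKey(i): i for i in range(10_000)}
```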
The n·log2(n) complexity is obtained when the keys are comparable, because in this situation the Java implementation of HashMap switches an individual bucket to a balanced tree representation whenever the number of elements in it exceeds a threshold. Assuming that all elements have the same hash code, searching for a key then has a complexity of log2(n). A closer look at the HashMap documentation, specifically the change described in (OpenJDK, 2017), reveals this lesser-known behaviour.
Our approach can be used to develop automated tools that identify performance-related problems and indicate possible security issues that malicious users or systems could exploit through algorithmic deficiencies. While in this experiment we created special code to reproduce the performance issue, measurement data can be collected by other means (Section 3.1).
Figure 1: Average accuracy over the sorting data sets after the removal of points: (a) insertion sort, (b) quicksort first.
Figure 2: HashMap data collection function.
5 DISCUSSION AND
COMPARISON TO RELATED
WORK
In Section 4.1 we illustrated experimentally that our
approach is able to correctly identify the runtime
complexity even from a small number of measure-
ment samples.
Input and output blocking instructions, network communication, as well as multithreaded operations are not specifically handled by our approach, so at first glance one might see this as a problem. However, this is acceptable, considering that our analysis would be performed in a test environment, where blocking instructions could be mocked. Furthermore, even if it were not possible to eliminate these aspects from our analysis, we would still want to be able to measure the performance of the rest of the code.
The running time may not reflect the number of
steps performed by the algorithm. For practical rea-
sons the actual runtime is more important, and with
the aid of other tools such as profilers, we can identify
the factors that influenced said execution durations.
The most similar approach to the one proposed in this paper is Benchmark (Benchmark, 2016), but it can only be used for computing the complexity of C++ code. However, since it is an open source library, we implemented their method in Python in order to compare it with our proposed approach. We used two settings for the experiments: one in which all the complexity classes considered for our approach (the ones from Table 1) were used, and one in which only the six complexity classes used in the Benchmark library were considered (this set includes the constant complexity class, which is not used in our approach).
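Conceptually, the re-implemented fit uses a single scaling coefficient per candidate function and reports the candidate with the lowest error; a rough sketch of this idea (not the library's actual code, and details such as error normalization may differ):

```python
import numpy as np

def single_coef_fit(sizes, times, f):
    """Least-squares fit of times ~ coef * f(n) with one scaling coefficient only."""
    fn = f(np.asarray(sizes, dtype=float))
    t = np.asarray(times, dtype=float)
    coef = float(np.dot(t, fn) / np.dot(fn, fn))          # closed-form least squares
    rms = float(np.sqrt(np.mean((t - coef * fn) ** 2)))   # residual error of the fit
    return coef, rms

# Among the candidate functions, the one yielding the smallest error is reported.
```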
For the first setting, the Benchmark implementation found the correct complexity class for 55 out of the 144 data sets, an accuracy of 38%. For the second setting, the results were better: 83 correctly classified data sets, resulting in an accuracy of 58%. The accuracy of our approach on these data sets was 95%, which is considerably better. Moreover, for the data sets for which our approach did not find the correct complexity class, neither did the Benchmark implementation.
However, Benchmark was developed for C++ code, so an explanation for the poor performance could be the large number of data sets (102 out of 144) containing measurements for Java code. In order to investigate this hypothesis, we measured the accuracy separately for the C++ data sets (generated using Benchmark) and for the Java data sets. The accuracies are presented in Table 6.
Table 6: Accuracies for the C++ and Java data sets for our approach and the Benchmark implementations.

Method                              C++ data sets   Java data sets
Benchmark, 10 complexity classes    76%             23%
Benchmark, 6 complexity classes     81%             48%
Our approach                        90%             97%
From Table 6 we can see that, considering only the data sets generated for C++ code, the accuracy of the Benchmark implementation is approximately twice as high as for the Java data sets. This suggests that determining the complexity class for Java code is more difficult than determining it for C++ code, and that more sophisticated methods are needed for it.
Other approaches in the literature focus on a spe-
cific type of algorithms (such as (Zimmermann and
Zimmermann, 1989), dealing with divide-and-conquer algorithms) or on specific parts of the code (such as loops in (Demontiê et al., 2015)). Our approach is more general: it is not constrained by the type of algorithm used in the analyzed method. We focus on an entire method, which has the additional benefit that the instrumented code used to collect runtime data is less costly, since our only concern is the execution time of the method.
6 CONCLUSIONS AND FUTURE
WORK
In this paper we have introduced an approach for auto-
matically determining the algorithmic complexity of a
method from a software system, using runtime mea-
surements. The results of the experimental evaluation
show the potential of our approach. Although the ap-
proach has a good accuracy, further improvement is
possible by analyzing the particularities of the mis-
classified examples.
The next step is to automate this whole process and make it readily available to developers. The research and experiments described in this paper serve as solid groundwork for creating such a tool, one that allows for real-time algorithmic complexity verification.
The need for this functionality becomes clear when
one considers the benefits gained, such as the abil-
ity to easily identify potential performance or security
misconceptions developers might have when writing
the code. The exact manner in which said tool might
function still needs analysis and experimentation, but
a possible form might be akin to unit tests that are ex-
ecuted within a continuous integration environment.
REFERENCES
Bachmann, P. (1894). Die Analytische Zahlentheorie.
Benchmark (2016). Benchmark library.
https://github.com/google/benchmark.
Brockschmidt, M., Emmes, F., Falke, S., Fuhs, C., and
Giesl, J. (2014). Alternating runtime and size com-
plexity analysis of integer programs. In International
Conference on Tools and Algorithms for the Con-
struction and Analysis of Systems, pages 140–155.
Springer.
Chapin, N., Hale, J. E., Kham, K. M., Ramil, J. F., and Tan,
W.-G. (2001). Types of software evolution and soft-
ware maintenance. Journal of Software Maintenance,
13(1):3–30.
Chen, Z., Chen, B., Xiao, L., Wang, X., Chen, L., Liu, Y.,
and Xu, B. (2018). Speedoo: prioritizing performance
optimization opportunities. In 2018 IEEE/ACM 40th
International Conference on Software Engineering
(ICSE), pages 811–821. IEEE.
Cormen, T. H., Stein, C., Rivest, R. L., and Leiserson, C. E.
(2001). Introduction to Algorithms. McGraw-Hill
Higher Education, 2nd edition.
Crosby, S. A. and Wallach, D. S. (2003). Denial of ser-
vice via algorithmic complexity attacks. In Proceed-
ings of the 12th Conference on USENIX Security Sym-
posium - Volume 12, SSYM’03, pages 3–3, Berkeley,
CA, USA. USENIX Association.
Demontiê, F., Cezar, J., Bigonha, M., Campos, F., and Magno Quintão Pereira, F. (2015). Automatic infer-
ence of loop complexity through polynomial interpo-
lation. In Pardo, A. and Swierstra, S. D., editors, Pro-
gramming Languages, pages 1–15, Cham. Springer
International Publishing.
Goldsmith, S. F., Aiken, A. S., and Wilkerson, D. S. (2007).
Measuring empirical computational complexity. In
Proceedings of the the 6th Joint Meeting of the Euro-
pean Software Engineering Conference and the ACM
SIGSOFT Symposium on The Foundations of Software
Engineering, ESEC-FSE ’07, pages 395–404, New
York, NY, USA. ACM.
Hansen, P., Pereyra, V., and Scherer, G. (2012). Least
Squares Data Fitting with Applications. Least Squares
Data Fitting with Applications. Johns Hopkins Uni-
versity Press.
Knuth, D. E. (1998). The Art of Computer Programming,
Volume 3: (2Nd Ed.) Sorting and Searching. Addison
Wesley Longman Publishing Co., Inc., Redwood City,
CA, USA.
Le Métayer, D. (1988). ACE: An automatic complexity eval-
uator. ACM Trans. Program. Lang. Syst., 10:248–266.
Luo, Q., Nair, A., Grechanik, M., and Poshyvanyk, D.
(2017). Forepost: Finding performance problems au-
tomatically with feedback-directed learning software
testing. Empirical Software Engineering, 22(1):6–56.
McCall, J. A., Herndon, M. A., and Osborne, W. M. (1985). Software Maintenance Management. U.S. Dept. of Commerce, National Bureau of Standards.
Olivo, O., Dillig, I., and Lin, C. (2015). Static detection of
asymptotic performance bugs in collection traversals.
In ACM SIGPLAN Notices, volume 50, pages 369–
378. ACM.
OpenJDK (2017). Hashmap implementation change.
https://openjdk.java.net/jeps/180.
Rosendahl, M. (1989). Automatic complexity analysis. In
Fpca, volume 89, pages 144–156. Citeseer.
Scipy (2019). Python scipy.optimize documentation.
https://docs.scipy.org/doc/scipy/reference/optimize.html.
Kernighan, B. W. and Van Wyk, C. J. (1998). Timing trials, or the trials of timing: Experiments with scripting and user-interface languages. Software: Practice and Experience, 28.
Wegbreit, B. (1975). Mechanical program analysis. Com-
munications of the ACM, 18(9):528–539.
Weiss, M. A. (2012). Data Structures and Algorithm Anal-
ysis in Java. Pearson Education, Inc.
Zimmermann, P. and Zimmermann, W. (1989). The auto-
matic complexity analysis of divide-and-conquer al-
gorithms. Research Report RR-1149, INRIA. Projet
EURECA.