Challenges of ELA-Guided Function Evolution Using Genetic Programming

Fu Xing Long¹, Diederick Vermetten², Anna V. Kononova², Roman Kalkreuth³, Kaifeng Yang⁴, Thomas Bäck² and Niki van Stein²

¹BMW Group, Knorrstraße 147, Munich, Germany
²LIACS, Leiden University, Niels Bohrweg 1, Leiden, The Netherlands
³Computer Lab of Paris 6, Sorbonne Université, Paris, France
⁴University of Applied Sciences Upper Austria, Softwarepark 11, 4232 Hagenberg, Austria
Keywords: Function Generator, Genetic Programming, Exploratory Landscape Analysis, Instance Spaces.
Abstract:
Within the optimization community, the question of how to generate new optimization problems has been
gaining traction in recent years. Within topics such as instance space analysis (ISA), the generation of new
problems can provide new benchmarks which are not yet explored in existing research. Beyond that, this
function generation can also be exploited for solving expensive real-world optimization problems. By gen-
erating fast-to-evaluate functions with similar optimization properties to the target problems, we can create a
test set for algorithm selection and configuration purposes. However, the generation of functions with specific
target properties remains challenging. While features exist to capture low-level landscape properties, they
might not always capture the intended high-level features. We show that it is challenging to find satisfying
functions through a genetic programming (GP) approach guided by the exploratory landscape analysis (ELA)
properties. Our results suggest that careful consideration of the weighting of ELA properties, as well as of the distance measure used, might be required to evolve functions that are sufficiently representative of the target landscape.
1 INTRODUCTION
Benchmark problems play a key role in our abil-
ity to efficiently evaluate and compare the perfor-
mance of optimization algorithms. Well-constructed
benchmark suites provide researchers the opportunity
to gauge the different strengths and weaknesses of
a wide variety of optimization algorithms (Hansen
et al., 2010). Following this, carefully handcrafted
sets of problems, such as the black-box optimization
benchmarking (BBOB) suite (Hansen et al., 2009),
have become increasingly popular (Bartz-Beielstein
et al., 2020). Beyond benchmarking purposes, the
BBOB suite has been intensively used in research on the algorithm selection problem (ASP) (Rice, 1976), with a focus on identifying computationally and time-efficient algorithms for a particular prob-
lem instance. Recently, this has been associated
with the optimization landscape properties of prob-
lem instances, where the landscape properties are ex-
ploited to predict the performance of optimization al-
gorithms, e.g., using machine learning models (Ker-
schke and Trautmann, 2019a).
One inherent limitation of these hand-crafted
benchmark suites, however, lies in the fact that they
can never cover the full instance space. For instance,
it has been shown that real-world automotive problem
instances are insufficiently represented by the BBOB
functions in terms of landscape properties (Long
et al., 2022). Consequently, there is a growing trend
in understanding the coverage of problem classes by
existing benchmark sets, and in the creation of new
benchmarks to fill the gaps (Smith-Miles and Muñoz, 2023; Muñoz and Smith-Miles, 2020). In this research area, commonly known as instance space analysis (ISA), a feature-based representation of the problem instances is used to identify functions that
are lacking and should be newly created. This is often
combined with a performance-oriented view of sev-
eral optimization algorithms, leading to the creation
of new functions, where the benefits of one algorithm
over the others can clearly be observed.
The most common approach to generating new
benchmark problems is through the use of genetic
programming (GP) (Muñoz and Smith-Miles, 2020).
Since GP has a long history in the domain of symbolic
regression (SR), it is a natural choice for the creation
of optimization problems. Essentially, GP is guided
towards a target feature vector in a poorly-covered
part of the instance space. These features can be gen-
erated, for example, using the exploratory landscape
analysis (ELA) (Mersmann et al., 2011), which aims
to capture low-level information about the problem
landscape using a limited number of function evalu-
ations.
While this GP-based approach to function genera-
tion has shown considerable promise in ISA, it can
also be used to create a set of representative func-
tions for algorithm selection and hyperparameter tun-
ing purposes. This is especially useful for real-world
optimization problems with expensive function evalu-
ation, e.g., requiring simulator runs. Indeed, previous
work has shown that benchmark problems with simi-
lar characteristics as real-world problems can be use-
ful to tune optimization algorithms, leading to perfor-
mance benefits on the original problems (Thomaser
et al., 2023). Therefore, the ability to generate a set of
problems with similar optimization properties would
be of significant practical importance.
In this work, we focus on investigating how a
GP guided by ELA features can be utilized to gener-
ate problems which are similar to known benchmark
functions. This illustrates the challenges which still
need to be overcome to efficiently generate sets of
feature-based surrogate problems. In particular, our
contributions are as follows:
1. We adapt the random function generator (RFG)
from (Tian et al., 2020) into a GP approach and
investigate the impact this has on the distribution
of ELA features of the generated problems.
2. We investigate the impact of the distance measure used to compare ELA feature vectors. Our re-
sults suggest that the Wasserstein distance metric
and equal treatment of all features might not be
desirable.
2 RELATED WORK
2.1 Black-Box Optimization
Benchmarking
The BBOB family of problem suites comprises some of the
most well-known sets of problems for benchmark-
ing optimization heuristic algorithms (Hansen et al.,
2010), particularly the original continuous, noiseless,
single-objective suite, which is often referred to as
the BBOB (Hansen et al., 2009). To facilitate benchmarking, the BBOB suite has been in-
tegrated as part of the comparing continuous optimiz-
ers (COCO) platform (Hansen et al., 2021) and it-
erative optimization heuristics profiler tool (IOHPro-
filer) (Doerr et al., 2018), where stored statistics of algorithm performance can be easily retrieved.
Due to its popularity, the BBOB suite has also become
a common testbed for automated algorithm selection
and configuration techniques, even though the suite
was never designed with this in mind.
Altogether, the aforementioned original BBOB
suite consists of 24 functions from five problem
classes based on their global properties. While the
BBOB functions were originally designed for un-
constrained optimization, in practice however, they
are usually considered within the search domain of [−5, 5]^d, with their global optimum located within [−4, 4]^d. Beyond the fact that they can be scaled to
arbitrary dimensionality d, the BBOB functions have
the advantage that different variants or problem in-
stances can be easily generated through a transforma-
tion of the search domain and objective values. This
transformation mechanism is internally integrated in BBOB and controlled by a unique instance identifier (IID).
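As an illustration, such instances can be obtained programmatically. The following minimal sketch assumes the Python ioh package and its get_problem interface; exact argument names may differ between versions.

```python
# A minimal sketch of instantiating BBOB problem instances, assuming the
# Python "ioh" package and its get_problem interface; exact argument names
# may differ between package versions.
import ioh

# F1 (sphere) in 2d; instances differ through a transformation of the
# search domain and objective values, selected via the instance ID (IID).
for iid in range(1, 6):
    problem = ioh.get_problem(1, instance=iid, dimension=2)
    print(problem.meta_data.name, iid, problem([0.0, 0.0]))
```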
2.2 Exploratory Landscape Analysis
In landscape-aware ASP, the landscape properties of
problem instances are associated with the perfor-
mance of optimization algorithms. For this, the most
common way is by characterizing the landscape char-
acteristics or high-level properties of a problem in-
stance, such as its global structure, multi-modality
and separability (Mersmann et al., 2010). Nonethe-
less, an accurate characterization of these high-level
properties is challenging without expert knowledge.
To facilitate the landscape characterization, ELA has
been introduced to capture the low-level properties
of a problem instance, e.g., y-distribution, level set
and meta-model (Mersmann et al., 2011). It has been
shown that these ELA features are sufficiently expres-
sive in accurately classifying the BBOB functions ac-
cording to their corresponding problem classes (Re-
nau et al., 2021) and also informative for algorithm
selection purposes (Muñoz et al., 2015; Kerschke
et al., 2019).
In ELA, landscape features are computed primarily from a design of experiments (DoE) of sample points X = {x_1, ..., x_n} evaluated on an objective function h: R^d → R, with x_i ∈ R^d, where n denotes the sample size and d the function dimensionality. In this work, we compute the ELA features using the pflacco package (Prager and Trautmann, 2023b), which was developed based on the flacco package (Kerschke and Trautmann, 2019b).
With more than 300 ELA features that can be com-
puted, we consider only the ELA features that can
be cheaply computed without additional re-sampling,
similar to the work in (Long et al., 2022), and dis-
regard the ELA features that only concern the DoE
samples (altogether four of the principal component
analysis features).
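As a concrete illustration, the sketch below computes a subset of these features on a Sobol' DoE. It assumes pflacco's classical_ela_features module; module paths and function names may differ between pflacco versions.

```python
# A sketch of the ELA feature computation on a Sobol' DoE, assuming
# pflacco's classical_ela_features module (names may vary by version).
import numpy as np
from scipy.stats import qmc
from pflacco.classical_ela_features import (
    calculate_dispersion,
    calculate_ela_distribution,
    calculate_ela_meta,
)

d = 2
n = 150 * d                                  # DoE size of 150d samples
sampler = qmc.Sobol(d=d, scramble=True)
X = qmc.scale(sampler.random(n), [-5] * d, [5] * d)  # Sobol' DoE in [-5, 5]^d
y = np.sum(X**2, axis=1)                     # objective values; here a sphere

features = {}
for calc in (calculate_ela_meta, calculate_ela_distribution, calculate_dispersion):
    features.update(calc(X, y))              # each call returns a dict of features
print(f"{len(features)} features computed")
```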
While we are fully aware that the ELA features are highly sensitive to sample size (Muñoz et al., 2022) and sampling strategy (Škvorc et al., 2021a), these aspects are beyond the scope of this work. Throughout this work, we consider the Sobol' sampling technique (Sobol', 1967), based on the results in (Renau et al., 2020).
2.3 Instance Space Analysis
In general, ISA is a methodology of benchmarking
algorithms and assessing their strengths and weak-
nesses based on clusters of problem instances (Smith-Miles and Muñoz, 2023). The instance space refers
to the set of all possible problem instances that can
be used to evaluate the performance of an op-
timization algorithm. The fundamental idea behind
ISA is to model the relationship between the struc-
tural properties of a given problem instance and the
performance of a set of algorithms. Through this ap-
proach, footprints can be constructed for each algo-
rithm, which are essentially regions in the instance
space where statistically significant performance im-
provements can be inferred.
In our proposed approach, a similar concept is
utilized, namely to find such problem instances that
resemble specific expensive real-world problems, to
allow the comparison and benchmarking of vari-
ous algorithms on a highly specific instance space.
This differs from other related works, such as the Melbourne algorithm test instance library with data analytics (MATILDA) software (Smith-Miles and Muñoz, 2023): while those works try to create a space-covering set of instances for general benchmarking and algorithm comparison, we aim to generate highly specialised benchmark sets that are fast to evaluate and representative of expensive real-world black-box problems. Generating such
domain-specific benchmarks would allow us to search
and optimize specific algorithm configurations that
are better applicable to a specific problem domain.
Furthermore, it also allows a better understanding of
the expensive black-box problem through the analysis
of different optimization landscapes that represent the
found instance space.
2.4 Genetic Programming
Principally, GP can be considered as a search heuris-
tic for computer program synthesis that is inspired
by neo-Darwinian evolution (Koza, 1989). As pro-
posed by Koza, GP traditionally uses trees as pro-
gram representation. A typical application of GP is to
solve optimization problems that can be formulated as argmin_{t ∈ T} f(t), where t = (t_1, ..., t_n) represents a decision vector (also known as an individual or solution candidate) in evolutionary algorithms (EAs).
Similar to other EAs, GP evolves a population of solu-
tion candidates by following the principle of the sur-
vival of the fittest and utilizing biologically-inspired
operators. The feature distinguishing GP from other
EAs is the variable-length representation for t, instead
of a fixed-length representation.
Throughout the years, GP has been widely used
to solve regression problems by searching through a
space of mathematical expressions. In fact, GP-based
SR (Tackett, 1995) is popular as an interpretable al-
ternative to black-box regression methods, where GP
is used to search for an explicit mathematical expres-
sion for a given dataset. By producing a mathematical
expression that can be easily understood by humans,
SR has proven to be a valuable tool in engineering
applications, where it is important to comprehend the
relationship between different decision variables.
While GP-based SR was mainly used as a surro-
gate model to either quantify the relationship between
different decision variables or replace expensive op-
timization problems in previous work (e.g., in engi-
neering), we focus on utilizing canonical GP (Tian
et al., 2020) to create functions with specific target
ELA features in this work.
2.5 Generating Black-Box Optimization
Problems
Apart from the expertly designed benchmarking test
suites, Tian et al. introduced an SR approach for generating continuous black-box optimization problems
(Tian et al., 2020). In their work, a function gener-
ator was proposed to generate problem instances of
different complexity in the form of tree representa-
tions that serve as training samples for a recommen-
dation model. More specifically, the function gener-
ator constructs a tree representation by randomly se-
lecting mathematical operands and operators from a
predefined pool, where each operand and operator has
a specific probability of being selected. In this way,
any arbitrary number of functions can be quickly gen-
erated. To improve the functional complexity that can
be generated, such as noise, multi-modal landscape
and complex linkage between variables, a difficulty
injection operation was included to modify the tree
representation. Furthermore, a tree-cleaning opera-
tion was considered to simplify the tree representation
by eliminating redundant operators. In the remainder
of this paper, we refer to this function generator as
random function generator (RFG).
In fact, functions generated by the RFG in-
deed have landscape characteristics different from the
BBOB test suite and complement the coverage of
BBOB functions in the instance space (Škvorc et al.,
2021b). Furthermore, it has been shown that, as far as
landscape characteristics are concerned, some func-
tions generated by the RFG belong to the same prob-
lem class as several automotive crashworthiness opti-
mization problem instances (Long et al., 2022).
In addition to GP and RFG approaches, a recent
paper proposed to make use of affine combinations of
BBOB functions and showed that these new functions
can help fill empty spots in the instance space (Diet-
rich and Mersmann, 2022). Essentially, a new func-
tion is constructed via a convex combination of two
selected BBOB functions, using a weighting factor
to control the interpolation. Extensions of this work
have generalized the approach to affine combinations
of more functions (not limited to only two functions)
and shown their potential for the analysis of auto-
mated algorithm selection methods (Vermetten et al.,
2023b; Vermetten et al., 2023a).
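To make the construction concrete, the following hedged sketch builds a new function as a convex combination of two callables; the published approach additionally normalizes objective values and controls the optimum placement, which is omitted here.

```python
# A hedged sketch of combining two functions with a weighting factor, in the
# spirit of (Dietrich and Mersmann, 2022); normalization of objective values
# and optimum placement from the published construction are omitted.
def affine_combination(f1, f2, alpha):
    """Return f(x) = alpha * f1(x) + (1 - alpha) * f2(x)."""
    def f(x):
        return alpha * f1(x) + (1.0 - alpha) * f2(x)
    return f

# Example with two simple stand-in functions (not actual BBOB problems).
sphere = lambda x: sum(xi**2 for xi in x)
slope = lambda x: sum(5.0 * xi for xi in x)
f_mix = affine_combination(sphere, slope, alpha=0.3)
print(f_mix([1.0, 2.0]))
```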
3 METHODOLOGY
In brief, we develop our GP-based function generator
(we simply refer this as GP in the remainder of this
paper) based on the RFG and canonical GP approach.
Precisely, we consider the mathematical operands and
operators similar to those used in the RFG with slight
modifications, as summarized in Table 1. Follow-
ing this, the GP search space consists of the terminal
space S (operands) and function space F (operators),
i.e., T = S ∪ F. Unlike a typical GP-based SR method,
where each design variable x_i is treated separately (as a terminal), we consider a tree-based mathematical expression with a vector-based input t = (t_1, ..., t_d), to facilitate a comparison with the RFG (Azzali et al., 2019).
Regarding the GP aspect, we consider the canoni-
cal GP and the distributed evolutionary algorithms in
Python (DEAP) package (Fortin et al., 2012). The de-
scriptions of our GP function generator are as follows.
Data. A set of DoE samples X and the ELA features
of the target functions are used as input for the GP
pipeline.
Objective Function. In the GP system, our opti-
mization target is to minimize the differences be-
tween the ELA features of an individual and the tar-
get function. Before the ELA feature computation,
we normalize the objective values (by min-max scal-
ing) to remove inherent bias as proposed in (Prager
and Trautmann, 2023a). Furthermore, we normalize
the ELA features (by min-max scaling) before the dis-
tance computation, to ensure that all ELA features
are within a similar scale range. For this, we con-
sider the minimum and maximum values from a set
of BBOB functions (24 BBOB functions, 5 instances
each). Based on the same set of BBOB functions, we
identify and filter out ELA features that are highly
correlated in a similar fashion to the work in (Long
et al., 2022), resulting in a total of 27 remaining fea-
tures.
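A minimal sketch of this fitness computation is given below; ela_min and ela_max are hypothetical arrays holding the per-feature min-max bounds pre-computed on the BBOB set.

```python
# A minimal sketch of the fitness computation; ela_min and ela_max are
# hypothetical arrays of per-feature min-max bounds from the BBOB set.
import numpy as np
from scipy.stats import wasserstein_distance

def normalize(samples, ela_min, ela_max):
    """Min-max scale bootstrapped ELA vectors feature-wise."""
    return (samples - ela_min) / (ela_max - ela_min)

def fitness(candidate_samples, target_samples, ela_min, ela_max):
    """Average Wasserstein distance over features; each row of the inputs
    is one bootstrapped ELA vector, each column one of the 27 features."""
    cand = normalize(candidate_samples, ela_min, ela_max)
    targ = normalize(target_samples, ela_min, ela_max)
    return float(np.mean([
        wasserstein_distance(cand[:, j], targ[:, j])
        for j in range(cand.shape[1])
    ]))
```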
Infeasible Solutions. An individual is considered
to be infeasible if any of the following four conditions is fulfilled:
1. Error when converting the tree representation to
an executable Python expression,
2. Bad objective values, e.g., infinity, missing or sin-
gle constant value,
3. Error in ELA computation, e.g., due to equal fit-
ness in all samples, and
4. Invalid distance, caused by missing value in ELA
feature.
All infeasible trees are penalized with a large fitness
of 10000.
Initialize Population. In the first generation, we initialize the population (with a population size of 50, for computational reasons) using random sampling, with the tree depth limited to between 3 and 12. Similar to the RFG, we assign each operand and operator a probability of being selected (Table 1).
Table 1: List of notations and their meaning, syntax, protection rules (if any) and probability of being selected for the GP sampling (only during the first generation).

Notation | Meaning | Syntax | Remark/Protection | Probability

Operands (S):
x      | Decision vector          | (x_1, ..., x_d)                          |                            | 0.6250
a      | A real constant          | a ~ U(1, 10)                             |                            | 0.3125
rand   | A random number          | rand ~ U(1, 1.1)                         |                            | 0.0625

Operators (F):
add    | Addition                 | a + x                                    |                            | 0.1655
sub    | Subtraction              | a − x                                    |                            | 0.1655
mul    | Multiplication           | a · x                                    |                            | 0.1098
div    | Division                 | a/x                                      | Return 1 if |x| <= 1e-20   | 0.1098
neg    | Negation                 | −x                                       |                            | 0.0219
rec    | Reciprocal               | 1/x                                      | Return 1 if |x| <= 1e-20   | 0.0219
multen | Multiplying by ten       | 10x                                      |                            | 0.0219
square | Square                   | x^2                                      |                            | 0.0549
sqrt   | Square root              | sqrt(|x|)                                |                            | 0.0549
abs    | Absolute value           | |x|                                      |                            | 0.0219
exp    | Exponent                 | e^x                                      |                            | 0.0219
log    | Logarithm                | ln|x|                                    | Return 1 if |x| <= 1e-20   | 0.0329
sin    | Sine                     | sin(2πx)                                 |                            | 0.0329
cos    | Cosine                   | cos(2πx)                                 |                            | 0.0329
round  | Rounded value            | ⌈x⌉                                      |                            | 0.0329
sum    | Sum of vector            | Σ_{i=1}^{d} x_i                          |                            | 0.0329
mean   | Mean of vector           | (1/d) Σ_{i=1}^{d} x_i                    |                            | 0.0329
cum    | Cumulative sum of vector | (Σ_{i=1}^{1} x_i, ..., Σ_{i=1}^{d} x_i)  |                            | 0.0109
prod   | Product of vector        | Π_{i=1}^{d} x_i                          |                            | 0.0109
max    | Maximum value of vector  | max_{i=1,...,d} x_i                      |                            | 0.0109
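The protection rules in Table 1 amount to simple guards around numerically unsafe operations. A scalar sketch follows; the vector-valued terminals would use NumPy equivalents.

```python
# A scalar sketch of the protection rules in Table 1; the vector-valued
# terminals would use NumPy equivalents.
import math

EPS = 1e-20  # protection threshold from Table 1

def protected_div(a, x):
    """Division a/x, returning 1 if |x| <= 1e-20."""
    return 1.0 if abs(x) <= EPS else a / x

def protected_rec(x):
    """Reciprocal 1/x, returning 1 if |x| <= 1e-20."""
    return 1.0 if abs(x) <= EPS else 1.0 / x

def protected_log(x):
    """Logarithm ln|x|, returning 1 if |x| <= 1e-20."""
    return 1.0 if abs(x) <= EPS else math.log(abs(x))
```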
Whenever an infeasible tree is generated, we re-sample, i.e., the infeasible tree is replaced by a newly generated one. As such, we ensure that the initial population is free from infeasible trees (due to errors 1 and 2).
Mating Selection and Variation. We use tournament selection with a tournament size of 5, subtree crossover with a crossover probability of 0.5, and subtree mutation with a mutation probability of 0.1.
Other remaining hyperparameters are set to default
settings. While hyperparameter tuning could poten-
tially improve our results further, we decide to leave
it for future work.
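For concreteness, a minimal DEAP sketch matching these settings is shown below; the primitive set is truncated to three operators and the fitness is a placeholder, whereas the real pipeline uses the full operator pool of Table 1 and the ELA-distance fitness.

```python
# A minimal DEAP sketch matching the settings above (population size 50,
# tournament size 5, crossover probability 0.5, mutation probability 0.1,
# initial tree depth between 3 and 12). Primitive set and fitness are
# placeholders, not the full setup of this paper.
import operator
from deap import algorithms, base, creator, gp, tools

pset = gp.PrimitiveSet("MAIN", 1)  # one argument: the decision vector x
pset.addPrimitive(operator.add, 2)
pset.addPrimitive(operator.sub, 2)
pset.addPrimitive(operator.mul, 2)

creator.create("FitnessMin", base.Fitness, weights=(-1.0,))
creator.create("Individual", gp.PrimitiveTree, fitness=creator.FitnessMin)

toolbox = base.Toolbox()
toolbox.register("expr", gp.genHalfAndHalf, pset=pset, min_=3, max_=12)
toolbox.register("individual", tools.initIterate, creator.Individual, toolbox.expr)
toolbox.register("population", tools.initRepeat, list, toolbox.individual)
toolbox.register("select", tools.selTournament, tournsize=5)
toolbox.register("mate", gp.cxOnePoint)                       # subtree crossover
toolbox.register("expr_mut", gp.genFull, min_=0, max_=2)
toolbox.register("mutate", gp.mutUniform, expr=toolbox.expr_mut, pset=pset)
toolbox.register("evaluate", lambda ind: (float(len(ind)),))  # placeholder fitness

pop = toolbox.population(n=50)
pop, _ = algorithms.eaSimple(pop, toolbox, cxpb=0.5, mutpb=0.1,
                             ngen=50, verbose=False)
```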
Result. The best individual found is returned as the solution of a GP run.
4 EXPERIMENTAL SETUP
In this work, we test our pipeline based on all 24
BBOB functions (one by one) of three different di-
mensionalities d = 2, 5 and 10 (or simply 2d, 5d and
10d).
We consider a DoE size of 150d samples and the search domain [−5, 5]^d.
To reliably capture the landscape characteristics,
we compute ELA features in a bootstrapping manner
(using only 80% of the DoE samples and 5 repetitions
with different random seeds). As for the optimization
objective or individual fitness in the GP system, we
consider minimizing the average Wasserstein distance
between the ELA features of the first five BBOB in-
stances and the evaluated individual (5 bootstrapped
samples for each feature, for each instance). Due to
computational limits, we perform only one run for
each target function in each dimensionality.
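A sketch of this bootstrapped feature computation is given below; compute_ela is a placeholder for the feature computation of Section 2.2.

```python
# A sketch of the bootstrapped ELA computation: five repetitions, each on a
# random 80% subset of the DoE; compute_ela is a placeholder returning one
# ELA feature vector per call.
import numpy as np

def bootstrap_ela(X, y, compute_ela, repetitions=5, fraction=0.8, seed=0):
    rng = np.random.default_rng(seed)
    n = len(X)
    samples = []
    for _ in range(repetitions):
        idx = rng.choice(n, size=int(fraction * n), replace=False)
        samples.append(compute_ela(X[idx], y[idx]))  # one bootstrapped ELA vector
    return np.vstack(samples)  # shape: (repetitions, n_features)
```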
Reproducibility. The code used to generate, process and visualize the experiments, the raw data and additional figures have been made available in a Zenodo repository (Long et al., 2023).
Figure 1: GP convergence for target F1 (sphere), in 2d, with the fitness evaluations on the x-axis and the Wasserstein distance on the y-axis.
5 RESULTS
5.1 Performance of GP
We start by analysing the functions generated during
the GP runs with a 2d BBOB function as their target.
In Figure 1, we show the convergence trajectory of
a single run on F1. From this figure, we clearly see
that GP manages to improve over the initial popula-
tion (first 50 evaluations), as time goes on. Note that,
while it has a budget of 50 generations of 50 individuals each, the combination of a crossover rate of 0.5 with a mutation rate of 0.1 means that there is a probability of (1 − 0.5) · (1 − 0.1) = 0.45 that a selected individual is not modified in any way, and is simply copied to the next generation without being evaluated. Additionally, Figure 1 does not include the infeasible solutions, which make up about 2% of all evaluations in this run.
To give some context to the fitness values shown in
Figure 1, we compare functions generated via our GP
approach with functions generated by the RFG (see
Section 2.5). For this purpose, we generate 1000 fea-
sible functions (using the RFG) for each dimension-
ality and measure their Wasserstein distance to each
of the 24 target BBOB functions. Then, we com-
pare these distances to those of the functions gener-
ated during our GP runs (approximately 1400 functions) to the
corresponding target problem. This is visualized in
Figure 2, from which we see that the lower end (i.e.,
generated functions with low distance to target func-
tions) of the GP distribution is almost always better
than that of the RFG, with some exceptions, e.g., F12
(Bent Cigar) in 10d.
For these GP runs, we can also visualize the re-
sulting function landscapes to identify how much they
resemble the target BBOB function. This is shown in
Figure 3 for F1 (sphere) in 2d, where the 5 BBOB in-
stances are plotted in the first row, followed by 45 GP-
generated functions. These functions are selected by
sorting their fitness values and taking a linear spacing
in the rank values between the best and worst func-
tions, to show a range of generated functions of vary-
ing quality. From visual inspection, we notice that
even the best functions (with the smallest Wasserstein
distance) do not quite visually represent a sphere as
we might have expected.
In a similar fashion, we can create the same visu-
alization for other functions, e.g., F5 (linear slope), as
shown in Figure 4, where we observe a much closer match between the target and generated functions.
It is, however, interesting to note that some gener-
ated functions, e.g., row 7, column 5 (which represents the function (x_0 + x_1)/2), appear visually similar to the
target, but nonetheless have a relatively large fitness
value. This raises the question of whether the dis-
tance to the target distribution in ELA space (using
the Wasserstein distance) really captures the intuitive
global properties of the linear slope problem.
5.2 Investigating the ELA Space
To understand why the distance between this lin-
ear slope and the target function is relatively large,
we need to look at the individual ELA features.
This can be done through a parallel coordinate plot,
as shown in Figure 5. From this figure, we can see that mainly ela_meta.lin_simple.coef.min and ela_meta.lin_simple.coef.max differ between the target function and the generated function (x_0 + x_1)/2, indicating that the steepness of the function might be
different. However, for the linear slope function, this
should not have a large impact as the global properties
are mostly preserved. This shows that different ELA
features are crucial for different types of target func-
tions. One direction to mitigate such problems might
be to analyse the spread of ELA feature values over
a large set of instances for each BBOB function (see
discussion around Figure 7 below).
To better understand the similarities between
functions, we visualize the ELA space filled by the
newly GP-generated functions relative to the existing
BBOB problems, by utilizing the uniform manifold
approximation mapping (UMAP) (McInnes et al.,
2018) method. For this, we first create the map-
ping using only the feature representations from the
24 BBOB problems (all five bootstrapped repetitions
on each instance). Next, we apply this fixed map on
the GP-generated functions (from one run of the GP).
The resulting plot for the target function F5 (linear
slope) is shown in Figure 6, where we see that most
GP-generated functions are indeed clustered together
around the target.
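A sketch of this two-stage projection is given below, assuming the umap-learn package; bbob_features and gp_features are hypothetical arrays whose rows are normalized ELA feature vectors.

```python
# A sketch of the two-stage projection used for Figure 6, assuming the
# umap-learn package; bbob_features and gp_features are hypothetical arrays
# whose rows are normalized ELA feature vectors.
import umap

reducer = umap.UMAP(n_components=2, random_state=0)
bbob_embedding = reducer.fit_transform(bbob_features)  # mapping fitted on BBOB only
gp_embedding = reducer.transform(gp_features)          # fixed map applied to GP functions
```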
5.3 Distances in Feature Space
Figure 2: Distribution of fitness values (Wasserstein distance) of the set of functions from the RFG and the GP runs with the specified BBOB target functions (horizontal axis). Rows correspond to dimensionalities: 2d, 5d and 10d (from top to bottom).

Our suspicion for why the distances in ELA space do not directly seem to correlate to our visual understanding of high-level properties might be that all
ELA features are weighted equally. This means that
even features that are more sensitive to small deviations can have the same impact as features that might be considered crucial to characterize a function, e.g., a linear slope. To gain insight into which features might be more important for a given function, we analyze the relative standard deviation of each ELA feature within instances of the same BBOB function and visualize the results in Figure 7.
While Figure 7 shows us the variance of each
ELA feature within the target functions, it is im-
portant to relate this to the deviations observed in
the GP-generated functions. Following this, we look
at the average standard deviation of each ELA fea-
ture across all functions sampled during a single GP
run and compute the absolute difference to the val-
ues seen on the corresponding BBOB function tar-
get. The resulting differences are shown in Fig-
ure 8, where we can see that some features, such
as ela meta.lin simple.coef.max by min, are signifi-
cantly more variable in the GP-generated functions
than in BBOB. This indicates that these ELA fea-
tures might be very important for the distance mea-
surement.
In our GP experiments, we consider the average
Wasserstein distance to determine the fitness value.
This choice was made, since each ELA feature can
be considered as a random variable, for which a sta-
tistical distance measure would be appropriate. Al-
ternatively, we could make use of a regular distance
measure, based on the mean values of these ELA
feature distributions. To determine the impact the
choice of distance measure might have, we consider
the Kendall-tau correlation between a set of six met-
rics on the pairwise distances between the BBOB in-
stances, consisting of the Canberra, cosine, correla-
tion, Euclidean, cityblock and Wasserstein distance,
as visualized in Figure 9.
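A sketch of this comparison is given below, using a hypothetical array ela_vectors of mean normalized ELA vectors (one row per BBOB instance); note that applying the Wasserstein distance directly to a pair of feature vectors is a simplification of the per-feature bootstrapped distributions used in our experiments.

```python
# A sketch of the metric comparison behind Figure 9; ela_vectors is a
# hypothetical (n_instances, n_features) array of mean normalized ELA
# vectors. The Wasserstein variant here treats each feature vector's
# entries as samples, a simplification of the bootstrapped setting.
from itertools import combinations

import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import kendalltau, wasserstein_distance

metrics = ["canberra", "cosine", "correlation", "euclidean", "cityblock"]
dists = {m: pdist(ela_vectors, metric=m) for m in metrics}
dists["wasserstein"] = np.array([
    wasserstein_distance(ela_vectors[i], ela_vectors[j])
    for i, j in combinations(range(len(ela_vectors)), 2)
])

for m1, m2 in combinations(dists, 2):
    tau, _ = kendalltau(dists[m1], dists[m2])
    print(f"{m1} vs {m2}: tau = {tau:.2f}")
```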
From Figure 9, we can see that the correlations,
while clearly positive, are not perfect. This is espe-
cially the case when comparing the statistical distance
(Wasserstein) against the vector-based distances. To
gauge which distance metric might be preferable, we
then compare the distances between instances of the
same function to the distances between instances of
different functions, as is done in Figure 10. Based on
this comparison, we notice that the Wasserstein dis-
tance surprisingly has the lowest distinguishing power
(unlike our initial intuition), while the cosine and
correlation distances show a clear trend of assigning
lower distances to same-function instances.
To further identify potential ways to modify
the distance measures, we compare the differences
in individual ELA features between same-function
and different-function instances. From the results
shown in Figure 11, we see that some features, e.g.,
ela_meta.quad_simple.cond, show very limited differences when comparing instances of the same function relative to different function instances. As such, it is likely that these features contribute very little to any distance measure, and might be potentially removed to improve the stability, as reducing the vector dimensionality can make the distances more reliable.

Figure 3: Grid of functions generated by the GP procedure with target function F1 (sphere) in 2d, of which 5 instances are plotted in the first row. The rows below are GP-generated problems selected by ranking their Wasserstein distance to the target feature vector (indicated by the value in each subfigure) and taking a linear spacing in this ranking from the best (top left) to the worst (bottom right).
6 CONCLUSIONS AND FUTURE
WORK
In this paper, we have shown that GP can be guided
by ELA features to find problems with similar high-
level characteristics as a set of target problems.

Figure 4: Grid of functions generated by the GP procedure with target function F5 (linear slope) in 2d, of which 5 instances are plotted in the first row. The rows below are GP-generated problems selected by ranking their Wasserstein distance to the target feature vector (indicated by the value in each subfigure) and taking a linear spacing in this ranking from the best (top left) to the worst (bottom right).

However, through this process, we highlight several po-
tential pitfalls with this approach, illustrated by the
fact that we could not accurately recreate a simple
sphere problem. Although our results are based on a
very limited set of experiments, they reveal that weighting all landscape features equally in the distance measure makes it difficult to focus on the more visually apparent high-level features.
By comparing the differences in ELA features on
the BBOB problems both between instances of the
same function and instances of different functions, we
show that a feature selection mechanism should be in-
tegrated to make the fitness values more stable. Ad-
ditionally, a weighting scheme based on feature importances might be used, in combination with a distance metric, to more rigorously guide the GP search towards relevant function characteristics. Further research into ELA and other feature-free approaches is also important to improve the approach used; e.g., models such as DoE2Vec (van Stein et al., 2023) are a starting point in this direction.

Figure 5: Parallel coordinate plot of the ELA features (one line for each bootstrapped DoE) for the GP-generated functions and target function F5 (linear slope) in 2d, of which 5 instances are plotted in blue. The orange lines highlight an example of a GP-generated function, corresponding to (x_0 + x_1)/2, which has a Wasserstein distance of 0.19 to the target.

Figure 6: UMAP projection of the GP-generated functions with the target function F5 (linear slope) in 2d ELA space. The mapping is based on BBOB instances only, which are highlighted as blue crosses. The target F5 is highlighted in red, with the mean feature vector across the five instances indicated as a red triangle. The dots correspond to the GP-generated problems, where the colour is the cityblock or Manhattan distance to the target vector (here, coloring based on Wasserstein distance is challenging).

Figure 7: Relative standard deviation (of the normalized values) of each ELA feature for each BBOB function. Lighter color represents a larger deviation.

Figure 8: Absolute difference in relative standard deviation (of the normalized values) of each ELA feature for each BBOB function to the functions generated by the GP run with the respective target. Lighter color represents a larger deviation.

Figure 9: Kendall-tau correlation between six distance measures on the BBOB instances.
Figure 10: Distribution of normalized distances between BBOB instances of the same problem (inner) and instances of different problems (outer).

Figure 11: Relative difference (between normalized values) of each of the used ELA features, between BBOB instances of the same problem (inner) and instances of different problems (outer).

Another aspect that should be considered is the specific setting of the GP itself. In this work, we
made use of default hyperparameter settings with a
relatively small population size for computational rea-
sons. This might, however, limit the ability of GP to
find diverse solutions, leading to premature conver-
gence (Schweim et al., 2021). Care should also be
taken to include a more rigorous tree-cleaning opera-
tion, similar to (Tian et al., 2020), to simplify the re-
sulting expressions and prevent infeasible trees from
being generated.
In the future, the generation of functions with spe-
cific landscape properties has significant potential to
help address real-world problems. By generating a di-
verse set of problems with similar features to a com-
plex target problem, we can create a test set for al-
gorithm selection and configuration pipelines. This
has clear benefits over training on a conventional sur-
rogate, since we can avoid overfitting by considering
diversified sets of problems.
ACKNOWLEDGEMENTS
The contribution of this paper was written as part
of the joint project newAIDE under the consortium
leadership of BMW AG with the partners Altair En-
gineering GmbH, divis intelligent solutions GmbH,
MSC Software GmbH, Technical University of Mu-
nich, TWT GmbH. The project is supported by the
Federal Ministry for Economic Affairs and Climate
Action (BMWK) on the basis of a decision by the
German Bundestag.
This work was performed using the ALICE com-
pute resources provided by Leiden University.
REFERENCES
Azzali, I., Vanneschi, L., Silva, S., Bakurov, I., and Gia-
cobini, M. (2019). A vectorial approach to genetic
programming. In European Conference on Genetic
Programming, pages 213–227. Springer.
Bartz-Beielstein, T., Doerr, C., Berg, D. v. d., Bossek,
J., Chandrasekaran, S., Eftimov, T., Fischbach, A.,
Kerschke, P., La Cava, W., Lopez-Ibanez, M., et al.
(2020). Benchmarking in optimization: Best practice
and open issues. arXiv preprint arXiv:2007.03488.
Dietrich, K. and Mersmann, O. (2022). Increasing the diver-
sity of benchmark function sets through affine recom-
bination. In Parallel Problem Solving from Nature–
PPSN XVII: 17th International Conference, PPSN
2022, Dortmund, Germany, September 10–14, 2022,
Proceedings, Part I, pages 590–602. Springer.
Doerr, C., Wang, H., Ye, F., van Rijn, S., and Bäck, T. (2018). IOHprofiler: A Benchmarking and Profiling Tool for Iterative Optimization Heuristics. arXiv e-prints:1810.05281.
Fortin, F.-A., De Rainville, F.-M., Gardner, M.-A., Parizeau, M., and Gagné, C. (2012). DEAP: Evolutionary algorithms made easy. Journal of Machine Learning Research, 13:2171–2175.
Hansen, N., Auger, A., Ros, R., Finck, S., and Pošík, P. (2010). Comparing results of 31 algorithms from the black-box optimization benchmarking BBOB-2009. In Proceedings of the 12th Annual Conference Companion on Genetic and Evolutionary Computation, GECCO '10, pages 1689–1696. ACM.
Hansen, N., Auger, A., Ros, R., Mersmann, O., Tušar, T., and Brockhoff, D. (2021). COCO: a platform for comparing continuous optimizers in a black-box setting. Optimization Methods and Software, 36(1):114–144.
Hansen, N., Finck, S., Ros, R., and Auger, A. (2009).
Real-Parameter Black-Box Optimization Benchmark-
ing 2009: Noiseless Functions Definitions. Research
Report RR-6829, INRIA.
Kerschke, P., Hoos, H. H., Neumann, F., and Trautmann, H.
(2019). Automated algorithm selection: Survey and
perspectives. Evolutionary computation, 27(1):3–45.
Kerschke, P. and Trautmann, H. (2019a). Automated
algorithm selection on continuous black-box prob-
lems by combining exploratory landscape analysis
and machine learning. Evolutionary Computation,
27(1):99–127.
Kerschke, P. and Trautmann, H. (2019b). Comprehen-
sive Feature-Based Landscape Analysis of Continuous
Challenges of ELA-Guided Function Evolution Using Genetic Programming
129
and Constrained Optimization Problems Using the R-
Package Flacco, pages 93–123. Studies in Classifi-
cation, Data Analysis, and Knowledge Organization.
Springer International Publishing.
Koza, J. R. (1989). Hierarchical genetic algorithms oper-
ating on populations of computer programs. In Srid-
haran, N. S., editor, Proceedings of the Eleventh In-
ternational Joint Conference on Artificial Intelligence
IJCAI-89, volume 1, pages 768–774. Morgan Kauf-
mann.
Long, F. X., van Stein, B., Frenzel, M., Krause, P., Gitterle, M., and Bäck, T. (2022). Learning the characteristics of engineering optimization problems with applications in automotive crash. In Proceedings of the Genetic and Evolutionary Computation Conference, pages 1227–1236.
Long, F. X., Vermetten, D., Kononova, A. V., Kalkreuth, R., Yang, K., Bäck, T., and van Stein, N. (2023). Reproducibility files and additional figures. https://doi.org/10.5281/zenodo.7896138.
McInnes, L., Healy, J., Saul, N., and Grossberger, L. (2018).
Umap: Uniform manifold approximation and projec-
tion. The Journal of Open Source Software, 3(29):861.
Mersmann, O., Bischl, B., Trautmann, H., Preuss, M.,
Weihs, C., and Rudolph, G. (2011). Exploratory land-
scape analysis. In Proceedings of the 13th Annual
Conference on Genetic and Evolutionary Computa-
tion, GECCO ’11, page 829–836. ACM.
Mersmann, O., Preuss, M., and Trautmann, H. (2010).
Benchmarking evolutionary algorithms: Towards ex-
ploratory landscape analysis. In Schaefer, R., Cotta,
C., Kołodziej, J., and Rudolph, G., editors, Parallel
Problem Solving from Nature, PPSN XI, pages 73–82.
Springer Berlin Heidelberg.
Muñoz, M. A., Kirley, M., and Smith-Miles, K. (2022). Analyzing randomness effects on the reliability of exploratory landscape analysis. Natural Computing, 21(2):131–154.
Muñoz, M. A. and Smith-Miles, K. (2020). Generating new space-filling test instances for continuous black-box optimization. Evolutionary Computation, 28(3):379–404.
Muñoz, M. A., Sun, Y., Kirley, M., and Halgamuge, S. K. (2015). Algorithm selection for black-box continuous optimization problems: A survey on methods and challenges. Information Sciences, 317:224–245.
Prager, R. P. and Trautmann, H. (2023a). Nullifying the
inherent bias of non-invariant exploratory landscape
analysis features. In Applications of Evolutionary
Computation: 26th European Conference, EvoAppli-
cations 2023, Held as Part of EvoStar 2023, Brno,
Czech Republic, April 12–14, 2023, Proceedings,
pages 411–425. Springer.
Prager, R. P. and Trautmann, H. (2023b). Pflacco: Feature-
Based Landscape Analysis of Continuous and Con-
strained Optimization Problems in Python. Evolution-
ary Computation, pages 1–25.
Renau, Q., Doerr, C., Dreo, J., and Doerr, B. (2020). Exploratory landscape analysis is strongly sensitive to the sampling strategy. In Bäck, T., Preuss, M., Deutz, A., Wang, H., Doerr, C., Emmerich, M., and Trautmann, H., editors, Parallel Problem Solving from Nature – PPSN XVI, pages 139–153. Springer International Publishing.
Renau, Q., Dréo, J., Doerr, C., and Doerr, B. (2021). Towards explainable exploratory landscape analysis: extreme feature selection for classifying BBOB functions. In Applications of Evolutionary Computation: 24th International Conference, EvoApplications 2021, Held as Part of EvoStar 2021, Virtual Event, April 7–9, 2021, Proceedings 24, pages 17–33. Springer.
Rice, J. R. (1976). The algorithm selection problem. vol-
ume 15 of Advances in Computers, pages 65–118. El-
sevier.
Schweim, D., Wittenberg, D., and Rothlauf, F. (2021). On
sampling error in genetic programming. Natural Com-
puting, pages 1–14.
Smith-Miles, K. and Muñoz, M. A. (2023). Instance space analysis for algorithm testing: Methodology and software tools. ACM Computing Surveys, 55(12):1–31.
Sobol’, I. M. (1967). On the distribution of points in
a cube and the approximate evaluation of integrals.
USSR Computational Mathematics and Mathematical
Physics, 7(4):86–112.
Tackett, W. A. (1995). Mining the genetic program. IEEE
Expert, 10(3):28–38.
Thomaser, A., Vogt, M.-E., Kononova, A. V., and Bäck, T. (2023). Transfer of multi-objectively tuned CMA-ES parameters to a vehicle dynamics problem. In Evolutionary Multi-Criterion Optimization: 12th International Conference, EMO 2023, Leiden, The Netherlands, March 20–24, 2023, Proceedings, pages 546–560. Springer.
Tian, Y., Peng, S., Zhang, X., Rodemann, T., Tan, K. C., and
Jin, Y. (2020). A recommender system for metaheuris-
tic algorithms for continuous optimization based on
deep recurrent neural networks. IEEE Transactions
on Artificial Intelligence, 1(1):5–18.
van Stein, B., Long, F. X., Frenzel, M., Krause, P., Gitterle, M., and Bäck, T. (2023). DoE2Vec: Deep-learning based features for exploratory landscape analysis.
Vermetten, D., Ye, F., Bäck, T., and Doerr, C. (2023a). MA-BBOB: Many-affine combinations of BBOB functions for evaluating AutoML approaches in noiseless numerical black-box optimization contexts. AutoML 2023.
Vermetten, D., Ye, F., and Doerr, C. (2023b). Using affine
combinations of BBOB problems for performance as-
sessment. CoRR, abs/2303.04573.
Škvorc, U., Eftimov, T., and Korošec, P. (2021a). The effect of sampling methods on the invariance to function transformations when using exploratory landscape analysis. In 2021 IEEE Congress on Evolutionary Computation (CEC), pages 1139–1146.
Škvorc, U., Eftimov, T., and Korošec, P. (2021b). A Complementarity Analysis of the COCO Benchmark Problems and Artificially Generated Problems, pages 215–216. ACM.