Fitness Histograms of Expert-Defined Problem Classes in Fitness Landscape Classification

Vojtěch Uher and Pavel Krömer

Department of Computer Science, VSB – Technical University of Ostrava, Ostrava-Poruba, Czech Republic

{vojtech.uher, pavel.kromer}@vsb.cz

Keywords:

Exploratory Landscape Analysis, Sampling Strategies, Low-Discrepancy Sequences, Fitness Histogram, Multiclass Classification.

Abstract:

Various metaheuristic algorithms can be employed to find optimal or sub-optimal solutions for different problems. A fitness landscape (FL) is an abstraction representing a specific optimization task. Exploratory landscape analysis (ELA) approximates the FL by estimating its features from a limited number of random solution samples. Such ELA features help in estimating the properties of the FL and ultimately aid the selection of suitable optimization algorithms for problems with certain FL characteristics. This paper proposes using a normalized histogram of fitness values as a simple statistical feature vector for representing FLs. These histograms are classified using various classifiers to evaluate their effectiveness in representing different problems. The study focuses on 24 single-objective benchmark problems, grouped into five expert-defined classes. The performance of several classifiers is compared across different problem dimensions and sample sizes, emphasizing the impact of different sampling strategies and the number of histogram bins. The findings highlight the robustness of histogram representation and reveal promising experimental setups and relationships.

1 INTRODUCTION

Nature-inspired metaheuristic algorithms, such as

swarm, e.g., Particle Swarm Optimization (Eslami

et al., 2012), and evolutionary, e.g., Genetic Algo-

rithm (Katoch et al., 2021; Nowakova and Poko-

rny, 2014), and Differential Evolution (Das and Sug-

anthan, 2010) methods are effective approaches for

solving complex problems through optimization. The

nature of black-box optimization problems that are

most often tackled by bio-inspired metaheuristics is

usually unknown. At the same time, it is well

understood that different metaheuristics perform on

different types of problems differently (Mersmann

et al., 2011) and the selection of an efﬁcient algo-

rithm or algorithm parameters becomes an impor-

tant and challenging issue. An appropriate algo-

rithm well-suited for solving a speciﬁc problem can

signiﬁcantly enhance optimization performance and

conserve valuable resources by reducing the num-

ber of costly ﬁtness function evaluations (Malan,

2021; Lang and Engelbrecht, 2021; Zou et al., 2022).

https://orcid.org/0000-0002-7475-3625

https://orcid.org/0000-0001-8428-3332

Landscape analysis is a top-level data-driven process that can provide at least limited insights into

general optimization problems, estimate their properties and characteristics, and disentangle the relationships between different types of problems and various optimization algorithms. It can be used for

many downstream tasks including automated algorithm selection (Malan, 2021; Tanabe, 2022), parameter tuning (Pikalov and Mironovich, 2021), algorithm performance prediction and explanation (Trajanov et al., 2022), problem classification (Uher and Krömer, 2023; Renau et al., 2021), etc.

A ﬁtness landscape (Richter and Engelbrecht,

2014) is an abstraction that represents an opti-

mization problem by a continuous multidimensional

function (hypersurface). However, a complete de-

scription of the problem’s FL would be equivalent

to solving it. Instead, practical landscape analy-

sis methods describe the FLs by carefully selected

features that summarize their important properties,

for example, ruggedness, deceptiveness, and multi-modality (Muñoz et al., 2015).

Uher, V. and Krömer, P.

Fitness Histograms of Expert-Defined Problem Classes in Fitness Landscape Classification.

DOI: 10.5220/0012923900003837

Paper published under CC license (CC BY-NC-ND 4.0)

In Proceedings of the 16th International Joint Conference on Computational Intelligence (IJCCI 2024), pages 205-213

ISBN: 978-989-758-721-4; ISSN: 2184-3236

Exploratory landscape analysis (Mersmann et al., 2011) is a popular problem-agnostic method for characterizing FLs of optimization problems. On the top level, it consists of a series of steps that enable an effective and compressed characterization of the hypersurfaces defined by the fitness and other interesting values (e.g., constraint violation score) associated with problem solutions. Essentially, ELA takes

a ﬁnite set of sample problem solutions, computes

for them the characteristic values, and uses them to

compute numerical landscape features (Mersmann

et al., 2011; Zou et al., 2022). The outcomes of ELA

are affected by many parameters, including the em-

ployed sampling strategy, the type of evaluated land-

scape features, etc. The sampling strategies aim to

achieve good coverage of the search space, high regu-

larity, and low discrepancy of samples. Popular sam-

pling strategies include Uniform pseudo-random sam-

pling, quasi-random techniques such as Latin Hyper-

cube Sampling (LHS) (McKay et al., 2000; Kerschke

and Trautmann, 2019), and sampling based on low-

discrepancy sequences such as Sobol (Sobol, 1967)

and Halton sequence (Halton, 1964). A popular set of

landscape features is provided, e.g., in the FLACCO

library (Kerschke and Trautmann, 2019). Several

studies with single (Renau et al., 2021; Lang and

Engelbrecht, 2021) and bi-objective (Krömer et al., 2022; Liefooghe et al., 2023; Krömer et al., 2024)

problems demonstrated that the values of landscape

features are signiﬁcantly affected by the sampling

strategy.

The evaluation of commonly used landscape fea-

tures often involves computationally expensive oper-

ations such as the computation of pairwise distances,

execution of several local searches, and building of lo-

cal optima networks (Kerschke and Trautmann, 2019;

Adair et al., 2019). This makes their use, in par-

ticular for large sets of samples, often inconvenient.

Besides robust and expensive types of landscape fea-

tures, more straightforward and lightweight FL char-

acterization approaches can be considered. Recently,

a simple FL representation based solely on the distribution of individual fitness values has been investigated (Uher and Krömer, 2023). The method represents the FL by a normalized histogram reflecting the

distribution of ﬁtness values calculated for the sam-

ples generated by a selected sampling strategy. The

study (Uher and Krömer, 2023) showed that the fitness histogram is, despite its simplicity, a sufficiently distinctive landscape representation that enables the detection of different types of FLs by cluster analysis. It also examined the impact of the sampling strategy and showed that Uniform and optimized LHS sampling outperform the low-discrepancy sequences. However, that paper explored the histogram feature only to a limited extent: it used a fixed setting of 50 histogram bins and a simple cluster analysis, which led to relatively weak separability (silhouette score ≤ 0.5). These initial results encouraged an additional, more detailed investigation of

the effects of different sampling strategies, histogram

parameters, and distance measures in the context of

problem representation by ﬁtness histograms.

This study summarizes the results of an exten-

sive computational investigation into the use of ﬁt-

ness histograms for problem characterization. Clas-

siﬁcation based on histograms is a technique known

in image retrieval as color indexing (Swain and Bal-

lard, 1991). In color indexing, each image is rep-

resented by a histogram of the color frequencies of

its pixels. Comparable distributions of colors indicate

similar images (Swain and Ballard, 1991; Barla et al.,

2003). To characterize black-box optimization prob-

lems, the set of ﬁtness values is interpreted as a statis-

tical random variable and its distribution (histogram

with a user-deﬁned number of bins) can be compared

with other random variables (Shirakawa and Nagao,

2016), in this case, the representations of other FLs

corresponding with other problems.

In this work, the ability of ﬁtness histograms to

capture the properties of different types of black-box

optimization problems under a wide variety of exper-

imental conﬁgurations is assessed. To do that, we

employ an expert-designed, well-structured, and care-

fully curated set of test problems from the COmparing

Continuous Optimizers (COCO) platform (Hansen

et al., 2021). The experiments are performed on the

24 BBOB single-objective test functions available in

COCO grouped into 5 expert-deﬁned classes. The

functions are ﬁrst represented by the ﬁtness his-

tograms obtained from solution samples of differ-

ent sizes obtained with the help of different sam-

pling strategies. Then, the Decision tree, Random

forest, and k-Nearest neighbors (kNN) (Renau et al.,

2021) classifiers in combination with the standard Euclidean and two statistical (histogram distance, KL-divergence) distance measures (Uher and Krömer, 2023) are applied to learn the expert-defined classes

of the test functions and classify unknown optimiza-

tion problems. The classiﬁcation process is used as

a veriﬁcation that the ﬁtness histogram covers sufﬁ-

cient information to distinguish benchmark problems

of different properties. The results show that different

combinations of sampling, distances, and classiﬁers

yield different abilities to represent problems and sig-

niﬁcantly extend the initial ﬁndings on these issues

from (Uher and Krömer, 2023).

The following Section 2 describes the ELA

pipeline and the methods used to characterize the test

problems by ﬁtness histograms. Section 3 provides

a detailed description of the experiments and a thorough analysis of their results. Finally, major conclusions are drawn and future work is outlined in Section 4.

ECTA 2024 - 16th International Conference on Evolutionary Computation Theory and Applications


2 FITNESS LANDSCAPE CLASSIFICATION BY HISTOGRAMS OF FITNESS VALUES

Each FL corresponds to one single-objective test

function and is represented by a normalized histogram

of ﬁtness values computed for a set of randomly

selected sample solutions scattered over the search

space. The assumption is that similar functions, i.e.,

from the same class of problems, should be repre-

sented by similar ﬁtness histograms, and functions

from different problem classes should yield dissimi-

lar ﬁtness histograms. We ﬁrst describe the used test

suite, sampling strategies, the normalized histograms

of ﬁtness values, and summarize the employed classi-

ﬁcation methods.

2.1 Test Problems

Numerous benchmark suites are available for evaluation purposes, as highlighted by Lang and Engelbrecht (2021). We opted to utilize

the single-objective benchmark problems provided by

the COmparing Continuous Optimizers (COCO) plat-

form (Hansen et al., 2021). Speciﬁcally, this col-

lection encompasses 24 BBOB noiseless, scalable, and

single-objective test functions, each characterized by

unique ﬁtness landscapes. This selection serves as a

systematic framework for evaluating sampling strate-

gies and their associated histograms. Each function

within this set is accessible in various dimensions

(d ∈ {2,3,5,10,20,40}) and is represented by 15 in-

stances; however, our study focuses exclusively on

the ﬁrst instance. The COCO test suite provides an

expert-deﬁned classiﬁcation of the 24 BBOB test func-

tions into 5 groups: 1) Separable functions (f001-

f005), 2) Functions with low or moderate condition-

ing (f006-f009), 3) Functions with high conditioning

and unimodal (f010-f014), 4) Multi-modal functions

with adequate global structure (f015-f019), 5) Multi-

modal functions with weak global structure (f020-

f024). This classiﬁcation is used as a theoretical back-

ground for our experiments. The COCO platform is

acknowledged as a state-of-the-art publicly available

resource (Renau et al., 2021).
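The expert-defined grouping listed above can be written down as a small lookup table. The sketch below is purely illustrative: the names `BBOB_GROUPS` and `bbob_class` are hypothetical helpers and are not part of the COCO API.

```python
# Expert-defined BBOB groups as listed in the text:
# class index -> (first function id, last function id).
BBOB_GROUPS = {
    1: (1, 5),    # separable functions
    2: (6, 9),    # low or moderate conditioning
    3: (10, 14),  # high conditioning, unimodal
    4: (15, 19),  # multi-modal, adequate global structure
    5: (20, 24),  # multi-modal, weak global structure
}


def bbob_class(function_id):
    """Return the expert-defined class (1..5) of BBOB function f001..f024."""
    for label, (first, last) in BBOB_GROUPS.items():
        if first <= function_id <= last:
            return label
    raise ValueError("not a BBOB function id: %r" % (function_id,))


# Number of functions per class, in order of class index.
CLASS_SIZES = [sum(bbob_class(f) == c for f in range(1, 25)) for c in range(1, 6)]
```

The classes are slightly unbalanced (the second class holds four functions, the others five), which is worth keeping in mind when reading average accuracies.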

In order to scrutinize the continuous functions as-

sociated with the test problems, a discrete sample set

of size n is generated using a specified sampling strategy, with the stipulation that n be a power of two, i.e., $n = 2^{m}$ for an integer exponent $m$. This selection is particularly advantageous for certain low-discrepancy sequences. In the

context of ELA, it is common that n is within the

range of [10

· d,...,10

· d] (Muñoz et al., 2015).

The implementation of these methodologies relies on

the Python COCO library, alongside the SciPy and

Scikit-learn Python libraries (publicly available).

2.2 Problem Sampling

In ELA, the goal of problem (solution) sampling is to

select a ﬁnite set of problem solutions (sample) that

will represent the entire problem. The ﬁtness values

of the solution sample are computed and used to esti-

mate the characteristics of the ﬁtness landscape and,

consequently, the whole problem (Renau et al., 2020).

A signiﬁcantly biased set of samples can result in sys-

tematic information loss due to under- or oversam-

pling in speciﬁc regions. Various sampling methods

can be utilized to fulﬁll this overarching goal, aim-

ing to acquire data points and ﬁtness values that en-

able precise characterization of the underlying prob-

lem, emphasizing even coverage of the search space.

Each sampling strategy produces slightly different samples, so the landscape features computed from them may also vary. In this work, we considered the effect of several popular sampling strategies. Uniform Random sampling (Uniform) serves

as the baseline sampling method, generating solu-

tions for sampling through a pseudorandom genera-

tor with a uniform probability distribution. Latin

Hypercube Sampling (LHS) generates near-random

samples from multi-dimensional spaces. It divides

the space into a square grid, ensuring that only one

sample is drawn from each column and row (McKay

et al., 2000). The LHS Optimized (LHSO) sampling

used in the experiments is an optimized variant of

LHS that employs random coordinate permutations

to reduce centered discrepancy and enhance space-filling robustness. Sobol sequence-based sampling (Sobol) utilizes the Sobol low-discrepancy sequence, a quasi-random sequence in base 2 that uses a binary representation of the position along each dimension and is efficiently implemented through bit-vector operations (Sobol, 1967). To improve the discrepancy properties of the sequence, a linear matrix scramble with digital random shifting is applied. Halton sequence-based sampling (G-Halton) builds on the Halton low-discrepancy sequence, another quasi-random sequence using

coprime integers as its bases (Halton, 1964). It is a

generalization of the one-dimensional van der Corput

sequence (Chi et al., 2005). While performing well

in low dimensions, a correlation is observed in higher

dimensions that adversely affects the distribution.
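A minimal standard-library sketch of two of the strategies described above may help make the constructions concrete. It shows a basic (non-optimized) Latin hypercube and the plain, unscrambled Halton sequence built from the van der Corput radical inverse. All function names are illustrative assumptions; the scrambled and optimized variants used in the experiments (scrambled Sobol, G-Halton, LHSO) are not reproduced here.

```python
import random


def van_der_corput(index, base):
    """Radical inverse of `index` in `base`: mirror its digits around the radix point."""
    value, denom = 0.0, 1.0
    while index > 0:
        index, digit = divmod(index, base)
        denom *= base
        value += digit / denom
    return value


def halton(n, bases):
    """First n points of the plain Halton sequence, one coprime base per dimension."""
    return [[van_der_corput(i, b) for b in bases] for i in range(1, n + 1)]


def latin_hypercube(n, d, rng):
    """Basic LHS in [0, 1)^d: each axis is cut into n strata and each stratum is hit once."""
    columns = []
    for _ in range(d):
        strata = list(range(n))
        rng.shuffle(strata)  # independent random permutation per dimension
        columns.append([(s + rng.random()) / n for s in strata])
    return [[columns[j][i] for j in range(d)] for i in range(n)]


def scale(points, low=-5.0, high=5.0):
    """Map unit-cube points to the box [low, high]^d."""
    return [[low + (high - low) * x for x in p] for p in points]
```

For example, `scale(latin_hypercube(8, 3, random.Random(0)))` yields eight 3-dimensional points in the box used by the experiments, with exactly one point per stratum along every axis.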


2.3 Normalized Fitness Histogram

A simple global feature aggregated from local fitness values is defined (Uher and Krömer, 2023). Given a fitness function, $f : \mathbb{R}^{d} \to \mathbb{R}$, and a set of discrete samples, $S = \{s_1, \ldots, s_n\}$, the set of fitness values is computed as $V = \{v; \forall s \in S : v = f(s)\}$, where $d$ is the problem dimension and $n$ is the number of samples. The set, $V$, is utilized to compute a histogram of $h$ bins within the range of values $[\min(V), \max(V)]$, subject to $\sum_{j=1}^{h} c_j / n = 1$, where $c_j$ is the number of fitness values falling into the $j$-th bin. The normalized histogram represents a discrete probability distribution of fitness values, $c = \{c_1/n, \ldots, c_h/n\}$.

The histogram is inﬂuenced by landscape prop-

erties such as ruggedness, variance of ﬁtness values,

and multi-modality. The number of histogram bins, h, gives all feature vectors the same length, facilitating easy comparisons. The value of h also governs

the precision of the captured distribution. A smaller h

results in greater compression of the contained infor-

mation. Normalizing histograms in the existing range

of values is important to obtain comparable represen-

tations of ﬁtness distributions. This feature is invari-

ant to translation, scaling, and rotation and does not

consider any local structure of a ﬁtness function.
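The definition above can be sketched in a few lines of standard-library Python. The function name `normalized_histogram` is illustrative, and two details are assumptions rather than statements from the paper: the maximum value is assigned to the last bin, and a constant fitness sample is placed entirely in the first bin.

```python
def normalized_histogram(values, h):
    """Return [c_1/n, ..., c_h/n] for h equal-width bins on [min(values), max(values)]."""
    lo, hi = min(values), max(values)
    counts = [0] * h
    if hi == lo:
        # Assumed handling of a degenerate (constant) sample: everything in bin 1.
        counts[0] = len(values)
    else:
        width = (hi - lo) / h
        for v in values:
            # The maximum lies on the right edge and is assigned to the last bin.
            j = min(int((v - lo) / width), h - 1)
            counts[j] += 1
    n = len(values)
    return [c / n for c in counts]
```

Because the bin range is taken from the sample itself, replacing every fitness value v by a·v + b with a > 0 leaves the histogram unchanged, which illustrates the invariance to translation and scaling of fitness values mentioned above.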

2.4 Histogram Classification

A normalized histogram can be interpreted as a real-

valued feature vector of length h, and can be used

with standard classiﬁcation algorithms, represented

here by Decision tree (DT), Random forest (RF), and

k-Nearest neighbors (kNN) using the Euclidean dis-

tance. These methods are usable for the non-separable

distribution of classes (Renau et al., 2021), but they

do not reﬂect the statistical meaning of a histogram

that represents a discrete probability distribution of

fitness values. Such a vector cannot take on arbitrary values, as its bins sum to 1. Therefore,

two statistical distance measures (histogram distance,

KL-divergence) are provided for usage with kNN for

comparison (Uher and Krömer, 2023).

Histogram distance (histDist) is a measure that

can express the degree of similarity of two histograms

as their intersection (Swain and Ballard, 1991; Barla

et al., 2003). Histogram intersection is defined as the sum of the minimum values of corresponding bins of two histograms, $a$ and $b$, with the same number of bins, $h$, $\mathrm{histInt}(a,b) = \sum_{j=1}^{h} \min(a_j, b_j)$. The histograms are equal when $\mathrm{histInt}(a,b) = 1$, and the histogram distance can therefore be defined as $\mathrm{histDist}(a,b) = 1 - \mathrm{histInt}(a,b)$.
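The histogram distance translates directly into code. The following is a minimal sketch with illustrative function names; it assumes both inputs are already normalized histograms with the same number of bins.

```python
def hist_intersection(a, b):
    """Sum of bin-wise minima of two normalized histograms of equal length."""
    if len(a) != len(b):
        raise ValueError("histograms must have the same number of bins")
    return sum(min(x, y) for x, y in zip(a, b))


def hist_dist(a, b):
    """Histogram distance: 1 minus the intersection (0 for identical histograms)."""
    return 1.0 - hist_intersection(a, b)
```

For normalized inputs the distance lies in [0, 1]: it is 0 when the histograms coincide and 1 when no bin is occupied by both.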

Another way to evaluate the similarity of two histograms is through the Kullback–Leibler divergence (KL). For two normalized histograms, i.e., estimates of probability density functions, $a$ and $b$, the KL-divergence (Kullback and Leibler, 1951) evaluates the relative entropy from the first probability distribution, $a$, to the second, $b$, $\mathrm{KL}(a \parallel b) = \sum_{j=1}^{h} a_j \log(a_j / b_j)$. Statistical divergence quantifies the dissimilarity between two probability distributions. These divergences can be conceptualized as extensions of the squared Euclidean distance (SED) (Cha and Srihari, 2002). Because the KL-divergence is not symmetric, its symmetrized variant is used as a distance measure (Uher and Krömer, 2023): $\mathrm{KLDiv}(a,b) = (\mathrm{KL}(a \parallel b) + \mathrm{KL}(b \parallel a))/2$.
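A sketch of the symmetrized KL-divergence follows. Empty bins make the logarithm undefined, and the text does not say how they are treated, so the additive smoothing with a small `eps` followed by renormalization is an assumption of this example, as are the function names.

```python
import math


def _smooth(p, eps):
    """Add eps to every bin and renormalize (assumed treatment of empty bins)."""
    total = sum(p) + eps * len(p)
    return [(x + eps) / total for x in p]


def kl(a, b, eps=1e-12):
    """KL(a || b) = sum_j a_j * log(a_j / b_j) on smoothed histograms."""
    a, b = _smooth(a, eps), _smooth(b, eps)
    return sum(x * math.log(x / y) for x, y in zip(a, b))


def kl_div(a, b, eps=1e-12):
    """Symmetrized divergence: (KL(a || b) + KL(b || a)) / 2."""
    return 0.5 * (kl(a, b, eps) + kl(b, a, eps))
```

The choice of `eps` controls how strongly a bin that is empty in only one histogram is penalized, so it may be worth reporting alongside results obtained with this measure.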

3 EXPERIMENTS AND RESULTS

An initial cluster analysis of BBOB test functions

demonstrated that they can be well-separated using

the normalized histograms of sampled ﬁtness values

(Uher and Krömer, 2023). It also showed that the

sampling strategy can inﬂuence the representativeness

of ﬁtness histograms. These results are further ex-

panded here and problem classiﬁcation accuracy ob-

tained with ﬁtness histograms under different conﬁg-

urations is assessed in the context of multi-class clas-

siﬁcation. The considered problem classes are the

ﬁve expert-deﬁned classes from COCO, introduced in

section 2.1, each of which consists of 4-5 test func-

tions. We examine whether the lightweight ﬁtness

histogram-based representation provides sufﬁcient in-

formation to distinguish the functions belonging to

these 5 classes and investigate the inﬂuence of 1) sam-

pling strategies, 2) the number of histogram bins, h,

and 3) the classiﬁcation algorithms on this ability.

The experimental configuration involves the assessment of the 24 BBOB COCO functions, specifically focusing on the first function instance and a search space bounded by $[-5,5]^{d}$. The study encompasses three

dimensions, d ∈ {5,10,20}, three distinct sample

sizes, n ∈ {2

}, and the use of ﬁve sampling

strategies (Uniform, Sobol, G-Halton, LHS, LHSO)

for the generation of solution samples. For each strat-

egy, dimension, and set size, 30 sample sets are ran-

domly generated. Subsequently, each set of samples

undergoes evaluation across the 24 test functions, re-

sulting in sets of corresponding ﬁtness values. To

further analyze the outcomes, normalized histograms

with varying bin counts, h ∈ {3,8,15,25,50}, are

constructed for each function and combination of ex-

perimental parameters. For ﬁtness landscape clas-

siﬁcation, three traditional classiﬁers, Decision tree

(DT), Random forest (RF), and k-Nearest neighbors

(kNN), are used. DT and RF are tested with the de-


fault parameters of Scikit-learn. The kNN is tested

for k = 7 with Euclidean distance and two statistical

dissimilarities (histogram distance, KL-divergence)

for comparison.

3.1 Fitness Landscape Classification

The ﬁtness histograms representing feature vectors of

distinct test functions should differ enough to be dis-

tinguished by classiﬁers. To create the training and

test sets for a more robust performance analysis, the 30 random sample sets are evaluated so that each test

function is represented by 30 normalized ﬁtness his-

tograms. The sample sets are split into 15 training and

15 test sets. Therefore, 15 · 24 histograms are used to

build a classiﬁcation model and the same amount is

used to test it. In this case, 5-class classiﬁcation is per-

formed based on the expert-deﬁned groups of 24 BBOB

test functions (each group 4-5 functions) and the aver-

age accuracy is computed. This procedure is repeated

for each combination of classiﬁer, sampling strategy,

dimension d, set size n, and number of histogram bins

h. The conﬁgurations are never mixed across training

and test sets. Employing the kNN classifier, we set k = 7, a value approximately equal to half of the 15 training histograms available per function, which demonstrated good results in the conducted experiments.
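To illustrate how a kNN classifier can be combined with an arbitrary histogram dissimilarity, here is a small standard-library sketch. It is not the implementation used in the experiments; the function names and the tie-breaking rule (prefer the tied label whose member is closest to the query) are assumptions.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Standard Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train, query, k=7, dist=euclidean):
    """Majority vote among the k training histograms nearest to `query`.

    train: list of (histogram, label) pairs; dist: any callable dist(a, b) -> float.
    """
    neighbours = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    best = max(votes.values())
    # Assumed tie-break: walk the neighbours from nearest to farthest and
    # return the first label that reached the maximum vote count.
    for _, label in neighbours:
        if votes[label] == best:
            return label
```

Any of the dissimilarities discussed in Section 2.4 can be passed as `dist`, which mirrors the comparison of Euclidean distance, histogram distance, and KL-divergence in the experiments.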

The results (average accuracies) of 5-class clas-

siﬁcation corresponding to the deﬁned test suite

and parameters have been computed to comprehen-

sively study classiﬁcation performance representing

the background for the following experiments and

conclusions. Obviously, the higher n strongly im-

proves the accuracy as the ﬁtness landscapes are ex-

plored in greater detail. The conﬁgurations reaching

the absolute best accuracies for all three dimensions

are summarized here:

d = 5 (acc. 1.000): h ∈ {25,50}, n = 2

, kNN

(all) , Uniform & LHS & LHSO

d = 10 (acc. 0.972): h = 8, n = 2

, kNN (Histogram

d.), LHSO

d = 20 (acc. 0.992): h = 8, n = 2

, kNN (Euclidean

d.), LHS

The basic question is how the histograms repre-

sent the underlying test functions and what is the in-

ﬂuence of the tested parameters. The best results

show accuracies over 97%. First, the impact of num-

ber of histogram bins h is examined. In order to clar-

ify the pattern more comprehensively, we compare

and rank the corresponding accuracies for identical

methods and parameters across resulting tables for

various values of h. Consequently, each value is assigned a rank ranging from 1 to 5. Table 1 presents the Olympic medal ranking based on h, indicating the total count of first, second, and subsequent positions. The table shows that h ∈ {8, 15} leads to greater average accuracy, and h = 8 seems to be the best one in general. Setting h = 3 leads to unbalanced results (some outstanding, some substandard), as the representation of the fitness values is too compressed. For h ∈ {25,50}, the representation is overfitted.

Table 1: Olympic medal ranking of histogram bins, h.

         1st  2nd  3rd  4th  5th
h = 3     56   40   28   37   64
h = 8    102   69   23   25    6
h = 15    44   69   89   16    7
h = 25    34   25   66   93    7
h = 50    13   13   19   53  127
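The Olympic medal ranking can be computed by ranking the alternatives within each configuration and counting how often each alternative reaches each position. The sketch below is one possible reading, not the authors' code. It assumes standard competition ranking, in which tied alternatives share the better rank; the first-place column of Table 1 sums to more than the number of ranked configurations per row, which suggests that ties were shared in some such way.

```python
def medal_table(accuracies):
    """Count rank positions per alternative.

    accuracies: list of dicts, one per configuration, mapping an alternative
    (e.g. a bin count or a sampling strategy) to its accuracy.
    Returns a dict: alternative -> [number of 1st places, 2nd places, ...].
    """
    names = sorted(accuracies[0])
    table = {name: [0] * len(names) for name in names}
    for config in accuracies:
        for name in names:
            # Competition ranking: 1 + number of strictly better alternatives.
            rank = 1 + sum(config[other] > config[name] for other in names)
            table[name][rank - 1] += 1
    return table
```

With this convention every row of the resulting table sums to the number of configurations, while column sums may deviate from it whenever ties occur.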

One of the main tasks of this paper is to examine

the impact of applied sampling strategies. The cor-

responding Olympic medal ranking is summarized in

table 2. The ranks are assigned to sampling strate-

gies within the same h, n, d, and classiﬁer. The

LHS and LHSO strongly dominate in most cases.

The basic Uniform sampling leads to average results

while the Sobol and G-Halton samplings generally

occupy the last positions. This trend is intensiﬁed for

h ∈ {3,8}. A small value of h serves to mitigate over-

ﬁtting, potentially enhancing accuracy in certain sce-

narios. However, it is likely to be sensitive to random

noise, thereby resulting in divergent results. The ﬁnd-

ings align with the cluster analysis presented in the re-

cently published work (Uher and Krömer, 2023), indi-

cating optimal performance with LHSO and Uniform

samplings for histograms with h = 50 (other h is not

given). On the contrary, earlier publications (Renau

et al., 2021; Krömer et al., 2022) assert the positive

impact of the Sobol sampling strategy on feature val-

ues resulting in superior classiﬁcation accuracy, but

the presented differences are not substantial. The his-

togram is a very straightforward representation of FL

sensitive to the used sampling, while the robust fea-

ture sets (e.g. FLACCO) examine the local structure

of FLs overcoming the random noise at higher com-

putational complexity.

Next, the performance of different classiﬁcation

algorithms is compared for the ﬁxed n, d, and h. The

Olympic medal ranking depending on the h is shown

in table 3. The results indicate that the kNN gener-

ally beats the DT and RF classiﬁers. This is proba-

bly due to the characteristics of histograms as their

bins cannot be simply interpreted as vector coordi-

nates (ranges of values vary). The measures used with

the kNN algorithm better reﬂect the discrete probabil-

ity distribution of ﬁtness values. The histogram dis-

tance and KL-divergence perform better for lower h

and the Euclidean distance prevails for higher h.


Table 2: Olympic medal ranking of sampling strategies.

         | h = 3               | h = 8               | h = 15              | h = 25              | h = 50
         | 1st 2nd 3rd 4th 5th | 1st 2nd 3rd 4th 5th | 1st 2nd 3rd 4th 5th | 1st 2nd 3rd 4th 5th | 1st 2nd 3rd 4th 5th
Uniform  |   3  10   7  11  14 |   5   4  10  11  15 |   5   8   9   2  21 |   6   9  12  13   5 |   7  11  17   5   5
Sobol    |   1   2   5  14  23 |   4   4   3  18  16 |   7  10  10  14   4 |  10   5   6  11  13 |   5  10   4  12  14
G-Halton |   7   5   8  18   7 |   2   9  13  11  10 |   8   6  11  10  10 |   5   5  10  11  14 |   6   5   8  14  12
LHS      |   7  22  15   1   0 |  21  11   8   4   1 |  12   7   7  15   4 |  11  12  10   4   8 |  13  11  10   5   6
LHSO     |  28  12   5   0   0 |  15  23   6   1   0 |  19  10  10   3   3 |  19  14   4   6   2 |  20  13   4   4   4

Table 3: Olympic medal ranking of classifiers.

             | h = 3               | h = 8               | h = 15              | h = 25              | h = 50
             | 1st 2nd 3rd 4th 5th | 1st 2nd 3rd 4th 5th | 1st 2nd 3rd 4th 5th | 1st 2nd 3rd 4th 5th | 1st 2nd 3rd 4th 5th
DT           |   3   1   3   5  33 |   0   0   0   0  45 |   0   0   0   0  45 |   0   0   0   0  45 |   0   0   0   8  37
RF           |  10   6   5  24   0 |  10   6   6  23   0 |  11   5   4  25   0 |   4   4   7  30   0 |   1   5  17  22   0
kNN(Eucl.)   |  11  17  11   5   1 |  14   9  17   5   0 |  16  16  10   3   0 |  22  14   6   3   0 |  22  12   8   3   0
kNN(Hist.)   |  17  13  11   3   1 |  10  15  13   7   0 |  10  17  15   3   0 |  19  19   6   1   0 |  20  20   4   1   0
kNN(KL-div.) |  16   5  15   6   3 |  17  15   8   5   0 |  15   8  13   9   0 |  10   8  19   8   0 |   6  11  10  10   8

Although high accuracies are scattered across all

methods and conﬁgurations, several strong trends can

be picked up. Overall, the best average results are ob-

tained for h = 8, n = 2

, LHS and LHSO sampling

strategies, and kNN classiﬁer. These trends are also

underlined by the absolute best accuracies and corre-

sponding conﬁgurations mentioned before.

Table 4: LOPO: average accuracy of problem classification for h = 8, and kNN (KL-divergence, k = 7). Each dimension, d, spans three columns, one per sample size, n.

         | d = 5               | d = 10              | d = 20
Uniform  | 0.336  0.314  0.256 | 0.386  0.408  0.417 | 0.289  0.272  0.275
Sobol    | 0.275  0.311  0.281 | 0.331  0.350  0.372 | 0.322  0.294  0.297
G-Halton | 0.353  0.292  0.358 | 0.383  0.417  0.372 | 0.325  0.250  0.281
LHS      | 0.319  0.267  0.267 | 0.372  0.406  0.417 | 0.336  0.294  0.250
LHSO     | 0.281  0.306  0.264 | 0.358  0.406  0.417 | 0.339  0.253  0.250

3.2 Leave-One-Problem-Out Scenario

In this section, a deeper investigation of the aver-

age best conﬁgurations is conducted in the context

of the expert-deﬁned 5-class classiﬁcation of 24 BBOB

COCO test functions. Primarily, the compactness of

normalized histograms of functions within the same

class is examined with a special focus on the impact

of various sampling strategies. The previous section

showed that the performance is very high for the sim-

ple scenario when the model includes sample sets of

all test functions. Alternatively, a leave-one-problem-

out (LOPO) cross-validation approach is employed.

Given the division of the 24 functions into 5 groups,

one function is consistently excluded from the train-

ing set and exclusively utilized as the singular func-

tion in the test set. Consequently, the classiﬁer must

learn from the remaining functions, assimilating all

available information to appropriately assign the test

function to its right class. The classiﬁcation accuracy

is determined as the average across all 24 folds of the

LOPO cross-validation.
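The leave-one-problem-out protocol described above can be sketched as a fold generator plus an accuracy loop. The record layout, the function names, and the choice to place all histograms of the held-out function in the test set are assumptions made for illustration; the text does not specify how many histograms per function enter each fold.

```python
def lopo_folds(records):
    """Yield (held_out_function, train, test) for every function id.

    records: list of (function_id, class_label, histogram) tuples.
    """
    for held_out in sorted({fid for fid, _, _ in records}):
        train = [r for r in records if r[0] != held_out]
        test = [r for r in records if r[0] == held_out]
        yield held_out, train, test


def lopo_accuracy(records, predict):
    """Average per-fold accuracy; predict(train, histogram) returns a class label."""
    fold_accuracies = []
    for _, train, test in lopo_folds(records):
        hits = sum(predict(train, hist) == label for _, label, hist in test)
        fold_accuracies.append(hits / len(test))
    return sum(fold_accuracies) / len(fold_accuracies)
```

Since the held-out function never appears in the training data, a classifier can succeed only if other functions of the same expert-defined class produce similar histograms, which is precisely the property this scenario probes.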

The results are shown in table 4 and the experi-

ments were done for h = 8 and kNN (KL-divergence)

which seems to be the best-performing combination.

The table indicates that the accuracies vary between 25% and 41.7%, and thus, they are not as convincing as in the case of general classification. The best

results are achieved for different samplings and set

sizes. The classiﬁer is best-performing in dimension

d = 10 for n = 2

while the results are relatively bet-

ter for n ∈ {2

} in d ∈ {5, 20}. The best accu-

racy of 41.7% was achieved in d = 10 for all sampling

strategies except Sobol sampling.

Figure 1 illustrates the similarities between func-

tions, as it displays normalized histograms of all 24

BBOB test functions for LHS sampling, d = 10, h = 8,

and n = 2

. The ﬁve expert-deﬁned groups are in

order: 1) Separable functions (f001-f005), 2) Func-

tions with low or moderate conditioning (f006-f009),

3) Functions with high conditioning and unimodal

(f010-f014), 4) Multi-modal functions with adequate

global structure (f015-f019), 5) Multi-modal func-

tions with weak global structure (f020-f024). The

comparison of histograms indicates that the distribu-

tions of ﬁtness values can be similar across differ-

ent expert-deﬁned groups. Although the ﬁtness his-

tograms can accurately distinguish single functions,

they cannot represent the properties of the expert-

deﬁned classes very well which explains the low ac-

curacies of LOPO classiﬁcation.

For the same configuration, a t-SNE visualization

is provided in ﬁgure 2 where the ﬁtness histograms of

functions assigned to 5 classes are visualized based on

5 sampling strategies. The points with black edges are

correctly classiﬁed using the LOPO scenario. The ﬁg-

ures show that the clusters representing the separate

functions belonging to the same group are generally

spread over the space. Some separate compact clusters of functions are well classified, but others are too far apart or even overlap with other classes. This corresponds to the figure of normalized histograms that cannot distinguish the functions properly. The visualizations are similar for all 5 samplings.

Figure 1: Histograms of fitness values of 24 COCO functions with maximum height set to 0.8 (LHS, d = 10, h = 8, n = 2

Panels (a)-(x): f001-f024.

Figure 2: t-SNE visualization (perplexity of 3) of normalized fitness histograms computed from samples generated by 5 different samplings using the kNN (KL-div., k = 7) classifier and also KL-divergence as a measure for t-SNE visualization (d = 10, h = 8, n = 2 ). The colors represent the 5 COCO classes of test functions. Points with black edges are classified correctly, the others are misclassified. The accuracies are averaged over all 24 LOPO folds.

Panels: (a) Uniform, acc. 0.417; (b) Sobol, acc. 0.372; (c) G-Halton, acc. 0.372; (d) LHS, acc. 0.417; (e) LHSO, acc. 0.417.

To investigate the 5-class expert-defined LOPO classification further, confusion matrices are provided in Figure 3 for all samplings. The matrices reveal a particular problem with the first class, which is almost never classified correctly. The second class is also problematic, especially for the Sobol and G-Halton sampling strategies. This confirms the findings from the t-SNE visualization in Figure 2, where the first (purple) class is represented by 5 distinct clusters, while the second (blue) and fifth (yellow) classes sometimes overlap with others.
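To make the reading of such confusion matrices concrete, the following minimal Python sketch shows how per-fold predictions could be accumulated into one aggregate 5-class matrix and summarized by per-class recall. The function names and the toy labels are illustrative assumptions, not the code or data used in the experiments.

```python
def aggregate_confusion(folds, n_classes=5):
    """Sum confusion matrices over folds.

    folds: iterable of (true_labels, predicted_labels) pairs with labels
    in 0..n_classes-1. Returns matrix[t][p], the number of samples of
    true class t that were predicted as class p.
    """
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for true_labels, predicted in folds:
        for t, p in zip(true_labels, predicted):
            matrix[t][p] += 1
    return matrix


def per_class_recall(matrix):
    """Fraction of each true class (row) that was predicted correctly."""
    return [row[i] / sum(row) if sum(row) else 0.0
            for i, row in enumerate(matrix)]


# Hypothetical toy folds: each fold holds out the sample sets of one problem.
toy_folds = [
    ([0, 0, 0], [1, 2, 1]),  # problem from class 0, never recognised
    ([3, 3, 3], [3, 3, 4]),  # problem from class 3, mostly recognised
]
cm = aggregate_confusion(toy_folds)
recall = per_class_recall(cm)
```

A row whose diagonal entry is close to zero, as for class 0 in this toy input, corresponds to the situation described above for the first class.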

4 CONCLUSIONS

In contrast to ELA features, which rely on a complex methodology to estimate the FL properties, we propose using a normalized histogram of fitness values as a simple global feature vector that is invariant to scale, rotation, and translation.
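The scale and translation invariance can be illustrated with a small Python sketch. It assumes that the histogram uses h equal-width bins spanning the minimum and maximum of the sampled fitness values and is normalized to sum to one; this binning rule and the function name are assumptions made for illustration and may differ from the exact procedure used in the experiments. Only positive scaling and shifting of the fitness values is demonstrated here.

```python
def normalized_histogram(fitness, h):
    """Histogram over h equal-width bins on [min, max], scaled to sum to 1."""
    lo, hi = min(fitness), max(fitness)
    span = hi - lo
    counts = [0] * h
    for f in fitness:
        # Map the value to a bin index; the maximum falls into the last bin.
        idx = int((f - lo) / span * h) if span > 0 else 0
        counts[min(idx, h - 1)] += 1
    n = len(fitness)
    return [c / n for c in counts]


fitness = [0.0, 1.0, 2.0, 3.0, 4.0, 10.0]
# Positive scaling plus an offset applied to every fitness value.
shifted_scaled = [3.0 * f + 7.0 for f in fitness]

hist_a = normalized_histogram(fitness, h=4)
hist_b = normalized_histogram(shifted_scaled, h=4)
```

Because the bins are defined relative to the observed range, both inputs produce the same feature vector.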

Figure 3: Aggregate confusion matrices using the kNN (KL-div., k = 7) with classification results within all 5 COCO groups for the 5 investigated sampling strategies, (a) Uniform, (b) Sobol, (c) G-Halton, (d) LHS, (e) LHSO, in one fold of the LOPO cross-validation (d = 10, h = 8, n = 2^…).

The main contribution of this paper is a comprehensive experimental study of the characteristics of the histogram and its ability to describe the test functions in the context of multi-class expert-defined classification. The initial step in ELA is to generate a set of random samples that properly cover the search space of the problem (or FL). The selection of random samples is strongly influenced by the sampling strategy, and therefore the impact of different strategies was thoroughly investigated in this paper. The benchmark suite was based on the 24 BBOB single-objective problems from the COCO library. The COCO expert-defined groups split the BBOB problems into 5 classes according to the properties of the underlying continuous functions. The classification performance, measured by the average accuracy reached using the normalized fitness histograms, was tested with Decision tree, Random forest, and k-Nearest neighbors (k = 7) with the Euclidean distance, histogram distance, and KL-divergence. The experiments were conducted for 5 sampling strategies (Uniform, Sobol, G-Halton, LHS, LHSO), 3 sample sizes n, 3 dimensions d, and 5 numbers of histogram bins h.
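A kNN classifier over histograms with KL-divergence as the dissimilarity can be sketched in a few lines of Python. The smoothing constant eps, the direction of the divergence (query relative to training histogram), and the toy data are assumptions made for illustration; the toy example uses k = 3 only because it contains six training histograms.

```python
import math
from collections import Counter


def kl_divergence(p, q, eps=1e-10):
    """KL(p || q) for two histograms.

    eps is added to every bin (an assumption of this sketch) so that
    empty bins do not produce log(0) or a division by zero.
    """
    p_s = [x + eps for x in p]
    q_s = [x + eps for x in q]
    sp, sq = sum(p_s), sum(q_s)
    return sum((a / sp) * math.log((a / sp) / (b / sq))
               for a, b in zip(p_s, q_s))


def knn_predict(query, train_hists, train_labels, k=7):
    """Majority vote among the k training histograms closest to query."""
    order = sorted(range(len(train_hists)),
                   key=lambda i: kl_divergence(query, train_hists[i]))
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Hypothetical 3-bin histograms for two classes.
train_hists = [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1], [0.8, 0.1, 0.1],
               [0.1, 0.2, 0.7], [0.1, 0.3, 0.6], [0.2, 0.1, 0.7]]
train_labels = [0, 0, 0, 1, 1, 1]

pred_a = knn_predict([0.65, 0.25, 0.10], train_hists, train_labels, k=3)
pred_b = knn_predict([0.15, 0.20, 0.65], train_hists, train_labels, k=3)
```

KL-divergence is not symmetric, so swapping the arguments can change the neighbor ranking; a symmetrized variant would be an equally plausible choice.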

First, a simple classification model was considered, generating 30 sample sets for each test function, 15 for training and 15 for testing. This means that all functions were represented in the training set for the expert-defined 5-class classification. The results showed very high accuracies. The best ones were over 97%, all achieved for the largest sample size n. Generally, better classification performance was achieved for h ∈ {8, 15}, suggesting that the other configurations (h ∈ {3, 25, 50}) possibly resulted in excessive generalization or overfitting. The highest performance was reached for the LHS and LHSO sampling strategies, followed by the average results of Uniform sampling. In contrast to the recommendations found in the published ELA literature, the Sobol and G-Halton low-discrepancy sequences generally produced lower accuracies when employed for fitness histogram computation. This discrepancy may stem from their emphasis on achieving maximal evenness in space sampling, potentially leading to discernible patterns, bias, and heightened sensitivity to noise. In terms of the classifier, the kNN clearly outperforms DT and RF. With the kNN, the histogram distance and KL-divergence perform better for lower h, and the Euclidean distance prevails for higher h. However, the difference is not substantial.
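For readers unfamiliar with Latin hypercube sampling, the following Python sketch shows the basic stratification idea: each dimension is cut into n equal strata and every stratum receives exactly one point. This is a plain LHS written for illustration under that textbook definition; it is not the implementation used in the experiments and does not include the optimization step of LHSO.

```python
import random


def latin_hypercube(n, d, rng):
    """Return n points in the d-dimensional unit hypercube.

    In every dimension, each of the n equal-width strata contains
    exactly one point; the stratum order is shuffled per dimension.
    """
    columns = []
    for _ in range(d):
        strata = list(range(n))
        rng.shuffle(strata)
        # One uniformly placed coordinate inside each stratum.
        columns.append([(s + rng.random()) / n for s in strata])
    return [[columns[j][i] for j in range(d)] for i in range(n)]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
points = latin_hypercube(8, 3, rng)
```

Unlike Sobol or Halton sequences, the points remain random inside their strata, so the sample is stratified per dimension without following a deterministic pattern.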

Next, the best configuration (i.e., h = 8, kNN with KL-divergence) was selected for further investigation of the classification results. The leave-one-problem-out (LOPO) scenario was performed by excluding one problem from the training set and keeping it as the only test problem. The average accuracies of 25-41.7% indicate that the histograms of functions within the same expert-defined group differ too much. This trend was thoroughly examined in the experiments using histogram visualization, t-SNE visualization, and confusion matrices. It means that the factors considered by experts to establish the problem classes cannot be simply represented by fitness histograms.
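The leave-one-problem-out protocol can be sketched in Python as follows. The helper names, the 1-nearest-neighbor stand-in classifier with squared Euclidean distance, and the toy feature vectors are assumptions made for illustration and do not reproduce the experimental setup.

```python
def lopo_folds(problem_ids):
    """Yield (held_out, train_idx, test_idx), one fold per problem."""
    for held_out in sorted(set(problem_ids)):
        train_idx = [i for i, p in enumerate(problem_ids) if p != held_out]
        test_idx = [i for i, p in enumerate(problem_ids) if p == held_out]
        yield held_out, train_idx, test_idx


def lopo_accuracy(features, class_labels, problem_ids, classify):
    """Average per-fold accuracy.

    classify(train_X, train_y, x) must return the predicted class of x.
    """
    fold_acc = []
    for _, train_idx, test_idx in lopo_folds(problem_ids):
        train_X = [features[i] for i in train_idx]
        train_y = [class_labels[i] for i in train_idx]
        hits = sum(classify(train_X, train_y, features[i]) == class_labels[i]
                   for i in test_idx)
        fold_acc.append(hits / len(test_idx))
    return sum(fold_acc) / len(fold_acc)


def nearest_neighbour(train_X, train_y, x):
    """Stand-in classifier: label of the closest training vector."""
    dists = [sum((a - b) ** 2 for a, b in zip(t, x)) for t in train_X]
    return train_y[dists.index(min(dists))]


# Hypothetical data: 4 problems with 2 sample sets each, 2 classes.
features = [[0.90, 0.10], [0.92, 0.08], [0.80, 0.20], [0.78, 0.22],
            [0.10, 0.90], [0.12, 0.88], [0.60, 0.40], [0.62, 0.38]]
class_labels = [0, 0, 0, 0, 1, 1, 1, 1]
problem_ids = ["f1", "f1", "f2", "f2", "f3", "f3", "f4", "f4"]

acc = lopo_accuracy(features, class_labels, problem_ids, nearest_neighbour)
```

In this toy input, problem f4 belongs to class 1 but resembles the class 0 problems, so its fold is misclassified once it is held out. This mirrors the effect described above, where functions of the same expert-defined group have dissimilar histograms.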

Although the simple fitness histograms do not perform well in the LOPO classification of one specific expert-defined grouping, they are outstanding for standard problem classification. In that context, the paper revealed a strong impact of the sampling strategy, the number of histogram bins, and the classifier.

Future work will focus on further benchmarking of ELA features and on multi-objective problems.

ACKNOWLEDGEMENTS

This work was supported by the Czech Science Foun-

dation in the project “Constrained Multiobjective Op-

timization Based on Problem Landscape Analysis”,

grant no. GF22-34873K, and the Student Grant Sys-

tem, VSB – Technical University of Ostrava, grant no.

SP2024/006.

REFERENCES

Adair, J., Ochoa, G., and Malan, K. M. (2019). Local op-

tima networks for continuous ﬁtness landscapes. In

Proceedings of the Genetic and Evolutionary Com-

putation Conference Companion, GECCO ’19, page

1407–1414, New York, NY, USA. Association for

Computing Machinery.

Barla, A., Odone, F., and Verri, A. (2003). Histogram in-

tersection kernel for image classiﬁcation. In Proceed-

ings 2003 international conference on image process-

ing (Cat. No. 03CH37429), volume 3, pages III–513.

IEEE.


Cha, S.-H. and Srihari, S. N. (2002). On measuring the

distance between histograms. Pattern Recognition,

35(6):1355–1370.

Chi, H., Mascagni, M., and Warnock, T. (2005). On the op-

timal Halton sequence. Mathematics and Computers

in Simulation, 70(1):9–21.

Das, S. and Suganthan, P. N. (2010). Differential evolution:

A survey of the state-of-the-art. IEEE Transactions on

Evolutionary Computation, 15(1):4–31.

Eslami, M., Shareef, H., Khajehzadeh, M., and Mohamed,

A. (2012). A survey of the state of the art in particle

swarm optimization. Research Journal of Applied Sci-

ences, Engineering and Technology, 4(9):1181–1197.

Halton, J. H. (1964). Algorithm 247: Radical-inverse quasi-

random point sequence. Communications of the ACM,

7(12):701–702.

Hansen, N., Auger, A., Ros, R., Mersmann, O., Tušar, T., and Brockhoff, D. (2021). COCO: A platform for comparing continuous optimizers in a black-box setting. Optimization Methods and Software, 36(1):114–144.

Katoch, S., Chauhan, S. S., and Kumar, V. (2021). A review

on genetic algorithm: Past, present, and future. Multi-

media Tools and Applications, 80(5):8091–8126.

Kerschke, P. and Trautmann, H. (2019). Automated al-

gorithm selection on continuous black-box problems

by combining exploratory landscape analysis and ma-

chine learning. Evolutionary Computation, 27(1):99–

127.

Krömer, P., Uher, V., Andova, A., Tušar, T., and Filipič, B. (2022). Sampling strategies for exploratory landscape analysis of bi-objective problems. In 2022 International Conference on Computational Science and Computational Intelligence (CSCI), pages 336–342, Los Alamitos, CA, USA. IEEE Computer Society.

Krömer, P., Uher, V., Tušar, T., and Filipič, B. (2024). On the latent structure of the bbob-biobj test suite. In Applications of Evolutionary Computation, pages 326–341, Cham. Springer Nature Switzerland.

Kullback, S. and Leibler, R. A. (1951). On Information and

Sufﬁciency. The Annals of Mathematical Statistics,

22(1):79 – 86.

Lang, R. D. and Engelbrecht, A. P. (2021). An exploratory

landscape analysis-based benchmark suite. Algo-

rithms, 14(3).

Liefooghe, A., Verel, S., Chugh, T., Fieldsend, J., All-

mendinger, R., and Miettinen, K. (2023). Feature-

based benchmarking of distance-based multi/many-

objective optimisation problems: A machine learn-

ing perspective. In Evolutionary Multi-Criterion Op-

timization, pages 260–273, Cham. Springer Nature

Switzerland.

Malan, K. M. (2021). A Survey of Advances in Landscape

Analysis for Optimisation. Algorithms, 14(2):40.

McKay, M. D., Beckman, R. J., and Conover, W. J. (2000).

A comparison of three methods for selecting values of

input variables in the analysis of output from a com-

puter code. Technometrics, 42(1):55–61.

Mersmann, O., Bischl, B., Trautmann, H., Preuss, M.,

Weihs, C., and Rudolph, G. (2011). Exploratory land-

scape analysis. In Proceedings of the 13th Annual

Genetic and Evolutionary Computation Conference

(GECCO), pages 829–836. ACM.

Muñoz, M. A., Kirley, M., and Halgamuge, S. K. (2015). Exploratory landscape analysis of continuous space optimization problems using information content. IEEE Transactions on Evolutionary Computation, 19(1):74–87.

Nowakova, J. and Pokorny, M. (2014). System identiﬁca-

tion using genetic algorithms. In Proceedings of the

Fifth International Conference on Innovations in Bio-

inspired Computing and Applications (IBICA 2014),

volume 303 of Advances in Intelligent Systems and

Computing, pages 413–418.

Pikalov, M. and Mironovich, V. (2021). Automated pa-

rameter choice with exploratory landscape analysis

and machine learning. In Proceedings of the Genetic

and Evolutionary Computation Conference (GECCO)

Companion, pages 1982–1985. ACM.

Renau, Q., Doerr, C., Dreo, J., and Doerr, B. (2020). Exploratory landscape analysis is strongly sensitive to the sampling strategy. In Parallel Problem Solving from Nature – PPSN XVI, volume 12270, pages 139–153. Springer.

Renau, Q., Dreo, J., Doerr, C., and Doerr, B. (2021).

Towards explainable exploratory landscape analysis:

Extreme feature selection for classifying BBOB func-

tions. In Applications of Evolutionary Computation,

pages 17–33. Springer.

Richter, H. and Engelbrecht, A. (2014). Recent advances

in the theory and application of ﬁtness landscapes.

Springer.

Shirakawa, S. and Nagao, T. (2016). Bag of local landscape

features for ﬁtness landscape analysis. Soft Comput-

ing, 20(10):3787–3802.

Sobol, I. M. (1967). On the distribution of points in a cube

and the approximate evaluation of integrals. Zhurnal

Vychislitel’noi Matematiki i Matematicheskoi Fiziki,

7(4):784–802.

Swain, M. J. and Ballard, D. H. (1991). Color indexing.

International journal of computer vision, 7(1):11–32.

Tanabe, R. (2022). Benchmarking feature-based algorithm

selection systems for black-box numerical optimiza-

tion. IEEE Transactions on Evolutionary Computa-

tion, pages 1321–1335.

Trajanov, R., Dimeski, S., Popovski, M., Korošec, P., and Eftimov, T. (2022). Explainable landscape analysis in automated algorithm performance prediction. In Applications of Evolutionary Computation, pages 207–222. Springer.

Uher, V. and Krömer, P. (2023). Impact of different discrete sampling strategies on fitness landscape analysis based on histograms. In Proceedings of the 13th International Conference on Advances in Information Technology, pages 1–9.

Zou, F., Chen, D., Liu, H., Cao, S., Ji, X., and Zhang, Y.

(2022). A survey of ﬁtness landscape analysis for op-

timization. Neurocomputing, 503:129–139.
