Cartesian Genetic Programming Is Robust Against Redundant Attributes in Datasets

Henning Cui and Jörg Hähner

University of Augsburg, 86159 Augsburg, Germany

{henning.cui, joerg.haehner}@uni-a.de

Keywords:

Cartesian Genetic Programming, CGP, Noisy Attributes, Duplicate Attributes.

Abstract:

Real world datasets might contain duplicate or redundant attributes—or even pure noise—which may not be

ﬁltered out by data preprocessing algorithms. This might be problematic, as it decreases the performance

of learning algorithms. Cartesian Genetic Programming (CGP) is able to choose its own input attributes by

design. Thus, we hypothesize that CGP should be able to ignore redundant or noise attributes. In this work,

we empirically show that CGP is indeed able to handle such problematic datasets. For this task, six different

datasets are extended with different kinds of redundancies: Duplicated-, duplicated and noised-, and pure

noise attributes. Different numbers of unwanted attributes are examined, and we present our results which

indicate that CGP is robust against additional redundant or noisy attributes in a dataset. We show that there is

no decrease in performance as well as no change in CGP’s convergence behaviour.

1 INTRODUCTION

Any kind of imperfection in a dataset might decrease

the ﬁnal performance of a learning algorithm. Such

ﬂaws might occur in real world datasets, as they could

contain inconsistencies, redundant-, noisy-, or dupli-

cate attributes. Preprocessing or data mining algo-

rithms try to improve the quality of a given dataset by

feature- or instance selection techniques, for example.

These algorithms reduce the dimensionality of data by

removing redundant or conflicting attributes, respectively (García et al., 2015). However, most algorithms

assume independent and identically distributed data.

If this precondition is not given, unneeded attributes

might not be filtered out (Rong et al., 2019). This can
increase the training time of machine learning algorithms (Hall and Smith, 1997) or decrease
their accuracy (Duangsoithong and Windeatt, 2009).
Conversely, redundant features do not necessarily
impede machine learning algorithms. Duangsoithong and Windeatt found that removing redundant

features can decrease the accuracy of ensemble learn-

ing methods (Duangsoithong and Windeatt, 2009).

Thus, choosing algorithms that remove every instance

of redundancy is not always the best choice.

Author ORCID iDs: https://orcid.org/0000-0001-5483-5079, https://orcid.org/0000-0003-0107-264X

We believe that Cartesian Genetic Programming (CGP) should be able to ignore unwanted attributes

through its representation and evolutionary mecha-

nisms. CGP consists of nodes in a grid which are

partially connected. By being able to evolve its con-

nections, it might learn to not connect to unwanted

attributes—which means that those inputs are ig-

nored. As a result, CGP might not be negatively af-

fected by duplicated or noisy attributes. This means

that CGP might be a great choice to consider for

datasets which could not be preprocessed perfectly.

Motivated by this hypothesis, we investigate the

effects of additional duplicated-, duplicated and

noised-, and pure noise attributes in datasets on CGP.

For this purpose, six UCI (Kelly et al., ) datasets
are used and extended with different levels of artificial, unwanted attributes. We examine their effects
on CGP’s performance and behaviour by empirical
means and try to evaluate our hypothesis.

Based on these goals, we provide a quick

overview of related work in the following Section 2.

Section 3 then reintroduces CGP. We also discuss

our hypothesis more in-depth. Afterwards, Section 4

presents the experimental design of this work. This

is followed by Section 5, where we report our results

and discuss our research questions as well as our hypothesis. Finally, Section 6 summarizes our findings
and discusses future research directions.


Cui, H. and Hähner, J.

Cartesian Genetic Programming Is Robust Against Redundant Attributes in Datasets.

DOI: 10.5220/0012974600003837

Paper published under CC license (CC BY-NC-ND 4.0)

In Proceedings of the 16th International Joint Conference on Computational Intelligence (IJCCI 2024), pages 108-119

ISBN: 978-989-758-721-4; ISSN: 2184-3236

2 RELATED WORK

Various previous works investigated the effects of re-

dundant data on algorithms. However, to the best of

our knowledge, we are the ﬁrst to investigate its inﬂu-

ence on CGP. Nevertheless, various other articles laid

out the foundation for this work.

The investigation of data preprocessing mecha-

nisms is a major research subject in the ﬁeld of data

mining. There are numerous algorithms for different

kinds of preprocessing tasks (García et al., 2015).

Feature selection is another important topic in the

realm of data mining, as the goal of these algorithms

is to reduce the dimensionality of data. This can be

achieved, among other things, by using genetic algo-

rithms (Tiwari and Singh, 2010; Xu et al., 2009). It is

also possible to use fuzzy genetic algorithms, as was

done by Fung et al. (Fung et al., 1997). Other pos-

sibilities include the application of differential evolu-

tion algorithms, as was done by Bidgoli et al. (Bidgoli

et al., 2019).

Instance selection is another technique that is used

in combination with feature selection. Here, the goal

is to remove faulty data. Again, genetic algorithms

can be considered. Tsai et al. used genetic algorithms

for both feature and instance selection (Tsai et al.,

2013). They also examined the effects of perform-

ing only instance-, or only feature selection, as well

as performing both. Both feature- and instance selec-

tion can also be performed simultaneously by using

genetic algorithms (Albuquerque et al., 2020).

3 CARTESIAN GENETIC

PROGRAMMING

Cartesian Genetic Programming is a supervised

learning algorithm invented in 1999 by Miller (Miller,

1999). In this section, we reintroduce CGP’s repre-

sentation, its standard evolutionary operators, and ex-

plain our hypothesis.

3.1 Representation

The standard CGP version we are using in this work

is represented by a directed, acyclic and feed-forward

graph. It is a grid which consists of partially connected nodes. Originally, it was conceptualized as
a $c \times r$ grid with $c \in \mathbb{N}$ and $r \in \mathbb{N}$. However, today’s

standard consists of a CGP model with only one row

for most applications (Miller, 2011). Furthermore,

CGP’s representation allows for an arbitrary amount

of program inputs and outputs.

[Figure 1 graph; node labels: INPUT, ADD, MUL, SUB, OUTPUT]

Figure 1: Example graph defined by a CGP genotype. The dashed node and connections are inactive because they do not contribute to the output.

These aforementioned nodes can be categorized

into input-, output-, and computational nodes. The

ﬁrst type, input nodes, directly receive the program

input to relay them to other nodes. Output nodes redi-

rect the output of an input- or computational node.

Both types—input and output nodes—do not change

their respective ingoing value. As for the last cate-

gory: Computational nodes do change their inputs.

They are represented by one function gene and $a$ connection genes, with $a \in \mathbb{N}$ being the maximum arity
of any function in the whole function set. The function
gene encodes the function of a node, while the connection genes define the node’s respective inputs. This
is done by defining a path between a previous node and the
current node.

Another important distinction is the difference be-

tween active and inactive nodes—both input- and

computation nodes can be grouped into one of these

two categories. On the one hand, active nodes are

part of a path to one or multiple output nodes. Be-

cause of that, they contribute to the program’s ﬁnal

output. On the other hand, inactive nodes are not part

of a path to output nodes. Hence, they do not con-

tribute to the program’s ﬁnal output. While there are

methods to enforce all nodes to be active, the exis-

tence of inactive nodes contributes to an improvement

in CGP’s evolutionary search. This allows for neu-

tral genetic drift (Miller and Smith, 2006; Turner and

Miller, 2015), which may lead to better ﬁtness values

and/or faster convergence.

An illustrative example of a graph deﬁned by CGP

can be seen in Figure 1. It depicts the genotype with

two input-, three computational- and one output node.

Active nodes are drawn with a solid line, while inac-

tive nodes are marked by dashed lines. The ﬁrst two

nodes are input nodes, which correspond to a respec-

tive input attribute. They are followed by three com-

putational nodes, and one output node at the end. In

this example, only the first input is used to calculate
an output. The first attribute is added to itself at the ADD node. Afterwards, this result is
multiplied by the first attribute at the MUL node, and the outcome is
the result of this program. The second input node and the
computational SUB node are not part of a path to an output
node. As a result, they do not contribute to the program’s
final output and are classified as inactive.
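The example of Figure 1 can be traced with a minimal Python sketch. The encoding used here (one tuple of function name and two connection indices per computational node, 0-based node indices) and the connections of the inactive SUB node are assumptions made for illustration; they are not taken from the authors' implementation.

```python
# Function set of the Figure 1 example (all functions have arity 2).
FUNCTIONS = {
    "ADD": lambda a, b: a + b,
    "MUL": lambda a, b: a * b,
    "SUB": lambda a, b: a - b,
}

N_INPUTS = 2
# One (function, connection, connection) tuple per computational node.
# Node indices: 0 and 1 are input nodes, 2 to 4 are computational nodes.
NODES = [
    ("ADD", 0, 0),  # node 2: first attribute added to itself
    ("MUL", 2, 0),  # node 3: result multiplied by the first attribute
    ("SUB", 1, 2),  # node 4: inactive; its connections are an assumption
]
OUTPUT = 3  # the output node forwards the value of node 3


def active_nodes(n_inputs, nodes, output):
    """Walk backwards from the output and collect every node on a path to it."""
    active, stack = set(), [output]
    while stack:
        index = stack.pop()
        if index in active:
            continue
        active.add(index)
        if index >= n_inputs:  # computational node: follow its connections
            _, first, second = nodes[index - n_inputs]
            stack.extend((first, second))
    return active


def evaluate(inputs, nodes, output):
    """Evaluate all nodes from left to right and return the output value."""
    values = list(inputs)
    for name, first, second in nodes:
        values.append(FUNCTIONS[name](values[first], values[second]))
    return values[output]


print(sorted(active_nodes(N_INPUTS, NODES, OUTPUT)))  # [0, 2, 3]
print(evaluate([3.0, 10.0], NODES, OUTPUT))            # 18.0
```

The second input (index 1) never appears in the active set, so changing its value cannot change the program output. For brevity the sketch evaluates inactive nodes as well; an implementation could skip them.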

To simplify the description of a CGP configuration in the following work: When we mention a graph
defined by CGP with $n \in \mathbb{N}$ nodes, this graph will
have only one row and $n$ computational nodes. Furthermore, it also contains additional input and output
nodes corresponding to the given learning task.

3.2 Evolutionary Algorithms

In this work, we use an elitist (µ + λ) evolution strat-

egy (ES) with µ = 1 and λ = 4, as is standard in

most CGP variants (Miller, 2020). In addition, neu-

tral search is included into the (1 + 4)-ES to improve

CGP’s convergence time and ﬁtness value (Yu and

Miller, 2001; Turner and Miller, 2015). That means:

When an offspring has the same or better ﬁtness value

than the parent, this offspring is always chosen as the

next parent. This leads to neutral drift, which enables

a better exploration of different genotypes (Miller,

2020).
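The selection rule described above can be sketched in a few lines of Python. This is a generic illustration, not the authors' code: the function names, the toy problem, and the convention that a higher fitness value is better are assumptions.

```python
import random


def evolve(parent, fitness, mutate, iterations, n_offspring=4, seed=0):
    """(1 + 4)-ES sketch; a higher fitness value is assumed to be better."""
    rng = random.Random(seed)
    parent_fitness = fitness(parent)
    for _ in range(iterations):
        offspring = [mutate(parent, rng) for _ in range(n_offspring)]
        best = max(offspring, key=fitness)
        best_fitness = fitness(best)
        # ">=" instead of ">": an equally fit offspring replaces the parent,
        # which is what permits neutral drift.
        if best_fitness >= parent_fitness:
            parent, parent_fitness = best, best_fitness
    return parent, parent_fitness


# Hypothetical toy problem: find the integer closest to 3.
def toy_fitness(x):
    return -abs(x - 3)


def toy_mutate(x, rng):
    return x + rng.choice((-1, 1))


solution, score = evolve(0, toy_fitness, toy_mutate, iterations=200)
print(solution, score)
```

With a strict `>` comparison the parent would only change on improvement, and genotypes of equal fitness would never be explored.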

As for the mutation operator, we use one proposed

by Goldman and Punch called Single (Goldman and

Punch, 2013). It works by selecting and mutating ran-

dom nodes until one active node is mutated. This has

the beneﬁt that a change in CGP’s phenotype is en-

forced. When a standard point or probabilistic muta-

tion strategy is used, it is possible that only inactive

genes are mutated (Goldman and Punch, 2013; Gold-

man and Punch, 2015). As a consequence, the qual-

ity of the newly mutated individual cannot be eval-

uated. This might lead to more training iterations

needed as well as being stuck at local optima. By en-

forcing a change in CGP’s phenotype with Single, no

wasted evaluations are performed. It also has the ben-

eﬁt that it does not rely on a mutation rate (Goldman

and Punch, 2013).
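A minimal sketch of the idea behind Single follows, under several assumptions: nodes are encoded as `[function, connection, connection]` lists with 0-based indices, the single output gene takes part in the mutation, and every function has arity two. It illustrates the stopping rule only and is not a reproduction of the operator by Goldman and Punch.

```python
import random


def active_nodes(n_inputs, nodes, output):
    """Collect all node indices that lie on a path to the output."""
    active, stack = set(), [output]
    while stack:
        index = stack.pop()
        if index not in active:
            active.add(index)
            if index >= n_inputs:
                stack.extend(nodes[index - n_inputs][1:])
    return active


def single_mutation(n_inputs, nodes, output, n_functions, rng):
    """Mutate random genes until a gene of an active node (or the output) changes."""
    nodes = [list(node) for node in nodes]  # work on a copy
    active = active_nodes(n_inputs, nodes, output)
    while True:
        position = rng.randrange(len(nodes) + 1)
        if position == len(nodes):  # the output gene is always active
            new_output = rng.randrange(n_inputs + len(nodes))
            if new_output != output:
                return nodes, new_output
            continue
        index = n_inputs + position
        gene = rng.randrange(3)  # 0: function gene, 1 and 2: connection genes
        old_value = nodes[position][gene]
        if gene == 0:
            nodes[position][gene] = rng.randrange(n_functions)
        else:
            nodes[position][gene] = rng.randrange(index)  # only earlier nodes
        if nodes[position][gene] != old_value and index in active:
            return nodes, output


# Hypothetical genotype: 2 inputs and 3 computational nodes.
parent = [[0, 0, 0], [1, 2, 0], [2, 1, 2]]
child, child_output = single_mutation(2, parent, 3, n_functions=3, rng=random.Random(1))
```

Genes of inactive nodes may change along the way, but the loop only ends once an active gene has changed, so every offspring differs from its parent in a part of the genotype that reaches the output.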

CGP does not proﬁt from standard crossover oper-

ators (Miller, 2011; Cai et al., 2006; Kalkreuth et al.,

2017). This is why we also do not include it in this

work.

3.3 Ignoring Redundant Attributes

As already mentioned in Section 1, real world datasets
might contain duplicate attributes (Hernández and
Stolfo, 1998) or unimportant ones (Kumar and
Chaurasiya, 2019). This can negatively affect learning algorithms.

We believe that CGP should be able to handle

some amount of unnecessary attributes in a dataset.

As already mentioned in Section 3.1, the nodes used
in CGP are able to mutate their ingoing connection
genes. Therefore, the distinction between active and inactive nodes is important, because some
input or computational nodes are not part of a path
to any output node (see Figure 1). Consequently,
an inactive input node means that its corresponding
attribute is not used to generate an output. This is

why we believe that CGP should handle redundant at-

tributes well. Some nodes may obtain their inputs by

being connected to unwanted attributes. Via CGP’s

evolutionary mechanisms, a node should be able to

mutate such connections to use more meaningful in-

puts. Over time, input nodes corresponding to these

unwanted attributes should become inactive. Thus,

they do not contribute to the program’s output—and

do not affect its ﬁnal ﬁtness value.

4 EXPERIMENTAL DESIGN

In this section, our whole experimental setup is de-

scribed. We present the datasets used as well as meth-

ods to add redundancies into them. Afterwards, a

brief introduction into Bayesian data analysis and a

description of our hyperparameter study is given.

4.1 Problem Sets

As we try to answer our hypothesis empirically, the

choice of the right datasets is important. Six classi-

ﬁcation datasets downloaded from the UCI Machine

Learning Repository (Kelly et al., ) were chosen ac-

cording to the recommendations from the genetic pro-

gramming community (White et al., 2013). We in-

clude: Abalone, Credit Approval (Credit), Statlog

Shuttle (Shuttle), Breast Cancer Wisconsin Diagnosis

(Cancer), Page Blocks Classiﬁcation (Page Blocks),

and Waveform Version 1 (Waveform) (Kelly et al., ).

They were chosen to cover different number of in-

stances, number of attributes, and number of classes

to predict. These speciﬁc values, among others, are

shown in Table 1.

Concerning the pre-processing of the datasets, we

standardized each one. In addition, entries with miss-

ing values were removed—as was the case for Credit,

for example.

To evaluate our hypothesis, we must gauge
CGP’s ability to deal with redundant data. Therefore, redundant data is added incrementally to observe
differences in CGP’s performance. The amount
of additional data is determined relative to the dataset’s
number of attributes. This means that we increase the
dataset’s size by a fixed, predefined percentage: 20 %,
40 %, ..., 100 %. These values were chosen in order
to gain significant insight into CGP’s behaviour
without cluttering our results (e.g. by using a
step size of 10 %) or using overly unrealistic values
(e.g. more than 100 % redundancies). For example:

ECTA 2024 - 16th International Conference on Evolutionary Computation Theory and Applications


Given a dataset with 10 attributes and an increase in
its size by 40 %, 4 additional redundant
attributes are added, increasing the dataset’s total number of attributes to 14.
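The number of added attributes can be reproduced with a short calculation. The sketch below assumes that fractional results are rounded up, which matches both the ceiling in the paper's formal definition of the added size and the values listed in Table 1; the function name is hypothetical.

```python
def extra_attributes(n_attributes, percent):
    """Ceiling of n_attributes * percent / 100, computed with integers only."""
    return -(-n_attributes * percent // 100)


print(extra_attributes(10, 40))                                  # 4
print([extra_attributes(8, p) for p in (20, 40, 60, 80, 100)])   # [2, 4, 5, 7, 8]
print([extra_attributes(21, p) for p in (20, 40, 60, 80, 100)])  # [5, 9, 13, 17, 21]
```

The second and third lines correspond to the Abalone (8 attributes) and Waveform (21 attributes) rows of Table 1. Integer arithmetic is used so that the rounding does not depend on floating-point representation.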

Please note that we do not include symbolic re-

gression benchmarks such as Korns-12, which is also

one of the recommended benchmarks to use for eval-

uation purposes (White et al., 2013). Its peculiarity

is that ﬁve input variables are deﬁned but only two

variables are used to generate an output. The goal of

Korns-12 is to test if an algorithm is able to ignore

unimportant variables. While it ﬁts our scenario, we

believe that using it would distort our results. There

is also no proposed method to remove or add unim-

portant variables. As we cannot remove variables, it

is not possible to create a baseline without any un-

used variables. In addition, as additional unimpor-

tant variables cannot be added, different magnitudes

of unimportant variables can also not be examined.

This would strongly limit our evaluation, as we could

not compare it to anything.

4.2 Adding Redundancies into Datasets

In this work, three different ways of adding redundan-

cies are examined: Duplicating attributes, duplicat-

ing attributes and noising them, and adding pure noise

drawn from a Gaussian distribution. Please note: In

order to avoid repetitions, phrases like unwanted at-

tributes, redundancies, etc. are used synonymously.

4.2.1 Duplicate Attributes

The ﬁrst method randomly copies attributes and in-

serts them into random positions without changing

them. This operation leads to attributes that should

be easily detected and removed without repercussion

during the data pre-processing phase of training a

model. Thus, this method should be viewed as a sec-

ond baseline—next to CGP trained without any added

redundancies—to evaluate CGP’s ability to handle at-

tribute redundancies.

To give a more formal expression of copying random attributes and inserting them into random positions: Let $D \in \mathbb{D}^{n \times a} = (d_{ij})_{i=1,\dots,n;\ j=1,\dots,a}$ be a dataset containing $n$ entries and $a$ attributes. Furthermore, $\mathbb{D}$ is a set of numbers or a set of categorical values. Additionally, we introduce a parameter $r \in \mathbb{R}$ which defines the percentage of additional attributes added to bloat the dataset. This means that we increase the size of a dataset $D$ by $s = \lceil a \cdot r \rceil$ attributes.

Expanding $D$ works by first drawing random indices $u_1, \dots, u_s$, with $u_k \in \{1, 2, \dots, a\}$ for $k = 1, \dots, s$. These indices $u_1, \dots, u_s$ define which attribute columns in $D$ will be copied. In order to expand our dataset, we must first create a copy of $D$ called $D'$, to which the redundancies are added. Then, for each index $u_k$ with $k = 1, \dots, s$, we copy the column $d' = (d_{1,u_k}, \dots, d_{n,u_k})$ of $D$. Finally, we draw a random index $v \in \{0, \dots, a'\}$ and insert $d'$ into the $v$th column of $D'$, with $a'$ being $D'$’s current number of attributes. That means that we copy the $u_k$th attribute column in $D$, shift all elements after a random column position $v$ to the right, and insert the copy into position $v$ in $D'$. To further clarify our approach, we include its pseudocode in Algorithm 1.

Algorithm 1: First redundancy method: Duplicate and insert random attributes.

    Data: Dataset D, percentage of additional attributes r ∈ R
    t ← 0;
    s ← ⌈a · r⌉;
    D′ ← Clone(D);
    U ← {u_1, ···, u_s} random indices from {1, 2, ···, a};
    foreach u ∈ U do
        d′ ← D[:, u];
        v ← randomly drawn number from {1, 2, ···, a + t};
        expand D′ by shifting all elements after v one dimension to the right and inserting d′ into D′[:, v];
        t ← t + 1;
    end
    return D′
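The duplication procedure can be sketched in plain Python as follows. The sketch assumes a dataset stored as a list of rows, 0-based column indices, a redundancy level given in whole percent, and an insertion position that may also lie behind the last column; the function name is hypothetical and the code is not taken from the authors' repository.

```python
import random


def add_duplicate_attributes(rows, percent, rng):
    """Return a copy of `rows` with ceil(a * percent / 100) duplicated columns."""
    a = len(rows[0])                            # number of original attributes
    s = -(-a * percent // 100)                  # number of columns to add
    expanded = [list(row) for row in rows]      # D' <- Clone(D)
    sources = [rng.randrange(a) for _ in range(s)]  # u_1, ..., u_s (0-based)
    for t, u in enumerate(sources):
        column = [row[u] for row in rows]       # d' <- D[:, u]
        v = rng.randrange(a + t + 1)            # random position in the current D'
        for new_row, value in zip(expanded, column):
            new_row.insert(v, value)            # shifts later columns to the right
    return expanded


data = [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10], [11, 12, 13, 14, 15]]
bloated = add_duplicate_attributes(data, 40, random.Random(1))
print(len(bloated[0]))  # 7
```

Columns are always copied from the original dataset rather than from the expanded one, as in the description of Algorithm 1.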

4.2.2 Duplicate Attributes and Add Noise

Our second method works by duplicating attributes and adding noise before inserting them into the dataset. This method is a more realistic version of redundant attributes in a dataset. Sensor readings might drift and/or fluctuate. For this reason, two sensors placed close to each other, for example, should not produce exactly the same readings.

To perform this second method, steps similar to Algorithm 1 have to be performed. We only differ in the last steps: For each index $u_k$ with $k = 1, \dots, s$, we copy the column $d' = (d_{1,u_k}, \dots, d_{n,u_k})$ of $D$. However, before we insert $d'$ into the new dataset $D'$, it must be noised. In this work, noising an attribute means that each of its values is increased or decreased by up to ten percent. Hence, for each value in $d'$, we draw a uniformly distributed value $x_i \sim \mathcal{U}_{[-0.1, 0.1]}$ for $i = 1, \dots, n$. Afterwards, noise is added to each $d'_{i,u}$:

$$d'_{i,u} \leftarrow d'_{i,u} + d'_{i,u} \cdot x_i$$

This noised attribute set is then inserted into $D'$ at a random attribute index $v$ after all elements after $v$ are shifted to the right. Again, to further clarify our approach, we refer to Algorithm 2.

Table 1: The full names of the datasets used in this work, each dataset’s size (Size), number of classes to predict (# Classes), number of attributes (# Attrib.), and its number of additional attributes given a specific percentage of redundancy (x %).

    Dataset                        Size    # Classes  # Attrib.  20 %  40 %  60 %  80 %  100 %
    Abalone                        4,177   28         8          +2    +4    +5    +7    +8
    Breast Cancer Wisconsin Diag.  569     2          30         +6    +12   +18   +24   +30
    Credit Approval                690     2          15         +3    +6    +9    +12   +15
    Page Blocks Classification     5,473   5          10         +2    +4    +6    +8    +10
    Statlog (Shuttle)              58,000  7          9          +2    +4    +6    +8    +9
    Waveform Version 1             5,000   3          21         +5    +9    +13   +17   +21

Algorithm 2: Second redundancy method: Duplicate attribute and add noise.

    Data: Dataset D, percentage of additional attributes r ∈ R
    t ← 0;
    s ← ⌈a · r⌉;
    D′ ← Clone(D);
    U ← {u_1, ···, u_s} random indices from {1, 2, ···, a};
    foreach u ∈ U do
        d′ ← D[:, u];
        v ← randomly drawn number from {1, 2, ···, a + t};
        foreach d′_i ∈ d′ do
            x ∼ U[−0.1, 0.1];
            d′_i ← d′_i + d′_i · x;
        end
        expand D′ by shifting all elements after v one dimension to the right and inserting d′ into D′[:, v];
        t ← t + 1;
    end
    return D′
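A plain-Python sketch of the noised duplication follows. As assumptions for illustration, the dataset is a list of rows, indices are 0-based, the redundancy level is given in whole percent, and the insertion position may lie behind the last column; the function name and the `max_rel_noise` parameter are hypothetical.

```python
import random


def add_noised_duplicates(rows, percent, rng, max_rel_noise=0.1):
    """Duplicate random columns and change every copied value by up to 10 %."""
    a = len(rows[0])
    s = -(-a * percent // 100)                 # ceil(a * percent / 100)
    expanded = [list(row) for row in rows]     # D' <- Clone(D)
    for t in range(s):
        u = rng.randrange(a)                   # column of D to copy
        noised = []
        for row in rows:
            x = rng.uniform(-max_rel_noise, max_rel_noise)
            noised.append(row[u] + row[u] * x)  # d' <- d' + d' * x
        v = rng.randrange(a + t + 1)           # random position in the current D'
        for new_row, value in zip(expanded, noised):
            new_row.insert(v, value)
    return expanded


data = [[1.0, 2.0, 3.0, 4.0, 5.0], [6.0, 7.0, 8.0, 9.0, 10.0]]
bloated = add_noised_duplicates(data, 60, random.Random(7))
print(len(bloated[0]))  # 8
```

Because the noise is multiplicative, a value of exactly zero is copied unchanged, and values close to zero receive only very small perturbations.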

4.2.3 Add Pure Noise

For our last method to evaluate our hypothesis, we only add Gaussian distributed noise as redundant attributes. As already mentioned in Section 4.1, we standardize our data. That means that each dataset has a mean of zero and a standard deviation of one. Thus, we are able to draw from a Gaussian distribution with a mean of zero and a standard deviation of one. As a result, we generate truly redundant attributes, which would be equivalent to using faulty or wrongly configured sensors, for instance.

Adding these noise attributes requires steps similar to Algorithm 1. Again, we must define similar parameters and sets: Our dataset $D$, the parameter $r$ defining the percentage by which a dataset’s number of attributes is increased, and a cloned dataset $D'$. We differ from the first two approaches in that we do not rely on $D$ to generate our redundant data. Instead, we insert pure noise. For this approach, a noise vector $p$ is generated by drawing from a Gaussian distribution: $p = \{p_1, \dots, p_n\}$ with $p_i \sim \mathcal{N}(0, 1)$ and $i = 1, \dots, n$. Please note that each value in $p$ is drawn independently. Then, a random index $v$ is drawn. It represents the position in $D'$ at which our noise vector $p$ is added. Finally, $p$ is added into $D'$ at the attribute position $v$. This works by shifting all elements after $v$ to the right and inserting $p$ into position $v$. Again, Algorithm 3 depicts this process for further clarification.

Algorithm 3: Third redundancy method: Insert random noise-attributes.

    Data: Dataset D, number of D’s attributes a, percentage of additional attributes r ∈ R
    t ← 0;
    s ← ⌈a · r⌉;
    D′ ← Clone(D);
    repeat s times
        v ← randomly drawn number from {1, 2, ···, a + t};
        p = {p_1, ···, p_n} with p_i ∼ N(0, 1) and i = {1, ···, n};
        expand D′ by shifting all elements after v to the right and inserting p into D′[:, v];
        t ← t + 1;
    end
    return D′
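The pure-noise variant can be sketched in the same style. The list-of-rows layout, 0-based positions, the whole-percent redundancy level, and the function name are assumptions for illustration.

```python
import random


def add_noise_attributes(rows, percent, rng):
    """Insert ceil(a * percent / 100) columns of independent N(0, 1) samples."""
    a = len(rows[0])
    s = -(-a * percent // 100)
    expanded = [list(row) for row in rows]     # D' <- Clone(D)
    for t in range(s):
        v = rng.randrange(a + t + 1)           # random position in the current D'
        for new_row in expanded:
            new_row.insert(v, rng.gauss(0.0, 1.0))  # one independent draw per entry
    return expanded


data = [[0.5, -1.2, 0.3, 0.0], [1.1, 0.4, -0.7, 2.0]]
bloated = add_noise_attributes(data, 100, random.Random(3))
print(len(bloated[0]))  # 8
```

Unlike the two duplication methods, the inserted columns carry no information about the original attributes, so the original columns remain in their relative order with noise columns interleaved.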

4.3 Bayesian Data Analysis

In order to gauge the effects of redundant data, our

results must be ranked according to their respective final fitness value. As this number cannot be negative, common statistical tests, such as Student’s t-test, which uses a Student’s t-distribution, should not be used. The reason is that such distributions cannot be expected to model the data well (Kruschke, 2013). On that account, we perform a Bayesian data analysis for the posterior distributions of our results. The model to compare the algorithms is based on the Plackett-Luce model described by Calvo et al. (Calvo et al., 2018). It ranks a set of options by estimating the probability of each option being the one with the highest rank. For this task, we use the Python library cmpbayes (Pätzel, 2023) for all statistical models. As is standard practice, prior sensitivity analyses were conducted to ensure the robustness of all models. For more information regarding the models, we refer to Kruschke (Kruschke, 2013) and Pätzel (Pätzel, 2023).
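To illustrate what a Plackett-Luce model expresses, the following sketch computes the probability of a complete ranking from a given set of option weights. It shows the likelihood only; the Bayesian inference of the weights and the cmpbayes interface are not shown, and the weights used here are made up for the example.

```python
from itertools import permutations


def ranking_probability(weights, ranking):
    """Plackett-Luce probability of observing `ranking` (best option first)."""
    remaining = sum(weights[i] for i in ranking)
    probability = 1.0
    for option in ranking:
        # Chance that `option` is picked next among the options still unranked.
        probability *= weights[option] / remaining
        remaining -= weights[option]
    return probability


weights = [0.5, 0.3, 0.2]  # hypothetical weights of three configurations
total = sum(ranking_probability(weights, r) for r in permutations(range(3)))
first_is_0 = sum(
    ranking_probability(weights, r) for r in permutations(range(3)) if r[0] == 0
)
print(round(total, 6), round(first_is_0, 6))
```

Summed over all rankings that place option 0 first, the probability equals its normalized weight, which is why the weights can be read as the probability of each option being the best.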

4.4 Conﬁguration of CGP and Its

Training

In our experiments, we used a standard CGP conﬁgu-

ration. That means: No crossover, a modiﬁed (1 + 4)-

ES as described in Section 3, and Single (Goldman

and Punch, 2013) mutation. The only hyperparam-

eter that must be optimized in our setting is CGP’s

number of computational nodes n. In order to have a

fair comparison, n was optimized for each combina-

tion of: Dataset; no redundant attributes, or additional

redundant attributes with respect to one of the three

redundancy types introduced in this work and given a

speciﬁc percentage of redundancy.

We investigated $n \in \{50, 100, \dots, 2000\}$ for each

aforementioned combination. As the datasets men-

tioned in Section 4.1 do not contain a train/test split,

k-fold cross-validation with k = 5 was employed to

generate a training- and a test dataset. Each conﬁgura-

tion was tested 20 times with independent repetitions

and completely random seeds. Afterwards, to ﬁnd the

best n, we ranked them according to the Plackett-Luce

model described by Calvo et al. (Calvo et al., 2018)

with respect to their ﬁnal test ﬁtness value. Please

note: The ﬁnal hyperparameters found and used are

listed in our results.
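The search grid and the cross-validation splits can be sketched as follows. The step size of 50 is inferred from the listed values, and shuffling the indices before splitting is an assumption, since the text does not state how the folds were formed; the function name is hypothetical.

```python
import random

NODE_GRID = list(range(50, 2001, 50))  # n = 50, 100, ..., 2000 (step size assumed)


def k_fold_splits(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # shuffling is an assumption
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test


for train, test in k_fold_splits(569):  # 569 = size of the Cancer dataset
    print(len(train), len(test))
```

Each sample appears in exactly one test fold, so every configuration is evaluated on the whole dataset across the five folds.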

Because all datasets can be categorized as classification tasks, we use the same fitness metric during the
training of all datasets. We chose the balanced accuracy, which is defined as the average of the recall obtained
on all classes and should be used for imbalanced datasets. The reason is that some datasets
(e.g. Shuttle) are heavily imbalanced. As a result, a
standard accuracy metric would not reflect CGP’s fitness accurately.
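The difference between standard and balanced accuracy can be shown with a small, made-up example. The function below follows the definition given above (mean of the per-class recall); its name and the toy labels are illustrative only.

```python
from collections import Counter


def balanced_accuracy(y_true, y_pred):
    """Mean recall over all classes that occur in `y_true`."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


# Nine samples of class 0, one of class 1; the classifier always predicts 0.
y_true = [0] * 9 + [1]
y_pred = [0] * 10
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy)                           # 0.9
print(balanced_accuracy(y_true, y_pred))  # 0.5
```

A classifier that ignores the minority class still reaches 90 % standard accuracy here, while the balanced accuracy drops to 50 %.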

A single run has a budget of 100,000 iterations, after which it is stopped.
Additionally, a run is stopped early when the
fitness value on the training data reaches a value less
than 0.01. In this case, we classify a dataset as solved.

To generate our final results, each configuration
used the best $n$ found. The experiments were run again
50 times, with independent repetitions and different random seeds. Furthermore, a standard 5-fold
cross-validation was used to generate the test fitness
values.

5 EVALUATION

In order to find the effects of unnecessary attributes in
datasets on CGP, we conducted an empirical study.
We try to answer the following three research questions to evaluate our hypothesis:

Q1: How does having redundant attributes in a

dataset affect CGP? Especially regarding its

• Number of iterations until a solution is found

(I2S),

• Fitness value, and

• Number of active nodes.

Q2: Does CGP manage to ignore redundant at-

tributes?

Q3: How do unnecessary attributes affect CGP’s be-

haviour?

To increase readability, we will introduce the

following abbreviations: A CGP model trained on

a dataset without noise will be called baseline; a

CGP model trained on a dataset with duplicated at-

tributes (see Section 4.2.1) will be called CGP+DA;

a CGP model trained on a dataset with duplicated

and noised attributes (see Section 4.2.2) will be called

CGP+DA&NOISE; and ﬁnally, a CGP model trained

on a dataset that has additional Gaussian distributed

noise (see Section 4.2.3) is called CGP+NOISE.

5.1 Results of Redundant Attributes on

CGP

We show our results in the Appendix in Table 2,

Table 3 and Table 4. They show the results for

CGP+DA, CGP+DA&NOISE and CGP+NOISE re-

spectively on all datasets, as well as their baselines.

We show the percentage of additional attributes (% Add.), the number of nodes (Nodes), the mean number of active nodes (Active), the mean and standard deviation of iterations until a dataset is solved or stopped (I2S Mean ± Std), the mean and standard deviation of achieved test fitness (Fit Mean ± Std), the mean percentage of redundant attributes that are used to generate an output (% Red), and the probability of a solution being the best per dataset with respect to its test fitness (p(best)).

Implementation and datasets can be found at: https://github.com/CuiHen/redundant attributes with CGP

Regarding the I2S, all datasets except Cancer

are not classiﬁed as solved—because they all were

stopped after 100,000 iterations. Hence, only the

Cancer dataset can be solved with the given budget.

In most cases, the mean I2S is roughly equal. Thus,
for the Cancer dataset, the trend is that redundant attributes
have no effect on CGP’s time to solution. However, as
this conclusion is drawn by evaluating only a single
dataset, this outcome should be treated with reservations.

Similarly, there is no clear correlation between

levels of noise and computational nodes needed. This

statement also applies to the mean number of active

nodes.

As for CGP’s ﬁtness values, a similar conclusion

can be drawn. For a given dataset, the mean ﬁt-

ness values and their standard deviations are relatively

similar. This is also reﬂected in their probabilities of

being the best solution per dataset. There is no clear

winner, given the calculated probabilities. All prob-

abilities are relatively similar, with no conﬁguration

dominating over the other. That means that adding

redundant attributes into a dataset will probably not

affect CGP’s ﬁtness value.

Please note: For better readability and understand-

ability, our three methods of adding unwanted at-

tributes are separated into three tables respectively.

However, all three redundancy methods are com-

pared/ranked against the same baseline. On all three

methods, similar results can be seen. This means:
These three types of additional noise probably do not
affect CGP’s I2S or fitness value, regardless of their
percentage of additional attributes.

[Figure 2 panels: Convergence Credit: Duplicate; Convergence Credit: Duplicate+Noise; Convergence Credit: Pure Noise. x-axis: Iteration (0 to 100,000); y-axis: Fitness (Train); curves: Baseline, 20 %, 40 %, 60 %, 80 %, 100 %]

Figure 2: Convergence plots for all three types of data redundancy on the Credit dataset.

Another interesting observation is that CGP should be able to ignore redundant attributes. Considering the Waveform dataset, when attributes are duplicated, or duplicated and noised, CGP is able to ignore most of the unwanted attributes. Only 2 % to 7 % of redundant attributes are used. In the case of CGP+NOISE, it only uses 8 % to 13 % of the noise attributes to calculate an output, given the Shuttle dataset. As there is little to no difference in their respective fitness values, we can conclude that CGP should be able to ignore redundant attributes. However, this is not the case for all datasets. Given the Shuttle dataset, for example, the percentage of duplicate input attributes used by CGP+DA and CGP+DA&NOISE is up to 42 %. Still, relatively high fitness values are achieved for this dataset. Another more prominent example is given in Table 4, when CGP+NOISE is considered. Given the Credit dataset, CGP’s inputs are up to 50 % pure Gaussian noise. Thus, depending on the given learning problem, CGP may include redundant information when calculating its final program output with no obvious effect. A reason might be that the redundant information is not meaningfully included in the calculation of the output. This might happen when, for example, a noise attribute is added to another value during an intermediate step but subtracted immediately afterwards. This noise attribute is then listed as used, but it does not actually contribute to the program’s final output.
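The cancellation effect described above can be made concrete with a tiny, hypothetical program. It is not taken from an evolved CGP individual; it only shows how an input can be connected, and thus counted as used, without influencing the result.

```python
import math


def tiny_program(x, noise):
    added = x + noise     # ADD node: connects to the noise input
    return added - noise  # SUB node: removes the noise again


outputs = [tiny_program(2.0, z) for z in (-1.5, 0.0, 0.25, 3.0)]
print(all(math.isclose(out, 2.0) for out in outputs))  # True
```

Both nodes are active and the noise input lies on a path to the output, yet the result depends only on `x` (up to floating-point rounding).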


5.2 Convergence Behaviour

To better understand the convergence behaviour of CGP, convergence plots for all three types of data redundancy were considered. We also classified the different behaviours according to Stegherr et al. (2023). To that end, we investigate the progression of the mean fitness value on the train split.

Our experiments show that each configuration exhibits the same convergence behaviour: Fast to Slow. CGP’s fitness improves drastically during the first iterations. However, its progression then slows down, and a high number of iterations is needed for small performance increases. Interestingly, this behaviour can be seen for the baseline as well as for CGP+DA, CGP+DA&NOISE, and CGP+NOISE. Furthermore, the percentage of added attributes does not affect CGP’s convergence. This leads us to the following conclusion: A bloated dataset does not affect CGP’s convergence behaviour.

An illustrative example of CGP’s convergence on all three types of redundancy is depicted in Figure 2. We show only the Credit dataset as an example because the other datasets exhibit the same behaviour.
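To illustrate what a Fast to Slow curve looks like numerically, the following sketch measures how much of the total fitness gain falls into the first part of a run. This is a crude heuristic for illustration only and is not the clustering-based characterisation of Stegherr et al. (2023); the function names, the `early_fraction` and `threshold` parameters, and the example curves are assumptions and synthetic values, not data from our experiments.

```python
def improvement_share(fitness, early_fraction=0.1):
    """Share of the total fitness gain reached within the first
    `early_fraction` of the recorded iterations (higher fitness is better)."""
    if len(fitness) < 2:
        raise ValueError("need at least two fitness values")
    total = fitness[-1] - fitness[0]
    if total <= 0:
        return 0.0  # no overall improvement, so nothing to attribute
    # Index that ends the "early" phase, clamped to a valid position.
    cut = min(len(fitness) - 1, max(1, int(len(fitness) * early_fraction)))
    return (fitness[cut] - fitness[0]) / total


def looks_fast_to_slow(fitness, early_fraction=0.1, threshold=0.5):
    """Heuristic: most of the gain happens early, the rest accrues slowly."""
    return improvement_share(fitness, early_fraction) >= threshold


# Synthetic mean-fitness curves (assumed values, sampled at equal intervals).
steep_then_flat = [0.6, 0.8, 0.85, 0.87, 0.88, 0.885, 0.89, 0.893, 0.895, 0.897, 0.9]
linear = [0.6 + 0.03 * i for i in range(11)]

print(looks_fast_to_slow(steep_then_flat))  # True
print(looks_fast_to_slow(linear))           # False
```

Applying such a check to the baseline curve and to each level of added attributes would be one simple way to compare their shapes, although it cannot replace a proper characterisation.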

6 CONCLUSION

In this work, we investigated the effects of additional duplicated, duplicated and noised, and pure noise attributes in datasets on CGP. Six different datasets and five levels of additional attributes were examined. They were also compared against a baseline, which describes a CGP model trained on a dataset without any type of additional artificial redundancy.

Considering these three types of additional attributes, we found that they do not affect CGP’s achieved fitness values in our testing. In addition, there is also no effect on CGP’s convergence behaviour. We classified CGP’s behaviour according to Stegherr et al. (2023) and found that each configuration shows the same convergence behaviour: Fast to Slow. When we examine CGP’s number of iterations until a solution is found (I2S), no clear answer can be given. Five out of six datasets could not be classified as solved within their given budget. Thus, they cannot be used to answer this research question. Still, in one case, it can be seen that additional attributes do not affect CGP’s I2S.

Another research question is: Does CGP manage to ignore redundant attributes? In some cases, CGP is able to almost completely ignore them. This suggests that CGP is indeed able to use only relevant information to generate an output. However, there are also cases where 50 % of inputs are pure noise, without an effect on its fitness value.

Furthermore, these results are valid for all levels of additional attributes. Thus, we can conclude that CGP is robust to additional duplicated, duplicated and noised, and pure noise attributes in datasets. These types of additional, unwanted attributes do not affect CGP’s performance or behaviour.

As for future work, there are still various settings that could be examined to further investigate CGP. More datasets and different types of additional attributes could be investigated. Another possibility is to integrate different preprocessing methods. A dataset can be extended with attributes that are not filtered out by said preprocessing methods. This adds another level of difficulty, as the additional attributes may be truly indistinguishable from the real data. Including such data might or might not influence CGP.

In addition, our bloated datasets should be evaluated with various other learning algorithms. These results should then be compared with our findings to show whether CGP is a valuable choice on noisy datasets.

ACKNOWLEDGEMENTS

The authors would like to thank the German Federal Ministry of Education and Research (BMBF) for supporting the project SaMoA within VIP+ (grant number 03VP09291).

REFERENCES

Albuquerque, I. M. R., Nguyen, B. H., Xue, B., and Zhang, M. (2020). A novel genetic algorithm approach to simultaneous feature selection and instance selection. In 2020 IEEE Symposium Series on Computational Intelligence (SSCI), pages 616–623.

Bidgoli, A. A., Ebrahimpour-Komleh, H., and Rahnamayan, S. (2019). A novel multi-objective binary differential evolution algorithm for multi-label feature selection. In 2019 IEEE Congress on Evolutionary Computation (CEC), pages 1588–1595.

Cai, X., Smith, S. L., and Tyrrell, A. M. (2006). Positional independence and recombination in cartesian genetic programming. In Collet, P., Tomassini, M., Ebner, M., Gustafson, S., and Ekárt, A., editors, Genetic Programming, pages 351–360, Berlin, Heidelberg. Springer Berlin Heidelberg.

Calvo, B., Ceberio, J., and Lozano, J. A. (2018). Bayesian inference for algorithm ranking analysis. In Proceedings of the Genetic and Evolutionary Computation Conference Companion, GECCO ’18, page 324–325, New York, NY, USA. Association for Computing Machinery.

Duangsoithong, R. and Windeatt, T. (2009). Relevant and redundant feature analysis with ensemble classification. In 2009 Seventh International Conference on Advances in Pattern Recognition, pages 247–250.

Fung, G., Liu, J., Chan, K., and Lau, R. (1997). Fuzzy genetic algorithm approach to feature selection problem. In Proceedings of 6th International Fuzzy Systems Conference, volume 1, pages 441–446.

García, S., Luengo, J., Herrera, F., et al. (2015). Data preprocessing in data mining, volume 72. Springer.

Goldman, B. W. and Punch, W. F. (2013). Reducing wasted evaluations in cartesian genetic programming. In Krawiec, K., Moraglio, A., Hu, T., Etaner-Uyar, A. Ş., and Hu, B., editors, Genetic Programming, pages 61–72. Springer Berlin Heidelberg.

Goldman, B. W. and Punch, W. F. (2015). Analysis of cartesian genetic programming’s evolutionary mechanisms. 19(3):359–373.

Hall, M. A. and Smith, L. A. (1997). Feature subset selection: a correlation based filter approach.

Hernández, M. A. and Stolfo, S. J. (1998). Real-world data is dirty: Data cleansing and the merge/purge problem. Data Mining and Knowledge Discovery, 2(1):9–37.

Kalkreuth, R., Rudolph, G., and Droschinsky, A. (2017). A new subgraph crossover for cartesian genetic programming. In McDermott, J., Castelli, M., Sekanina, L., Haasdijk, E., and García-Sánchez, P., editors, Genetic Programming, pages 294–310. Springer International Publishing.

Kelly, M., Longjohn, R., and Nottingham, K. The UCI machine learning repository. https://archive.ics.uci.edu.

Kruschke, J. K. (2013). Bayesian estimation supersedes the t test. Journal of Experimental Psychology: General, 142(2):573–603.

Kumar, S. and Chaurasiya, V. K. (2019). A strategy for elimination of data redundancy in internet of things (iot) based wireless sensor network (wsn). IEEE Systems Journal, 13(2):1650–1657.

Miller, J. and Smith, S. (2006). Redundancy and computational efficiency in cartesian genetic programming. IEEE Transactions on Evolutionary Computation, 10(2):167–174.

Miller, J. F. (1999). An empirical study of the efficiency of learning boolean functions using a cartesian genetic programming approach. In Proceedings of the 1st Annual Conference on Genetic and Evolutionary Computation - Volume 2, GECCO’99, page 1135–1142, San Francisco, CA, USA. Morgan Kaufmann Publishers Inc.

Miller, J. F. (2011). Cartesian Genetic Programming. Springer Berlin Heidelberg.

Miller, J. F. (2020). Cartesian genetic programming: its status and future. Genetic Programming and Evolvable Machines, 21(1):129–168.

Pätzel, D. (2023). cmpbayes. https://github.com/dpaetzel/cmpbayes.

Rong, M., Gong, D., and Gao, X. (2019). Feature selection and its use in big data: Challenges, methods, and trends. IEEE Access, 7:19709–19725.

Stegherr, H., Heider, M., and Hähner, J. (2023). Assisting convergence behaviour characterisation with unsupervised clustering. In Proceedings of the 15th International Joint Conference on Computational Intelligence - ECTA, pages 108–118. INSTICC, SciTePress.

Tiwari, R. and Singh, M. P. (2010). Correlation-based attribute selection using genetic algorithm. International Journal of Computer Applications, 4(8):28–34.

Tsai, C.-F., Eberle, W., and Chu, C.-Y. (2013). Genetic algorithms in feature and instance selection. Knowledge-Based Systems, 39:240–247.

Turner, A. J. and Miller, J. F. (2015). Neutral genetic drift: an investigation using cartesian genetic programming. 16(4):531–558.

White, D., Mcdermott, J., Castelli, M., Manzoni, L., Goldman, B., Kronberger, G., Jaśkowski, W., O’Reilly, U.-M., and Luke, S. (2013). Better gp benchmarks: Community survey results and proposals. 14:3–29.

Xu, S., Zhou, X., and Sun, Y.-n. (2009). A genetic algorithm-based feature selection method for human identification based on ground reaction force. In Proceedings of the First ACM/SIGEVO Summit on Genetic and Evolutionary Computation, GEC ’09, page 665–670, New York, NY, USA. Association for Computing Machinery.

Yu, T. and Miller, J. (2001). Neutrality and the evolvability of boolean function landscape. In Miller, J., Tomassini, M., Lanzi, P. L., Ryan, C., Tettamanzi, A. G. B., and Langdon, W. B., editors, Genetic Programming, pages 204–217. Springer Berlin Heidelberg.

APPENDIX

Results for Duplicated Attributes

Table 2 shows our results for all datasets when attributes are duplicated, according to Algorithm 1.

Results for Duplicated and Noised Attributes

Table 3 shows our results for all datasets when attributes are duplicated and then noised, according to Algorithm 2.

Results for Adding Pure Noise

Table 4 shows our results for all datasets when the additional attributes are pure noise, according to Algorithm 3.
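For readers who want to reproduce a comparable setup, the three kinds of additional attributes can be sketched as follows. This is not a transcription of Algorithms 1 to 3; it is a minimal illustration under stated assumptions: the helper `extend_dataset` is hypothetical, the number of added columns is taken as a fraction of the original attribute count, source columns are drawn uniformly at random, and the noise is additive Gaussian with an assumed standard deviation `noise_std`.

```python
import random

KINDS = ("duplicate", "duplicate+noise", "noise")


def extend_dataset(rows, fraction, kind, noise_std=0.1, seed=0):
    """Return a copy of `rows` (a non-empty list of equally long numeric lists)
    with round(fraction * n_attributes) additional columns appended.

    kind = "duplicate":        copy a randomly chosen original column
    kind = "duplicate+noise":  copy a column and add Gaussian noise to it
    kind = "noise":            append a column of pure Gaussian noise
    """
    if kind not in KINDS:
        raise ValueError(f"unknown kind: {kind!r}")
    rng = random.Random(seed)          # fixed seed for repeatable sketches
    n_attrs = len(rows[0])
    n_extra = round(fraction * n_attrs)
    extended = [list(row) for row in rows]  # leave the input untouched
    for _ in range(n_extra):
        src = rng.randrange(n_attrs)   # source column (unused for pure noise)
        for row, new_row in zip(rows, extended):
            if kind == "duplicate":
                new_row.append(row[src])
            elif kind == "duplicate+noise":
                new_row.append(row[src] + rng.gauss(0.0, noise_std))
            else:
                new_row.append(rng.gauss(0.0, 1.0))
    return extended


data = [[1.0, 2.0, 3.0, 4.0, 5.0],
        [6.0, 7.0, 8.0, 9.0, 10.0]]
for kind in KINDS:
    bloated = extend_dataset(data, fraction=0.4, kind=kind)
    print(kind, len(bloated[0]))       # 5 original + 2 added columns each
```

Only original columns are used as duplication sources here, so an added column is never a copy of another added column; whether that matches the intended procedure is an assumption of this sketch.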


Table 2: Our results for all datasets when attributes are duplicated. We show the percentage of additional attributes (% Add.), the number of nodes (Nodes), number of mean active nodes (Active), the mean and standard deviation of iterations until a dataset is solved or stopped (I2S Mean ± Std), the mean and standard deviation of achieved test fitness (Fit Mean ± Std), the mean percentage of redundant attributes that are used to generate an output (% Red.), and the probability of a solution being the best per dataset with respect to its test fitness (p(best)). Results are ranked according to p(best).

Dataset      % Add.    Nodes  Active  I2S Mean ± Std  Fit Mean ± Std  % Red.  p(best)
Abalone      60        1,800  371     100k ± 0k       0.15 ± 0.02     0.28    0.2
Abalone      Baseline  900    289     100k ± 0k       0.14 ± 0.02     -       0.19
Abalone      80        2,000  395     100k ± 0k       0.14 ± 0.02     0.21    0.19
Abalone      100       1,500  352     100k ± 0k       0.14 ± 0.02     0.17    0.15
Abalone      40        1,850  381     100k ± 0k       0.14 ± 0.02     0.27    0.15
Abalone      20        1,400  331     100k ± 0k       0.14 ± 0.03     0.56    0.12
Cancer       20        1,400  49      72k ± 36k       0.95 ± 0.02     0.07    0.2
Cancer       Baseline  850    42      70k ± 31k       0.95 ± 0.02     -       0.19
Cancer       40        1,400  47      74k ± 33k       0.94 ± 0.03     0.03    0.17
Cancer       100       1,450  42      80k ± 29k       0.94 ± 0.02     0.01    0.15
Cancer       60        1,700  39      85k ± 27k       0.94 ± 0.02     0.02    0.14
Cancer       80        950    38      79k ± 30k       0.95 ± 0.02     0.01    0.14
Credit       20        1,200  65      100k ± 0k       0.86 ± 0.02     0.25    0.23
Credit       40        1,600  71      100k ± 0k       0.86 ± 0.03     0.13    0.19
Credit       80        1,150  68      100k ± 0k       0.85 ± 0.03     0.06    0.17
Credit       Baseline  1,900  68      100k ± 0k       0.85 ± 0.03     -       0.15
Credit       60        1,050  75      100k ± 0k       0.85 ± 0.02     0.09    0.13
Credit       100       1,650  73      100k ± 0k       0.85 ± 0.03     0.06    0.12
Page Blocks  60        950    93      100k ± 0k       0.72 ± 0.04     0.18    0.19
Page Blocks  Baseline  1,500  96      100k ± 0k       0.72 ± 0.03     -       0.18
Page Blocks  100       1,700  98      100k ± 0k       0.73 ± 0.04     0.12    0.18
Page Blocks  20        1,400  99      100k ± 0k       0.72 ± 0.05     0.4     0.18
Page Blocks  40        1,000  89      100k ± 0k       0.72 ± 0.04     0.27    0.14
Page Blocks  80        1,600  100     100k ± 0k       0.72 ± 0.03     0.14    0.13
Shuttle      100       1,250  110     100k ± 0k       0.82 ± 0.04     0.12    0.2
Shuttle      40        300    78      100k ± 0k       0.82 ± 0.06     0.2     0.18
Shuttle      Baseline  1,550  116     100k ± 0k       0.82 ± 0.06     -       0.18
Shuttle      60        1,300  109     100k ± 0k       0.81 ± 0.06     0.15    0.17
Shuttle      20        550    89      100k ± 0k       0.81 ± 0.06     0.42    0.15
Shuttle      80        1,650  124     100k ± 0k       0.81 ± 0.06     0.14    0.12
Waveform     Baseline  1,650  50      100k ± 0k       0.6 ± 0.01      -       0.19
Waveform     40        600    40      100k ± 0k       0.6 ± 0.01      0.05    0.19
Waveform     100       1,450  42      100k ± 0k       0.6 ± 0.01      0.03    0.18
Waveform     60        950    38      100k ± 0k       0.6 ± 0.01      0.03    0.16
Waveform     20        1,750  48      100k ± 0k       0.6 ± 0.01      0.1     0.16
Waveform     80        950    40      100k ± 0k       0.6 ± 0.01      0.03    0.12


Table 3: Our results for all datasets when attributes are duplicated and then noised. We show the percentage of additional attributes (% Add.), the number of nodes (Nodes), number of mean active nodes (Active), the mean and standard deviation of iterations until a dataset is solved or stopped (I2S Mean ± Std), the mean and standard deviation of achieved test fitness (Fit Mean ± Std), the mean percentage of redundant attributes that are used to generate an output (% Red.), and the probability of a solution being the best per dataset with respect to its test fitness (p(best)). Results are ranked according to p(best).

Dataset      % Add.    Nodes  Active  I2S Mean ± Std  Fit Mean ± Std  % Red.  p(best)
Abalone      100       1,750  369     100k ± 0k       0.14 ± 0.02     0.2     0.22
Abalone      60        2,000  401     100k ± 0k       0.14 ± 0.03     0.25    0.18
Abalone      Baseline  900    289     100k ± 0k       0.14 ± 0.02     -       0.17
Abalone      20        1,150  328     100k ± 0k       0.14 ± 0.02     0.47    0.15
Abalone      40        1,400  345     100k ± 0k       0.14 ± 0.02     0.27    0.15
Abalone      80        1,250  327     100k ± 0k       0.13 ± 0.03     0.19    0.14
Cancer       Baseline  850    42      70k ± 31k       0.95 ± 0.02     -       0.2
Cancer       100       1,050  34      79k ± 32k       0.95 ± 0.02     0.01    0.18
Cancer       40        1,200  38      74k ± 32k       0.95 ± 0.02     0.03    0.18
Cancer       20        550    34      70k ± 32k       0.95 ± 0.02     0.05    0.17
Cancer       80        950    37      79k ± 30k       0.95 ± 0.02     0.01    0.16
Cancer       60        1,500  41      79k ± 28k       0.94 ± 0.03     0.02    0.11
Credit       100       1,400  64      100k ± 0k       0.87 ± 0.02     0.05    0.26
Credit       20        1,150  63      100k ± 0k       0.86 ± 0.02     0.2     0.18
Credit       Baseline  1,900  68      100k ± 0k       0.85 ± 0.03     -       0.15
Credit       40        1,350  68      100k ± 0k       0.85 ± 0.03     0.12    0.14
Credit       60        1,500  66      100k ± 0k       0.85 ± 0.03     0.08    0.14
Credit       80        1,750  71      100k ± 0k       0.85 ± 0.03     0.06    0.12
Page Blocks  80        1,100  86      100k ± 0k       0.73 ± 0.03     0.13    0.21
Page Blocks  20        1,900  110     100k ± 0k       0.72 ± 0.04     0.36    0.18
Page Blocks  40        1,050  88      100k ± 0k       0.73 ± 0.04     0.2     0.18
Page Blocks  Baseline  1,500  96      100k ± 0k       0.72 ± 0.03     -       0.16
Page Blocks  100       1,900  114     100k ± 0k       0.72 ± 0.04     0.11    0.13
Page Blocks  60        1,150  87      100k ± 0k       0.72 ± 0.04     0.16    0.13
Shuttle      20        200    66      100k ± 0k       0.82 ± 0.05     0.26    0.21
Shuttle      80        750    96      100k ± 0k       0.83 ± 0.04     0.11    0.21
Shuttle      40        1,650  120     100k ± 0k       0.81 ± 0.05     0.26    0.15
Shuttle      60        1,950  127     100k ± 0k       0.82 ± 0.08     0.14    0.15
Shuttle      Baseline  1,550  116     100k ± 0k       0.82 ± 0.06     -       0.14
Shuttle      100       1,750  129     100k ± 0k       0.81 ± 0.06     0.12    0.14
Waveform     20        600    37      100k ± 0k       0.6 ± 0.01      0.07    0.19
Waveform     80        650    31      100k ± 0k       0.6 ± 0.01      0.02    0.19
Waveform     40        1,150  48      100k ± 0k       0.6 ± 0.01      0.05    0.18
Waveform     Baseline  1,650  50      100k ± 0k       0.6 ± 0.01      -       0.17
Waveform     100       1,400  38      100k ± 0k       0.6 ± 0.01      0.02    0.11
Waveform     60        500    32      100k ± 0k       0.6 ± 0.01      0.03    0.15


Table 4: Our results for all datasets when the additional attributes are pure noise. We show the percentage of additional attributes (% Add.), the number of nodes (Nodes), number of mean active nodes (Active), the mean and standard deviation of iterations until a dataset is solved or stopped (I2S Mean ± Std), the mean and standard deviation of achieved test fitness (Fit Mean ± Std), the mean percentage of redundant attributes that are used to generate an output (% Red.), and the probability of a solution being the best per dataset with respect to its test fitness (p(best)). Results are ranked according to p(best).

Dataset      % Add.    Nodes  Active  I2S Mean ± Std  Fit Mean ± Std  % Red.  p(best)
Abalone      Baseline  900    289     100k ± 0k       0.14 ± 0.02     -       0.27
Abalone      20        700    260     100k ± 0k       0.13 ± 0.02     0.63    0.24
Abalone      40        400    207     100k ± 0k       0.13 ± 0.03     0.76    0.19
Abalone      60        1,050  301     100k ± 0k       0.12 ± 0.03     0.8     0.15
Abalone      80        950    284     100k ± 0k       0.11 ± 0.03     0.84    0.09
Abalone      100       2,000  396     100k ± 0k       0.11 ± 0.03     0.86    0.07
Cancer       Baseline  850    42      70k ± 31k       0.95 ± 0.02     -       0.26
Cancer       40        1,500  43      85k ± 25k       0.94 ± 0.02     0.2     0.19
Cancer       20        1,400  42      79k ± 29k       0.94 ± 0.03     0.2     0.18
Cancer       60        800    41      81k ± 27k       0.94 ± 0.02     0.2     0.14
Cancer       80        1,600  41      85k ± 27k       0.94 ± 0.03     0.18    0.13
Cancer       100       1,600  43      91k ± 19k       0.94 ± 0.03     0.16    0.1
Credit       20        1,750  78      100k ± 0k       0.85 ± 0.03     0.53    0.21
Credit       Baseline  1,900  68      100k ± 0k       0.85 ± 0.03     -       0.19
Credit       100       1,200  72      100k ± 0k       0.85 ± 0.04     0.5     0.18
Credit       40        1,250  74      100k ± 0k       0.85 ± 0.03     0.53    0.16
Credit       60        1,000  75      100k ± 0k       0.85 ± 0.02     0.55    0.15
Credit       80        1,300  71      100k ± 0k       0.84 ± 0.03     0.5     0.12
Page Blocks  80        1,100  83      100k ± 0k       0.72 ± 0.04     0.24    0.2
Page Blocks  Baseline  1,500  96      100k ± 0k       0.72 ± 0.03     -       0.18
Page Blocks  20        1,850  103     100k ± 0k       0.72 ± 0.04     0.29    0.17
Page Blocks  60        1,550  96      100k ± 0k       0.72 ± 0.04     0.27    0.17
Page Blocks  100       1,550  91      100k ± 0k       0.71 ± 0.05     0.3     0.14
Page Blocks  40        2,000  106     100k ± 0k       0.71 ± 0.05     0.31    0.13
Shuttle      100       1,600  108     100k ± 0k       0.83 ± 0.06     0.13    0.19
Shuttle      20        250    71      100k ± 0k       0.83 ± 0.05     0.08    0.18
Shuttle      40        250    68      100k ± 0k       0.84 ± 0.06     0.08    0.17
Shuttle      60        800    93      100k ± 0k       0.82 ± 0.05     0.13    0.16
Shuttle      80        950    94      100k ± 0k       0.83 ± 0.05     0.13    0.16
Shuttle      Baseline  1,550  116     100k ± 0k       0.82 ± 0.06     -       0.13
Waveform     40        1,100  43      100k ± 0k       0.6 ± 0.01      0.19    0.25
Waveform     Baseline  1,650  50      100k ± 0k       0.6 ± 0.01      -       0.19
Waveform     80        1,500  44      100k ± 0k       0.6 ± 0.01      0.17    0.16
Waveform     60        1,450  40      100k ± 0k       0.6 ± 0.01      0.16    0.14
Waveform     100       500    31      100k ± 0k       0.6 ± 0.01      0.11    0.14
Waveform     20        1,150  42      100k ± 0k       0.6 ± 0.01      0.16    0.13
