THE ROLE OF KEEPING “SEMANTIC BLOCKS” INVARIANT

Effects in Linear Genetic Programming Performance

Marina de la Cruz Echeandía, Alba Martín Lázaro, Alfonso Ortega de la Puente

Departamento de Ingeniería Informática, Escuela Politécnica Superior

Universidad Autónoma, Madrid, Spain

José Luis Montaña Arnáiz

Departamento de Matemáticas, Estadística y Computación, Facultad de Ciencias

Universidad de Cantabria, Santander, Spain

César L. Alonso

Centro de Inteligencia Artificial, Universidad de Oviedo, Gijón, Spain

Keywords:

Grammar evolution, Attribute grammars, Christiansen grammars, Genetic programming, Straight-line programs, Symbolic regression.

Abstract:

This paper is focused on two different approaches (previously proposed by the authors) that perform better than

Genetic Programming in typical symbolic regression problems: straight-line program genetic programming

(SLP-GP) and evolution with attribute grammars (AGE). Both approaches have different characteristics. One

of the most important is that SLP-GP keeps semantic blocks invariant (the crossover operator always exchanges

complete subexpressions). In this paper we compare both methods and study the possible effect on their

performance of keeping these blocks invariant.

1 MOTIVATION

The so-called Holland schema theorem (Holland, 1975) explains why genetic methods can be expected to find quasi-optimal solutions. It shows how some relevant patterns (schemata) that could be associated with higher fitness values are kept by the fitter individuals after the genetic operators are applied. Keeping these patterns is a way to improve the fitness.

These algorithms are assumed to use binary genotypes to which the fitness function is applied. Nevertheless, genetic programming techniques usually encode the final programs in a more complex way, because it is more difficult to translate genotypes into phenotypes if they have to meet the constraints of the programming language under consideration. Genetic programming frequently uses tree-like data structures, like the original GP proposal (Koza, 1992), or non-binary strings, like Grammatical Evolution and its derivatives (O'Neill and Conor, 2003; de la Cruz et al., 2005; Ortega et al., 2007), to code its genotypes. When the mapping from genotype to phenotype is complex, it is more difficult to trace or even identify the schemata.

In this paper we will use the name “semantic

blocks” for those genotype fragments associated with

some specific meaning in the phenotype. For example, given an assignment of some arithmetic subexpression to some variable, such as z = x + 3 − y, a semantic block could give a proper value to its variables, such as x = 3; y = x − 2.

This paper is focused on the possible improvement

in the performance of some genetic programming al-

gorithms that could be caused by genetic operators

that keep these semantic blocks invariant.

The paper compares the performance of two different approaches previously proposed by the authors: attribute grammar evolution (AGE) (de la Cruz et al., 2005) and genetic programming based on straight-line programs (SLP-GP) (Alonso et al., 2008). Both methods have been shown to improve the efficiency of classic GP, and both incorporate complex formal representations for the candidate solutions (attribute grammars and straight-line programs, respectively). In addition, SLP-GP uses genetic operators that preserve the structure of the “semantic blocks”. Although preserving semantic blocks is not the only difference between AGE and SLP-GP, we think that a first comparison of both methods as they are defined could provide some information about the effect on performance of keeping semantic blocks invariant. One of our future goals is to define an AGE extension that adds a mechanism, inspired by SLP-GP, able to keep the semantic blocks.

de la Cruz Echeandía M., Martín Lázaro A., Ortega de la Puente A., Luis Montaña Arnáiz J. and L. Alonso C..

THE ROLE OF KEEPING “SEMANTIC BLOCKS” INVARIANT - Effects in Linear Genetic Programming Performance.

DOI: 10.5220/0003085403650368

In Proceedings of the International Conference on Evolutionary Computation (ICEC-2010), pages 365-368

ISBN: 978-989-8425-31-7

© 2010 SCITEPRESS (Science and Technology Publications, Lda.)

2 INTRODUCTION TO SLP-GP

It is easy to see that any program (algorithm) can be

expressed as a sequence of predeﬁned atomic oper-

ations. A standard way of representing each opera-

tion is by means of vectors with n + 2 components.

The ﬁrst component usually holds the operation per-

formed. Another element refers to the place where the

result will be stored (it usually is a different and new

variable for each operation). There are n additional

components for a maximum of n operands.

Quadruples are one representation of this kind: vectors with n + 2 = 4 elements, that is, n = 2.
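As a minimal sketch of this representation, the following Python fragment encodes each operation as a quadruple (operation, result, operand 1, operand 2) and evaluates the sequence in order. The names `Quadruple`, `OPS` and `evaluate`, the field order, and the small program used here are illustrative assumptions, not part of SLP-GP as defined in (Alonso et al., 2008).

```python
import operator
from typing import Dict, List, Tuple, Union

# Hypothetical encoding: (operation, result variable, operand 1, operand 2).
Operand = Union[str, float]
Quadruple = Tuple[str, str, Operand, Operand]

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def evaluate(program: List[Quadruple], inputs: Dict[str, float]) -> float:
    """Run the quadruples in order and return the value of the last result."""
    env = dict(inputs)  # variable name -> current value

    def value(operand: Operand) -> float:
        # Strings name variables; anything else is a numeric constant.
        return env[operand] if isinstance(operand, str) else float(operand)

    result = None
    for op, target, a, b in program:
        result = OPS[op](value(a), value(b))
        env[target] = result  # each operation stores into its own variable
    return result


# Illustrative program computing (x + 1) * (x + 1).
program = [
    ("+", "u1", "x", 1),
    ("*", "u2", "u1", "u1"),
]
print(evaluate(program, {"x": 3.0}))
```

Because every operation writes to a new variable, later quadruples can reuse earlier results without recomputing them.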

For example, the expression $2x(x + 1)^2 - 2x$ could be represented by the following sequence of quadruples: $\{u_1 := x + 1,\; u_2 := u_1 * u_1,\; u_3 := x + x,\; u_4 := u_2 * u_3,\; u_5 := u_4 - u_3\}$.

It is easy to see that the semantic block of a given element (quadruple) contains all the elements of the SLP needed to properly evaluate it. For example, the semantic block of the second quadruple ($u_2 := u_1 * u_1$) is the sequence $\{u_1 := x + 1,\; u_2 := u_1 * u_1\}$, while the semantic block of the third element ($u_3 := x + x$) contains only this quadruple, and the semantic block of the last quadruple is the complete SLP.
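The notion of semantic block can be sketched as a backward dependency walk over the SLP. The Python fragment below is only an illustration under assumptions: the tuple layout, the function name `semantic_block`, and the concrete instruction indices of the example program are hypothetical, and the code is not taken from (Alonso et al., 2008).

```python
from typing import List, Tuple, Union

Operand = Union[str, int]
# Assumed layout: (result variable, operation, operand 1, operand 2).
Instruction = Tuple[str, str, Operand, Operand]


def semantic_block(slp: List[Instruction], index: int) -> List[Instruction]:
    """Return the instructions needed to evaluate slp[index], in program order."""
    position = {instr[0]: i for i, instr in enumerate(slp)}  # result name -> position
    needed = set()
    pending = [index]
    while pending:
        i = pending.pop()
        if i in needed:
            continue
        needed.add(i)
        for operand in slp[i][2:]:
            # Only operands that are results of other instructions add
            # dependencies; inputs such as "x" and constants do not.
            if isinstance(operand, str) and operand in position:
                pending.append(position[operand])
    return [slp[i] for i in sorted(needed)]


# Hypothetical SLP in which the last instruction depends on all the others.
slp = [
    ("u1", "+", "x", 1),
    ("u2", "*", "u1", "u1"),
    ("u3", "+", "x", "x"),
    ("u4", "*", "u2", "u3"),
    ("u5", "-", "u4", "u3"),
]

print([instr[0] for instr in semantic_block(slp, 1)])
print([instr[0] for instr in semantic_block(slp, 2)])
print([instr[0] for instr in semantic_block(slp, 4)])
```

A crossover operator that exchanges the list returned by `semantic_block` as a unit moves a complete subexpression, which is the sense in which the blocks are kept invariant.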

Further details of SLP-GP and examples of its crossover operator can be found in (Alonso et al., 2008), which also gives a more formal treatment and the definitions of SLPs.

3 INTRODUCTION TO AGE

Attribute grammars (Knuth, 1968) are one of the tools

used to completely describe high level programming

languages (both their syntax and their semantics).

AGE (de la Cruz et al., 2005) is an extension of Grammatical Evolution (O'Neill and Conor, 2003).

Both techniques are automatic programming evolu-

tionary algorithms independent of the target program-

ming language, and include a standard representation

of genotypes as strings of integers (codons), and a for-

mal grammar (respectively attribute and context free

grammars) as inputs for the deterministic mapping of

a genotype into a phenotype. This mapping mini-

mizes the generation of syntactically (and in the case

of AGE also semantically) invalid phenotypes. Ge-

netic operators act at the genotype level, while the ﬁt-

ness function is evaluated on the phenotypes.

Further details, deeper descriptions and examples

can be found in (de la Cruz et al., 2005).
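The deterministic mapping from codons to a phenotype can be illustrated with the usual Grammatical Evolution rule, in which each codon, taken modulo the number of applicable productions, selects how the leftmost non-terminal is expanded. The sketch below assumes a toy context-free grammar (`GRAMMAR`) and a hypothetical function `map_genotype`; it omits codon wrapping and the attribute evaluation that AGE adds to reject semantically invalid derivations.

```python
from typing import Dict, List, Optional

# Hypothetical toy grammar: non-terminals are the dictionary keys.
GRAMMAR: Dict[str, List[List[str]]] = {
    "<expr>": [["<expr>", "<op>", "<expr>"], ["<var>"]],
    "<op>": [["+"], ["*"]],
    "<var>": [["x"], ["1"]],
}


def map_genotype(codons: List[int], start: str = "<expr>",
                 max_steps: int = 100) -> Optional[str]:
    """Deterministically derive a phenotype from a list of integer codons."""
    symbols = [start]
    used = 0
    for _ in range(max_steps):
        # Find the leftmost non-terminal still to be expanded.
        idx = next((i for i, s in enumerate(symbols) if s in GRAMMAR), None)
        if idx is None:
            return " ".join(symbols)  # only terminals remain
        if used == len(codons):
            return None  # codons exhausted: no valid phenotype (no wrapping here)
        rules = GRAMMAR[symbols[idx]]
        choice = rules[codons[used] % len(rules)]  # the codon selects the rule
        used += 1
        symbols[idx:idx + 1] = choice
    return None  # derivation did not finish within max_steps


print(map_genotype([0, 1, 0, 1, 1, 1]))
```

Since the genetic operators act on the codon list, a small change in one codon can alter every later expansion, which is one reason schemata are hard to trace in this kind of encoding.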

4 SLP-GP VS. AGE

There are several differences that make it difficult to compare the two methods. We can briefly summarize them as follows:

• Length of Genotypes. SLP-GP uses slp’s

of “length” 12 (with 12 operations or instruc-

tions) while AGE uses genotypes with variable

length. Like other variable length evolutionary

algorithms, AGE suffers from bloating (the un-

bounded increase in the length of the genotypes

while the search evolves). Pruning (removing

some fragments of genotypes) is a genetic oper-

ator useful to control bloating. AGE removes the

codons not used after the translation into pheno-

type.

• Mapping from Genotype to Phenotype. AGE

includes a mapping from genotype to phenotype

driven by the attribute grammar that generates the

language of candidate solutions. SLP-GP uses

slp’s both as genotypes and as phenotypes. In

AGE, the number of codons (length of the corre-

sponding genotype) needed to generate a pheno-

type of a given length depends on the structure of

the speciﬁc grammar and it is, consequently, more

difﬁcult to estimate.

• The Way in which Computational Effort is

Computed. In (Alonso et al., 2008) the number

of basic operations is used to measure the com-

putational effort. Each slp has 12 operations and

each generation contains 200 different genotypes.

Therefore, SLP-GP could compute the computa-

tional effort either by means of the total number

of basic operations, candidate solutions or gener-

ations needed to ﬁnd the solution, because all of

them are directly proportional. In (Alonso et al., 2008) a maximum number of basic operations is imposed, which is equivalent to a maximum of 5000 generations. Nevertheless, as we have previously noted for AGE, it is difficult to estimate how many codons are needed to generate

a phenotype with a given number of basic oper-

ations. Instead of the total number of basic op-

erations performed, we have used the cumulative

success frequency taking into account the genera-

tion in which the successful experiments ﬁnd the

solution.
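The cumulative success frequency used as the common measure can be computed from the generation in which each run first succeeds. The following Python sketch assumes a hypothetical function name, 1-based generation numbers, and made-up data; it does not reproduce the experimental results.

```python
from typing import List, Optional


def cumulative_success_frequency(success_generation: List[Optional[int]],
                                 max_generations: int) -> List[float]:
    """Fraction of runs that have found a solution by each generation.

    success_generation[i] is the generation (1-based) in which run i first
    reached the error threshold, or None if the run never succeeded.
    """
    runs = len(success_generation)
    counts = [0] * (max_generations + 1)
    for g in success_generation:
        if g is not None and g <= max_generations:
            counts[g] += 1
    frequency = []
    solved = 0
    for g in range(1, max_generations + 1):
        solved += counts[g]
        frequency.append(solved / runs)
    return frequency


# Made-up data for five runs; one of them never finds a solution.
example = [3, 1, None, 3, 5]
print(cumulative_success_frequency(example, max_generations=5))
```

This measure depends only on generations, so it can be applied to both methods without estimating how many basic operations an AGE genotype produces.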

4.1 Experimental Setting

In order to compare SLP-GP and AGE we have reproduced the experiments performed in (Alonso et al., 2008) for the following real functions: $f_1(x) = x^{[\ldots]} + x^{[\ldots]} + x$, $f_2(x) = e^{-\sin 3x + 2x}$, $f_3(x) = 2.718x^{[\ldots]} \; [\ldots] \; 3.1416x$, $f_4(x) = \min\{[\ldots], \sin(x) + 1\}$.

We have used the same parameters when possible: 30 sample points, 20 individuals in the population, function set $F = \{+, -, *, /^{(*)}\}$, constant set $\{0, 1, 2\}$, crossover rate = 0.9, 100 runs per function, a maximum of 5000 generations, and error thresholds 3.716070e−07, 7.623910e−01, 5.265160e−01 and 1.425200e−02 for the solutions of $f_1$, $f_2$, $f_3$ and $f_4$, respectively. Sample points have been taken from $[-5, 5]$, $[\ldots]$, $[-\pi, \pi]$ and $[0, 15]$, respectively. The following additional functions have been added to $F$, respectively: $\{\mathrm{sqrt}^{(*)}\}$, $\{\mathrm{sqrt}^{(*)}, \sin, \cos, \exp\}$, $\{\sin, \cos\}$ and $\{\sin, \cos\}$ ($^{(*)}$ used only by SLP-GP).

There is a significant difference between the sets of functions used by the two methods. $/^{(*)}$ and $\mathrm{sqrt}^{(*)}$ represent protected versions (against undefined results) of / and sqrt, respectively. These versions return 1 when the original operation is undefined (division by 0 and the square root of a negative number, respectively). AGE considers individuals with undefined subexpressions semantically incorrect and, consequently, never generates them.

In addition, protected functions change the behav-

ior of the ﬁnal expressions and make them difﬁcult to

understand and describe.
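A minimal Python sketch of the protected operators described above follows. The function names `protected_div` and `protected_sqrt` are illustrative assumptions; only the rule "return 1 where the operation is undefined" comes from the text.

```python
import math


def protected_div(a: float, b: float) -> float:
    """Division that returns 1 where ordinary division is undefined (b == 0)."""
    return 1.0 if b == 0 else a / b


def protected_sqrt(x: float) -> float:
    """Square root that returns 1 where the real square root is undefined (x < 0)."""
    return 1.0 if x < 0 else math.sqrt(x)


# The protected versions silently extend the functions to points where the
# original operations have no value.
print(protected_div(0.0, 0.0))
print(protected_sqrt(-4.0))
print(protected_sqrt(9.0))
```

An expression evolved with these operators may therefore differ from its unprotected reading at exactly the points where a subexpression is undefined, which is what makes the final expressions harder to interpret.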

The remaining parameters were tuned separately.

For SLP-GP: mutation and reproduction rates = 0.05

and slp’s length = 12. For AGE: mutation rate = 0.9

and variable length genotypes.

5 EXPERIMENTAL RESULTS

It is possible to compare the performance of both

methods, if we do not take into account the number

of basic operations performed to get the solutions.

We will show two kinds of graphics:

1. The cumulative success frequency with respect to the generation in which the quasi-optimal solution is found (figure 1). It shows how many runs finish successfully and how fast. Some of the curves do not show all 5000 generations because they are focused on the range in which most of the solutions are found.

2. The empirical distribution of the best fitness, displayed using the standard box plot notation with marks at the best execution, the 25, 50 and 75 per cent points, and the worst execution (figure 2). We consider that it globally describes the quality of the populations.

Figure 1: Cumulative success frequency for f1, f2, f3, and f4.
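The five marks of the box plot notation mentioned in the second item can be obtained from the best fitness of each run. The sketch below is an assumption-laden illustration: the function name `box_plot_marks` is hypothetical, higher fitness is assumed to be better, the quartiles use the "inclusive" method of Python's `statistics.quantiles`, and the data are made up.

```python
import statistics
from typing import Dict, List


def box_plot_marks(best_fitness: List[float]) -> Dict[str, float]:
    """Five marks of a box plot over the best fitness of each run."""
    q1, median, q3 = statistics.quantiles(best_fitness, n=4, method="inclusive")
    return {
        "worst": min(best_fitness),  # assumes higher fitness is better
        "25%": q1,
        "50%": median,
        "75%": q3,
        "best": max(best_fitness),
    }


# Made-up best-fitness values for five runs.
marks = box_plot_marks([0.2, 0.4, 0.6, 0.8, 1.0])
print(marks)
```

Other quartile conventions give slightly different 25 and 75 per cent marks on small samples, so the method should be fixed before comparing two algorithms.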

It can easily be seen that, except for f1, SLP-GP performs better than AGE, especially for function f4. The poor performance of AGE in the more difficult cases could be caused by the capability of SLP-GP for preserving semantic blocks. No figure shows the empirical distribution for f1 because both methods get fitness values above 0.9999995 for all the runs. It is also worth mentioning that AGE is able to get the exact (zero error) target function in some runs for almost all the cases (f1, f2 and f3), while SLP-GP only approximates them.

Figure 2: Empirical best fitness distribution for f2, f3, and f4.

6 CONCLUSIONS AND FURTHER

RESEARCH LINES

We have compared two different genetic program-

ming algorithms previously proposed by the authors.

From our point of view, the main difference between

these methods is that SLP-GP implements a crossover

operator that keeps semantic blocks invariant.

We have designed a set of graphics to compare

the performance of both methods, taking into account

their differences. We conclude that SLP-GP performs better on all the functions tested except f1. We think that this is a consequence of keeping semantic blocks invariant.

In the future we would like to gather a wider set of performance data using different target functions to get a deeper comparison between SLP-GP and AGE. We would also like to propose a new version of AGE and of Christiansen grammar evolution (CGE) (Ortega et al., 2007) that extends crossover to keep semantic blocks invariant, as in SLP-GP. We will then compare SLP-GP with the new algorithm.

ACKNOWLEDGEMENTS

This work was partially supported by the R&D pro-

gram of the Community of Madrid (S2009/TIC-1650,

project “e-Madrid”) as well as by the Spanish Min-

istry of Science and Innovation (TIN2007-67466-

C02-02). The authors thank Dr. Manuel Alfonseca for his help in preparing this document.

REFERENCES

Alonso, C. L., Montaña, J. L., and Puente, J. (2008). Straight line programs: a new linear genetic programming approach. Proc. 20th IEEE International Conference on Tools with Artificial Intelligence (ICTAI), pages 517–524.

de la Cruz, M., Ortega, A., and Alfonseca, M. (2005). Attribute grammar evolution. In Mira, J. and Álvarez, J., editors, Artificial Intelligence and Knowledge Engineering Applications: A Bioinspired Approach, volume 3562 of Lecture Notes in Computer Science, pages 182–191, Berlin / Heidelberg. Springer.

Holland, J. H. (1992; originally published in 1975). Adaptation in Natural and Artificial Systems. The MIT Press, London, 2nd edition.

Knuth, D. E. (1968). Semantics of context-free languages. Mathematical Systems Theory, 2(2):127–145.

Koza, J. (1992). Genetic programming: on the programming of computers by means of natural selection. MIT Press, Cambridge.

O'Neill, M. and Conor, R. (2003). Grammatical Evolution, evolutionary automatic programming in an arbitrary language. Kluwer Academic Publishers.

Ortega, A., de la Cruz, M., and Alfonseca, M. (2007). Christiansen grammar evolution: Grammatical evolution with semantics. IEEE Transactions on Evolutionary Computation, 11(1):77–90.
