THE ROLE OF KEEPING “SEMANTIC BLOCKS” INVARIANT
Effects in Linear Genetic Programming Performance
Marina de la Cruz Echeandía, Alba Martín Lázaro, Alfonso Ortega de la Puente
Departamento de Ingeniería Informática, Escuela Politécnica Superior
Universidad Autónoma, Madrid, Spain
José Luis Montaña Arnáiz
Departamento de Matemáticas, Estadística y Computación, Facultad de Ciencias
Universidad de Cantabria, Santander, Spain
César L. Alonso
Centro de Inteligencia Artificial, Universidad de Oviedo, Gijón, Spain
Keywords:
Grammar evolution, Attribute grammars, Christiansen grammars, Genetic programming, Straight-line pro-
grams, symbolic regression.
Abstract:
This paper is focused on two different approaches (previously proposed by the authors) that perform better than
Genetic Programming in typical symbolic regression problems: straight-line program genetic programming
(SLP-GP) and evolution with attribute grammars (AGE). Both approaches have different characteristics. One
of the most important is that SLP-GP keeps semantic blocks invariant (the crossover operator always exchanges
complete subexpressions). In this paper we compare both methods and study the possible effect on their
performance of keeping these blocks invariant.
1 MOTIVATION
The so-called Holland schema theorem (Holland, 1975)
explains why genetic-based methods are expected to find
quasi-optimal solutions. It shows how some relevant
patterns (schemata) that may be associated with higher
fitness values are preserved by the fitter individuals
after the genetic operators are applied. Keeping these
patterns is a way to improve the fitness.
These algorithms are assumed to use binary genotypes,
to which the fitness function is applied. Nevertheless,
genetic programming techniques usually code the final
programs in a more complex way, because it is more
difficult to translate genotypes into phenotypes if they
have to meet the constraints of the programming language
under consideration. Genetic programming frequently uses
tree-like data structures, as in the original GP proposal
(Koza, 1992), or non-binary strings, as in Grammatical
Evolution and its derivatives (O'Neill and Ryan, 2003;
de la Cruz et al., 2005; Ortega et al., 2007), to code
its genotypes. When the mapping from genotype to
phenotype is complex, it is more difficult to trace or
even identify the schemata.
In this paper we will use the name “semantic
blocks” for those genotype fragments associated with
some specific meaning in the phenotype. For exam-
ple, given an assignment of some arithmetic subex-
pression to some variable, such as z = x + 3 y, a
semantic block could give a proper value to its vari-
ables, such as x = 3;y = x 2.
This paper is focused on the possible improvement
in the performance of some genetic programming al-
gorithms that could be caused by genetic operators
that keep these semantic blocks invariant.
The paper compares the performance of two different
approaches previously proposed by the authors: attribute
grammar evolution (AGE) (de la Cruz et al., 2005) and
genetic programming based on straight-line programs
(SLP-GP) (Alonso et al., 2008). Both methods have been
shown to improve on the efficiency of classic GP, and both
incorporate complex formal representations for the
candidate solutions (attribute grammars and straight-line
programs, respectively). In addition, SLP-GP uses genetic
operators that preserve the structure of the “semantic
blocks”. Although preserving semantic blocks is not the
only difference between AGE and SLP-GP, we think that a
first comparison of both methods as they are defined could
provide some information about the effect on performance
of keeping semantic blocks invariant. One of our future
goals is to define an AGE extension that adds a mechanism,
inspired by SLP-GP, able to keep the semantic blocks
invariant.
de la Cruz Echeandía M., Martín Lázaro A., Ortega de la Puente A., Luis Montaña Arnáiz J. and L. Alonso C.
THE ROLE OF KEEPING “SEMANTIC BLOCKS” INVARIANT - Effects in Linear Genetic Programming Performance.
DOI: 10.5220/0003085403650368
In Proceedings of the International Conference on Evolutionary Computation (ICEC-2010), pages 365-368
ISBN: 978-989-8425-31-7
Copyright © 2010 SCITEPRESS (Science and Technology Publications, Lda.)
2 INTRODUCTION TO SLP-GP
It is easy to see that any program (algorithm) can be
expressed as a sequence of predefined atomic oper-
ations. A standard way of representing each opera-
tion is by means of vectors with n + 2 components.
The first component usually holds the operation per-
formed. Another element refers to the place where the
result will be stored (it usually is a different and new
variable for each operation). There are n additional
components for a maximum of n operands.
Quadruples are one such representation: vectors with
n + 2 = 4 elements (n = 2).
For example, the expression $2x_2(x_1 + 1)^2 - 2x_2$
could be represented by the following sequence of
quadruples: $\{u_1 := x_1 + 1,\ u_2 := u_1 * u_1,\ u_3 := x_2 + x_2,\ u_4 := u_2 * u_3,\ u_5 := u_4 - u_3\}$.
It is easy to see that the semantic block of a given
element (quadruple) contains all the elements of the
SLP needed to properly evaluate it. For example: the
semantic block of the second quadruple ($u_2 := u_1 * u_1$)
is the sequence $\{u_1 := x_1 + 1,\ u_2 := u_1 * u_1\}$, while
the semantic block of the third element ($u_3 := x_2 + x_2$)
contains only this quadruple and the semantic block
of the last quadruple is the complete SLP.
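The notion of semantic block can be illustrated with a minimal Python sketch. It is not the authors' implementation: the tuple encoding of quadruples, the names `SLP`, `semantic_block` and `evaluate`, and the reading of the operators in the example as multiplication and subtraction are assumptions made only for illustration.

```python
import operator
from typing import Dict, List, Tuple

# Assumed encoding: (operator, left operand, right operand). The result of the
# i-th instruction (1-based) is stored in the variable "u<i>".
Instruction = Tuple[str, str, str]

SLP: List[Instruction] = [
    ("+", "x1", "1"),   # u1 := x1 + 1
    ("*", "u1", "u1"),  # u2 := u1 * u1
    ("+", "x2", "x2"),  # u3 := x2 + x2
    ("*", "u2", "u3"),  # u4 := u2 * u3
    ("-", "u4", "u3"),  # u5 := u4 - u3
]

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def semantic_block(slp: List[Instruction], k: int) -> List[int]:
    """Return the 1-based indices of the instructions needed to evaluate u_k."""
    needed = set()
    stack = [k]
    while stack:
        i = stack.pop()
        if i in needed:
            continue
        needed.add(i)
        _, left, right = slp[i - 1]
        for operand in (left, right):
            # Only operands named "u<j>" refer to earlier instructions;
            # inputs ("x1", "x2") and constants ("1") add no dependencies.
            if operand.startswith("u"):
                stack.append(int(operand[1:]))
    return sorted(needed)


def evaluate(slp: List[Instruction], inputs: Dict[str, float]) -> List[float]:
    """Evaluate every instruction in order and return the values u1..un."""
    values = dict(inputs)
    results = []
    for i, (op, left, right) in enumerate(slp, start=1):
        a = values[left] if left in values else float(left)
        b = values[right] if right in values else float(right)
        values[f"u{i}"] = OPS[op](a, b)
        results.append(values[f"u{i}"])
    return results
```

Under these assumptions, `semantic_block(SLP, 2)` collects the first two quadruples, `semantic_block(SLP, 3)` only the third, and `semantic_block(SLP, 5)` the complete SLP, in line with the description above.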
Further details of SLP-GP and examples of its
crossover operator can be found in (Alonso et al.,
2008), together with a more formal approach and
definitions of SLPs.
3 INTRODUCTION TO AGE
Attribute grammars (Knuth, 1968) are one of the tools
used to completely describe high level programming
languages (both their syntax and their semantics).
AGE (de la Cruz et al., 2005) is an extension to
Grammatical Evolution (O'Neill and Ryan, 2003).
Both techniques are automatic programming evolu-
tionary algorithms independent of the target program-
ming language, and include a standard representation
of genotypes as strings of integers (codons), and a for-
mal grammar (respectively attribute and context free
grammars) as inputs for the deterministic mapping of
a genotype into a phenotype. This mapping mini-
mizes the generation of syntactically (and in the case
of AGE also semantically) invalid phenotypes. Ge-
netic operators act at the genotype level, while the fit-
ness function is evaluated on the phenotypes.
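A minimal sketch of the context-free part of this mapping, in the style of Grammatical Evolution, may help: each codon selects, modulo the number of alternatives, the production applied to the leftmost nonterminal. The toy grammar, the name `ge_map`, and the omission of both codon wrapping and the attribute evaluation that AGE adds are simplifying assumptions, not the authors' implementation.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy grammar (not taken from the paper): each nonterminal maps
# to its list of alternative right-hand sides.
GRAMMAR: Dict[str, List[List[str]]] = {
    "<expr>": [["<expr>", "<op>", "<expr>"], ["<var>"]],
    "<op>": [["+"], ["*"]],
    "<var>": [["x"], ["1"]],
}


def ge_map(codons: List[int],
           grammar: Dict[str, List[List[str]]],
           start: str = "<expr>") -> Tuple[Optional[str], int]:
    """Map a string of codons to a phenotype by leftmost derivation.

    Returns (phenotype, codons_used). The phenotype is None when the codons
    run out before the derivation finishes (this sketch does no wrapping).
    """
    pending = [start]        # symbols still to process, leftmost first
    output: List[str] = []   # terminals emitted so far
    used = 0
    while pending:
        symbol = pending.pop(0)
        if symbol not in grammar:     # terminal symbol: emit it
            output.append(symbol)
            continue
        if used == len(codons):       # genotype exhausted
            return None, used
        options = grammar[symbol]
        choice = options[codons[used] % len(options)]
        used += 1
        pending = list(choice) + pending
    return " ".join(output), used


genotype = [0, 1, 0, 1, 1, 1, 7, 9]
phenotype, used = ge_map(genotype, GRAMMAR)
pruned = genotype[:used]   # codons the mapping never read are dropped
```

In this example the derivation finishes before the genotype does, so the trailing codons play no role in the phenotype and can be discarded without changing it.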
Further details, deeper descriptions and examples
can be found in (de la Cruz et al., 2005).
4 SLP-GP VS. AGE
There are several differences that make it difficult to
compare both methods. We can briefly summarize
them as follows:
Length of Genotypes. SLP-GP uses SLPs of length 12
(that is, with 12 operations or instructions), while
AGE uses variable-length genotypes. Like other
variable-length evolutionary algorithms, AGE suffers
from bloating (the unbounded increase in the length
of the genotypes as the search evolves). Pruning
(removing some fragments of genotypes) is a genetic
operator useful to control bloating. AGE removes the
codons not used after the translation into phenotype.
Mapping from Genotype to Phenotype. AGE
includes a mapping from genotype to phenotype
driven by the attribute grammar that generates the
language of candidate solutions. SLP-GP uses
slp’s both as genotypes and as phenotypes. In
AGE, the number of codons (length of the corre-
sponding genotype) needed to generate a pheno-
type of a given length depends on the structure of
the specific grammar and it is, consequently, more
difficult to estimate.
The Way in which Computational Effort is
Computed. In (Alonso et al., 2008) the number
of basic operations is used to measure the
computational effort. Each SLP has 12 operations and
each generation contains 200 different genotypes.
Therefore, SLP-GP could measure the computational
effort by means of the total number of basic
operations, candidate solutions or generations needed
to find the solution, because all of them are directly
proportional. In (Alonso et al., 2008) the maximum
number of basic operations is $10^7$. This condition is
equivalent to a maximum of 5000 generations.
Nevertheless, as noted above for AGE, it is difficult
to estimate how many codons are needed to generate
a phenotype with a given number of basic operations.
Instead of the total number of basic operations
performed, we have used the cumulative success
frequency, taking into account the generation in which
the successful experiments find the solution.
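With the figures quoted above for SLP-GP (12 operations per SLP and 200 genotypes per generation), the proportionality can be written out as a quick check. The symbols $E$ (total basic operations) and $G$ (generations) are introduced here only for illustration:

```latex
E = 12 \times 200 \times G = 2400\,G,
\qquad
E\big|_{G = 5000} = 1.2 \times 10^{7}
```

so a cap of 5000 generations corresponds to a budget of the order of $10^7$ basic operations.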
4.1 Experimental Setting
In order to compare SLP-GP and AGE we have
reproduced the experiments performed in (Alonso et al.,
2008) for the following real functions:
$f_1(x) = x^4 + x^3 + x^2 + x$,
$f_2(x) = e^{\sin 3x + 2x}$,
$f_3(x) = 2.718x^2 + 3.1416x$,
$f_4(x) = \min\{\frac{2}{x}, \sin(x) + 1\}$.
We have used the same parameters when possible:
30 sample points, 20 individuals in the population,
function set $F = \{+, -, *, /^{()}\}$, constant set
$\{0, 1, 2\}$, crossover rate = 0.9, 100 runs per function,
a maximum of 5000 generations, and error thresholds
3.716070e-07, 7.623910e-01, 5.265160e-01 and
1.425200e-02 for the solutions of $f_1$, $f_2$, $f_3$ and
$f_4$, respectively. Sample points have been taken from
$[-5, 5]$, $[-\pi/2, \pi/2]$, $[-\pi, \pi]$ and $[0, 15]$,
respectively. The following additional functions have
been added to $F$, respectively: $\{\mathrm{sqrt}^{()}\}$,
$\{\mathrm{sqrt}^{()}, \sin, \cos, \exp\}$, $\{\sin, \cos\}$ and
$\{\sin, \cos\}$ (the versions marked $^{()}$ are used only
by SLP-GP).
There is a significant difference between the sets of
functions used by the two methods. $/^{()}$ and
$\mathrm{sqrt}^{()}$ represent protected versions (against
undefined results) of / and sqrt, respectively. These
versions return 1 when the operation is undefined
(x = 0 and x < 0, respectively). AGE considers
individuals with undefined subexpressions semantically
incorrect and, consequently, never generates them.
In addition, protected functions change the behav-
ior of the final expressions and make them difficult to
understand and describe.
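The protected operators can be sketched in a few lines of Python. The function names `protected_div` and `protected_sqrt` are chosen here for illustration; only the behavior (returning 1 where the plain operation is undefined) follows the description above.

```python
import math


def protected_div(a: float, b: float) -> float:
    """Division that returns 1 when the divisor is zero."""
    return 1.0 if b == 0 else a / b


def protected_sqrt(x: float) -> float:
    """Square root that returns 1 when the argument is negative."""
    return 1.0 if x < 0 else math.sqrt(x)
```

An expression such as `protected_div(x, x - x)` always evaluates to 1, which illustrates how these operators can hide undefined subexpressions inside an otherwise valid-looking solution.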
The remaining parameters were tuned separately.
For SLP-GP: mutation and reproduction rates = 0.05
and slp’s length = 12. For AGE: mutation rate = 0.9
and variable length genotypes.
5 EXPERIMENTAL RESULTS
It is possible to compare the performance of both
methods, if we do not take into account the number
of basic operations performed to get the solutions.
We will show two kinds of graphics:
1. The cumulative success frequency with respect to
the generation in which the quasi-optimal solution
is found (figure 1). It shows how many runs finish
successfully and how quickly. Some of the curves do
not show all 5000 generations because they focus on
the range in which most of the solutions are found.
2. The empirical distribution of the best fitness,
displayed using the standard box plot notation with
marks at the best execution, the 25th, 50th and 75th
percentiles, and the worst execution (figure 2). We
consider that it globally describes the quality of
the populations.
Figure 1: Cumulative success frequency for f1, f2, f3, and f4.
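The two summaries described above can be computed as in the following sketch. The function names, the use of `None` for unsuccessful runs, and the choice of the inclusive quantile method are assumptions made for illustration; the paper does not specify how its plots were produced.

```python
import statistics
from typing import List, Optional, Tuple


def cumulative_success_frequency(success_gen: List[Optional[int]],
                                 max_gen: int) -> List[float]:
    """Fraction of runs that have found a solution by each generation.

    success_gen[i] is the generation in which run i first met the error
    threshold, or None if the run never succeeded.
    """
    counts = [0] * (max_gen + 1)
    for g in success_gen:
        if g is not None and g <= max_gen:
            counts[g] += 1
    frequencies = []
    solved = 0
    for g in range(max_gen + 1):
        solved += counts[g]
        frequencies.append(solved / len(success_gen))
    return frequencies


def box_plot_marks(best_fitness: List[float]) -> Tuple[float, ...]:
    """Return (minimum, 25%, 50%, 75%, maximum) of the best fitness per run."""
    q1, q2, q3 = statistics.quantiles(best_fitness, n=4, method="inclusive")
    return (min(best_fitness), q1, q2, q3, max(best_fitness))
```

For instance, `cumulative_success_frequency([2, None, 0, 2, 4], 5)` describes five runs, one of which never succeeds, so the resulting curve levels off below 1.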
It can be easily seen that, except for f1, SLP-GP
performs better than AGE, especially for function f4.
The poorer performance of AGE in the more difficult
cases could be caused by the capability of SLP-GP to
preserve semantic blocks. No figure shows the
empirical distribution for f1 because both methods get
fitness values above 0.9999995 in all the runs. It is
also worth mentioning that AGE is able to find the
exact (zero error) target function in some runs for
almost all the cases (f1, f2 and f3), while SLP-GP only
approximates them.
Figure 2: Empirical best fitness distribution for f2, f3, and f4.
6 CONCLUSIONS AND FURTHER
RESEARCH LINES
We have compared two different genetic program-
ming algorithms previously proposed by the authors.
From our point of view, the main difference between
these methods is that SLP-GP implements a crossover
operator that keeps semantic blocks invariant.
We have designed a set of graphics to compare
the performance of both methods, taking into account
their differences. We conclude that SLP-GP perfor-
mance is better. We think that this is a consequence
of keeping semantic blocks invariant.
In the future we would like to gather a wider set
of performance data using different target functions to
get a deeper comparison between SLP-GP and AGE.
We would also like to propose new versions of AGE and
of Christiansen grammar evolution (CGE) (Ortega et al.,
2007) that extend crossover to keep semantic blocks
invariant, as in SLP-GP. We will then compare SLP-GP
with the new algorithms.
ACKNOWLEDGEMENTS
This work was partially supported by the R&D pro-
gram of the Community of Madrid (S2009/TIC-1650,
project “e-Madrid”) as well as by the Spanish Min-
istry of Science and Innovation (TIN2007-67466-
C02-02). The authors thank Dr. Manuel Alfonseca
for his help to prepare this document.
REFERENCES
Alonso, C. L., Montaña, J. L., and Puente, J. (2008).
Straight line programs: a new linear genetic programming
approach. Proc. 20th IEEE International Conference on
Tools with Artificial Intelligence (ICTAI),
pages 517–524.
de la Cruz, M., Ortega, A., and Alfonseca, M. (2005).
Attribute grammar evolution. In Mira, J. and Álvarez,
J., editors, Artificial Intelligence and Knowledge
Engineering Applications: A Bioinspired Approach,
volume 3562 of Lecture Notes in Computer Science,
pages 182–191, Berlin / Heidelberg. Springer.
Holland, J. H. (1992). Adaptation in Natural and
Artificial Systems. The MIT Press, London, 2nd edition.
Originally published in 1975.
Knuth, D. E. (1968). Semantics of context-free languages.
Mathematical Systems Theory, 2(2):127–145.
Koza, J. (1992). Genetic programming: on the program-
ming of computers by means of natural selection. MIT
Press, Cambridge.
O'Neill, M. and Ryan, C. (2003). Grammatical Evolution:
evolutionary automatic programming in an arbitrary
language. Kluwer Academic Publishers.
Ortega, A., de la Cruz, M., and Alfonseca, M. (2007).
Christiansen grammar evolution: Grammatical evolu-
tion with semantics. IEEE Transactions on Evolution-
ary Computation, 11(1):77–90.