The Effect of Mutation Operation on GP- based Stream Ciphers
Design Algorithm
Wasan Shakr Awad
Department of Information Systems, College of Information Technology, University of Bahrain, Sakheer, Bahrain.
Keywords: Genetic Programming, Simulated Annealing, Stream Ciphers, Mutation Operation.
Abstract: Mutation operation is used to introduce a small perturbation in the population from time to time so as to
maintain its diversity. Several mutation operations have been developed for genetic programming. This
paper is to study the impact of mutation operation on the performance of genetic programming. We present
six types of mutation operations that have been applied in the simulated annealing programming (SAP)
algorithm, which is an algorithm used to design stream ciphers using genetic programming and simulated
annealing. Experiments performed to study the effectiveness of these operations in solving the underlying
problem. It has been shown that mutation operation can affect the performance of genetic programming,
especially when it is used to solve complex problems.
1 INTRODUCTION
Evolution is an optimization process, where the aim
is to improve the ability of a system to survive in
dynamically changing environment. There are a
number of evolutionary computation techniques,
such as genetic programming (GP). In GP, the
population individuals are computer programs
represented usually as expression trees (Koza,
1992), and a number of operations are used to
update the population individuals, such as crossover
and mutation.
In general, the inclusion of mutation step the
evolutionary algorithms is very important. The main
aim of mutation operation is to keep certain amount
of randomness and to introduce a small perturbation
in the population from time to time so as to maintain
its diversity. Most of the times the mutation
operation is applied according to some fixed
probabilistic rule. In the past few years mutation
operations based on different probability
distributions (like Normal, Gaussian, and Cauchy
etc) have become popular. Also, in our previous
work (Awad, 2011) adaptive mutation has been
applied and it has been shown its effectiveness
comparable with fixed rate mutation.
Mutation in GP is frequently treated as a
secondary operator. However, it has been shown that
mutation can significantly improve performance
when combined with crossover (Banzhaf , 1996;
Poli, 1997). Few of Koza’s (Koza, 1992; Koza,
1994) early experiments include mutation. Koza
states two reasons for the omission of mutation in
the majority of problems. First, it is rare to lose
diversity when using a sufficient population size;
therefore, mutation is simply not needed in GP.
Second, when the crossover operation occurs using
endpoints in both trees, the effect is very similar to
point mutation. Koza wished to demonstrate that
mutation was not necessary and that GP was not
performing a simple random search. This has
significantly influenced the field, and mutation is
often omitted from GP runs. While mutation is not
necessary for GP to solve many problems, it can
perform as well as crossover based GP in some
cases. O'Reily (O'Reily, 1995) argued that mutation
in combination with simulated annealing or
stochastic iterated hill climbing can perform as well
as crossover-based GP in some cases. Nowadays,
mutation is widely used in GP, especially in solving
complex problems. Koza also advises to use of a low
level of mutation (Koza, 1999). In (Luke, 1997), the
authors suggested that the situation is complex, and
that the relative performance of crossover and
mutation depends on both the problem.
Several mutation operations have been
developed for GP; some of them are listed bellow
(Koza, 1992; Koza, 1994; O'Reily, 1995;
Kinnear,
1994; Angeline
, 1996):
445
Shakr Awad W..
The Effect of Mutation Operation on GP- based Stream Ciphers Design Algorithm.
DOI: 10.5220/0004222804450450
In Proceedings of the 5th International Conference on Agents and Artificial Intelligence (ICAART-2013), pages 445-450
ISBN: 978-989-8565-39-6
Copyright
c
2013 SCITEPRESS (Science and Technology Publications, Lda.)
Subtree mutation replaces a randomly selected
subtree with another randomly created subtree.
Node replacement mutation (also known as
point mutation) is similar to bit string mutation
in that it randomly changes a point in the
individual. A node in the tree is randomly
selected and randomly changed.
Hoist mutation creates a new offspring
individual which is copy of a randomly chosen
subtree of the parent. Thus, the offspring will
be smaller than the parent and will have a
different root node.
Shrink mutation replaces a randomly chosen
subtree with a randomly created terminal.
Permutation mutation selects a random
function node in a tree and then randomly
permuting its arguments (subtrees).
In addition to these types, two new methods for
implementing the mutation operator in GP called
Semantic Aware Mutation (SAM) and Semantic
Similarity based Mutation (SSM) have been
proposed (
Nguyen, 2009).
Designing good stream cipher automatically is a
complex process, therefore GP has been used in our
previous work (Awad, 2011) for designing one class
of stream cipher systems, and we have showed that
GP can be used successfully to solve this problem.
In (Awad, 2011) GP, integrated with simulated
annealing, (SA) (
Kirkpatrik, 1983) i.e., simulated
annealing programming (SAP), has been used
successfully to design Linear Feedback Shift
Register (LFSR)-based stream cipher systems with
the desired properties, such as random keystream,
and large period length. However, we still expect
some improvements. Thus, this paper is to study the
impact of various mutation operations on the
effectiveness of SAP algorithm for designing stream
ciphers.
Six types of mutation operation are applied in this
work, which are:
GA-Based Terminal Node Mutation
Random Terminal Node Mutation
GA-Based Semantic Terminal Node Mutation
Random Sub-expression Mutation
GA-based Shrink Mutation
GA-based Semantic Shrink Mutation
2 SAP ALGORITHM OVERVEIW
SAP is a general automated design approach for
designing stream ciphers that satisfy the desired
properties. It is an integration of GP and SA in order
to work on a population of individuals and to
preserve good individuals into the next generation
(Yuichiro, 2009; Miki, 2007). The output of SAP is
a keystream generator, which is the main component
of stream cipher that generates pseudorandom
Binary sequence (keystream) of length size and
fulfills the security and efficiency requirements.
Stream ciphers (Forouzan, 2008; Rueppel, 1986;
Schneier, 1996; Golomb, 1967) are of great
importance in applications, and there are different
types of stream ciphers, only LFSR-bassed stream
ciphers are considered here, in which, the main
component of the keystream generators is LFSR,
where LFSR is a shift register with linear feedback
function.
The following is the complete SAP algorithm as
described in (Awad, 2011):
Algorithm: SAP
Input : Keystream period length
(size).
Output : LFSR-based keystream
generator.
Begin
Generate the initial population
(pop) randomly;
Evaluate pop;
Temp 250.0; //temperature
Repeat
Generate a new population (pop1) by
applying crossover and mutation;
Evaluate the fitness of the new
generated chromosomes of pop1;
Calculate the averages of fitness
values for pop and pop1, av1 and av2
respectively;
If (av2 > av1) then replace the old
population by the new one, i.e.
pop pop1;
Else
Begin
e av1-av2;
Pr e / Temp;
Generate a random number (rnd);
If (exp(-pr) > rnd) then
pop pop1;
End;
Temp Temp * 0.95;
Until (Max Number of generations);
Return the best chromosome of the
last generation;
End.
In SAP algorithm, SA is the technique used for
the construction of the keystream generators. The
structure under adaptation is the set of GP
expressions, and the GA operations are used to
update the population of expressions.
ICAART2013-InternationalConferenceonAgentsandArtificialIntelligence
446
In this algorithm, the genetic operations used are
1-point crossover with probability pc=1.0, and
terminal node mutation with probability pm=0.1.
The mutation operation used to replaces a randomly
selected one gene from a LFSR feedback function
with a new one. The function library includes the
functions: LFSR, XOR, AND, OR. The terminal
nodes are strings of characters (which are converted
to Binary digits during the process of fitness
evaluation) that represent the linear feedback
functions of LFSRs.
For example, the following population individual
(chromosome):
"& SR adfe SR ddeab" is an expression of two
shift registers combined by AND function, and
"adfe" and "ddeab" are feedback functions of the
LFSRs in the expressions.
3 MUTATION OPERATION
DESCRIPTIONS
Different types of mutation operation are applied in
SAP algorithm. The descriptions of theses
operations are given bellow.
3.1 Terminal Node Mutation
This operation randomly changes a point (terminal
node) in the individual. The point is the feedback
function of a LFSR. Three variations of this
operation are considered in this study, as follows.
I. GA-Based Terminal Node Mutation
Here, Genetic Algorithm (GA) is used to find the
best LFSR feedback function to replace a randomly
selected terminal node (LFSR feedback function) of
an individual. The description of this operation is as
follows.
The population chromosomes in GA are
represented as fixed length Binary strings. The
lengths of these strings are equal to the length of the
randomly selected terminal node S of SAP
individual, where S is the Binary feedback function
of a LFSR. Each chromosome in GA population is a
mask used to modify S, by Xoring GA chromosome
with S. The probability of generating the gene "1" in
the GA initial population is 0.2, while the
probability of the gene "0" is 0.8.
The fitness value is a measurement of the
goodness of the keystream generator, and it is used
to control the application of the operations that
modify a population. The fitness function used in
SAP is also used in GA. After modifying S, the SAP
program is executed to generate the keystream. The
generated keydtream is then evaluated as described
in (Awad, 2011). Eqs. (1), (2), and (3) are used to
evaluate the GA chromosomes. Eq. (1) is used for
the evaluation of keystream randomness using the
frequency and serial tests, in which, nw is the
frequency of w (where w = 00, 01, 10, or 11) in the
generated binary sequence.
4
1
10
size
nwnnf
(1)
There is another randomness requirement which
is: (1/2
i
* n
r
) of the runs in the sequence are of
length i, where n
r
is the number of runs in the
sequence. Thus, we have eq. (2).
M
i
ir
i
nn
f
1
2
1
2
(2)
where M is maximum run length, and n
i
is the
desired number of runs of length i. Thus, the fitness
function used to evaluate the chromosome x will be
as given by eq. (3), where wt is a constant and size is
the keystream period length:
)(211
)(
xlength
wt
ff
size
xfit
(3)
The parameters used in this work were set based
on the experimental results, the parameter value that
show the highest performance was chosen to be used
in the implementation of the algorithm. Thus, the
genetic operations used to update the population are
1-point crossover with probability pc=1.0. The
selection strategy, used to select chromosomes for
the genetic operations, is the 2- tournament
selection. The old population is completely replaced
by the new population which is generated from the
old population by applying the genetic operations.
The run of GA is stopped after a fixed number of
generations. The solution is the best chromosome of
the last generation.
II. Random Terminal Node Mutation
Using this operation, S is replaced by randomly
generated LFSR feed back function.
III. GA-Based Semantic Terminal Node Mutation
GA is used to find the best LFSR feed back function,
in term of the big differences in the generated
keystreams, to replace a randomly selected terminal
node (LFSR feed back function) of an individual.
The description of this operation is as follows.
The population chromosomes GA are
represented as fixed length Binary strings. The
TheEffectofMutationOperationonGP-basedStreamCiphersDesignAlgorithm
447
lengths of these strings are equal to the length of the
randomly selected terminal node S of SAP
individual. Each chromosome in GA population
represents the candidate LFSR feedback function to
replace S.
The chromosomes of GA population are
evaluated based on the big difference in the
generated keystreams generated by S and GA
individuals. Thus, eq. (4) used to compute the fitness
value of the GA chromosome x.
size
i
ii
keystreamkeystreamxfit
1
'
)(
(4)
Where keystream
i
is the ith bit of the keystream
generated by S, and keystream
i
'
is the ith bit of the
keystream generated by the LFSR with the GA
chromosome x as feedback function.
The genetic operations used to update the
population are 1-point crossover with probability
pc=1.0. The selection strategy, used to select
chromosomes for the genetic operations, is the 2-
tournament selection. The old population is
completely replaced by the new population which is
generated from the old population by applying the
genetic operations. The run of GA is stopped after a
fixed number of generations.
3.2 Sub-expression Mutation
By applying this operation, a sub-tree is selected
randomly from SAP expression, and then is replaced
by a new sub-expression. In this work, different sub-
expression mutations have been studied. The
following is the description of each one.
I. Random Sub-expression Mutation
Using this operation the sub-expression of SAP
expression is replaced by randomly generated
expression.
II. GA-based Shrink Mutation
This operation is used to reduce the size of SAP
expression, by finding a LFSR that is equivalent to a
randomly selected sub-expression. The root node of
the sub-expression should be any function other than
LFSR. GA is used to find an equivalent LFSR that
generates the same keystream generated by the
selected sub-expression.
The population chromosomes in GA are
represented as variable length Binary strings. Each
chromosome in GA population represents the
candidate LFSR feedback function.
The solution of GA is an equivalent LFSR that
generates the same keystream generated by the sub-
expression selected randomly from a SAP
expression. Thus, Eq. (6) is used to compute the
fitness value of the GA chromosome x.
size
i
ii
keystreamkeystreamf
1
'
3
(5)
31
)(
f
size
xfit
(6)
Where keystream
i
is the ith bit of the keystream
generated by the sub-expression, and keystream
i
'
is
the ith bit of the keystream generated by the LFSR
with the GA chromosome x as feedback function.
The genetic operations used to update the
population are 1-point crossover with probability
pc=1.0. The selection strategy, used to select
chromosomes for the genetic operations, is the 2-
tournament selection. The old population is
completely replaced by the new population which is
generated from the old population by applying the
genetic operations. The run of GA is stopped after a
fixed number of generations. The solution is the best
chromosome of the last generation.
III. GA-based Semantic Shrink Mutation
It is the similar to previous operation, but GA is used
here to find an equivalent LFSR that generated
different keystream, thus the fitness function used is
given by eq. (4).
4 RESULTS
This section presents the findings and results of the
experiments carried out to demonstrate the impact of
different types of mutation operations on the
effectiveness of SAP algorithm. Therefore, SAP has
been implemented with six types of mutation
operations mentioned above. The six algorithms
have been applied with different mutation rates. The
best algorithms parameters used in the experiments
are:
Max number of generations of SAP = 30
Max number of generations of GA = 10
Pop size of SAP = 100
Pop size of GA (if used) = 10
The results of applying the six algorithms are
presented in table 1. These results are obtained by
running each algorithm 100 times for different
values of mutation rates. The results of table 1
represent the average of the fitness values of the best
chromosomes in 100 runs. According to the results
and by applying Wilcoxon signed-rank test,
GA-
Based Terminal Node Mutation has been greatly
ICAART2013-InternationalConferenceonAgentsandArtificialIntelligence
448
Table 1: Fitness values averages of 100 runs.
Mutation
Rate%
GA-Based
Terminal
Node
Mutation
Random
Terminal
Node
Mutation
Random
Sub-
expression
Mutation
GA-Based
Semantic
Terminal
Node
Mutation
GA-based
Shrink
Mutation
GA-based
Semantic
Shrink
Mutation
0 31.6738
5 37.3666 30.4139 31.1553 29.4109 24.4812 29.2382
10 43.7615 31.3691 31.4582 33.307 27.5608 29.8604
15 44.2341 31.3691 33.2401 35.4674 26.6744 26.5393
20 44.4663 33.5122 34.5481 35.4996 23.0873 25.3877
30 45.1159 32.8931 35.4401 32.9202 24.6556 23.365
40 47.338 34. 264 35.9812 37.2787 30.7147 25.3943
50 47.452 33.8621 35.5944 38.1934 30.6348 25.9628
60 49.1361 35.3515 36.1849 37.4782 31.3748 26.4893
improved the performance of SAP algorithm; it can
evolve keystream generators that generate
keystreams of good statistical properties and of large
period length. Also, increasing the mutation rate
improves the results. That is because; the feedback
function of LFSR has great effect on the output the
keystream generators. Thus, by using GA to find the
best feedback function can highly improve the
results.
5 CONCLUSIONS
In this paper, different types of mutation operations
have been applied in SAP algorithm in order to
study the impact of mutation operation on the
algorithm performance. It has been shown that
mutation operation can affect the performance of
GP, especially when it is used to solve complex
problems, such as designing stream ciphers. Six
mutation operations have been applied. We found
that GA-Based Terminal Node Mutation that
replaces the feedback function of LFSR (terminal
node) by a new feedback function evolved by GA
can highly improve SAP algorithm.
REFERENCES
Angeline, p. j.; 1996. An investigation into the sensitivity of
genetic programming to the frequency of leaf selection
during subtree crossover. In Proceedings of the First
Annual Conference on Genetic Programming. Stanford
University, CA, USA. PP 21-29.
Awad, W. S.; 2011. Designing stream cipher systems
using GP. LNCS, Vol. 6683, PP. 308-320.
Banzhaf, W., Francone, F. D., and Nordin, N.; 1996. The
effect of extensive use of the mutation operator on
generalization in genetic programming using sparse
data sets. LNCS, Vol. 1141, PP. 300–309.
Forouzan, B. A.; 2008. Cryptography and network
security. McGRAW-HILL, New York, USA.
Golomb, S. W.; 1967. Shift Register Sequence. Holden-
Day, San Francisco, USA.
Kinnear, K. E.; 1994. Fitness landscapes and difficulty in
genetic programming. In Proceedings of the IEEE
World Conference on Computational Intelligence.
Vol. 1, PP. 142-147.
Kirkpatrik, S., et al.; 1983. Optimization by simulated
annealing. Science, 220(4598), 671-680.
Koza, J. R.; 1992. Genetic Programming: On the
Programming of Computers by Means of Natural
Selection. MIT Press.
Koza, J. R.; 1994. Genetic Programming II: Automatic
Discovery of Reusable Programs. MIT Press.
Koza, J. R., F. H. B., Andre, D., and Keane, M. A.; 1999.
Genetic Programming III: Darwinian Invention and
Problem Solving. Morgan Kaufmann Press.
Luke, S., and Spector, L.; 1997. A comparison of
crossover and mutation in genetic programming. In
Proceedings of the Second Annual Conference on
Genetic Programming. Morgan Kaufmann, PP. 240–
248.
Miki, M., Hashimoto, M., and Fujita, Y.; 2007. Program
Search with Simulated Annealing. In Proceeding of
the 9th Annual Conference on Genetic and
Evolutionary Computation. PP. 1754 – 1754.
Nguyen Quang Uy, Nguyen Xuan Hoai, and Michael
O’Neill; 2009. Semantics based mutation in genetic
programming: the case for real-valued symbolic
regression. In Mendel09, 15th International
Conference on Soft Computing. PP. 73-91.
O'Reilly, U.; 1995. An Analysis of Genetic Programming.
PhD thesis, Carleton University, Ottawa-Carleton
Institute for Computer Science, Ottawa, Ontario,
Canada.
TheEffectofMutationOperationonGP-basedStreamCiphersDesignAlgorithm
449
Poli, R., and Langdon, W. B.; 1997. Genetic programming
with one-point crossover. In Soft omputing in
Engineering Design and Manufacturing. Springer-
Verlag, PP. 180–189.
Rueppel, R. A.; 1986. Analysis and Design of Stream
Cipher. Springer-Verlag.
Schneier, B.; 1996. Applied cryptography. John Wiley and
Sons.
Yuichiro, U., Mitsunori, and M., Tomoyuki H.; 2009.
Simulated Annealing Programming Using Effective
Subtrees. Doshisha Daigaku Rikogaku Kenkyu
Hokoku, 49(4), 205-209.
ICAART2013-InternationalConferenceonAgentsandArtificialIntelligence
450