operational definition of the mutation operator of the
genetic algorithm as the act of randomly changing
the values of some alleles of a simulated
chromosome. As it is technically possible, though
highly unlikely, for every allele of a simulated
chromosome to be mutated in a single generation of
a typical genetic algorithm, it follows that it is then
also possible for any chromosome to be entirely
transformed into any other chromosome in a single
generation. However, as the mutation operator is
typically applied to each allele probabilistically and
independently, the likelihood of one chromosome
transforming into another decreases exponentially
with the number of alleles that differ between the
chromosomes in question. Consequently, the widely
known Hamming distance metric often used in
quantifying the distance between two strings is
frequently employed by researchers of the genetic
algorithm as a measurement of the distance between
the chromosomes in the population simulation.
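As an illustration of the exponential decrease described above, the following minimal Python sketch computes the Hamming distance between two chromosomes and the probability that mutation alone transforms one into the other in a single generation. It assumes binary alleles encoded as strings of '0' and '1' and a bit-flip mutation applied independently to each allele with a hypothetical rate `rate`; the function names are illustrative and are not taken from the source.

```python
def hamming_distance(a: str, b: str) -> int:
    """Count the positions at which two equal-length chromosomes differ."""
    if len(a) != len(b):
        raise ValueError("chromosomes must have the same length")
    return sum(x != y for x, y in zip(a, b))


def mutation_transition_probability(source: str, target: str, rate: float) -> float:
    """Probability that independent bit-flip mutation turns source into target.

    Assumption (not from the source text): binary alleles, each flipped
    independently with probability `rate` in one generation.
    """
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must lie in [0, 1]")
    d = hamming_distance(source, target)  # alleles that must flip
    n = len(source)                       # total number of alleles
    # d alleles must mutate and the remaining n - d must stay unchanged.
    return rate ** d * (1.0 - rate) ** (n - d)
```

Under these assumptions, each additional differing allele multiplies the probability by rate / (1 - rate), so for any rate below 0.5 the likelihood shrinks geometrically as the Hamming distance grows.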
Although the widespread use of the Hamming
distance (Jones, 1995a) as a measure of the distance
between simulated chromosomes is not
inappropriate, it is important to acknowledge that
Hamming distance reflects only the changes
produced by the mutation operator, and mutation
can be regarded as the sole source of variation only
in a population that reproduces asexually. Because
sexual reproduction is the predominant form of
reproduction for the majority of the non-microscopic
organisms observed in the natural world (Merrell,
1994), the genetic algorithm, which seeks to emulate
natural populations as closely as possible, typically
also employs a binary recombination operator that is
often referred to as the crossover operator.
This recombination operator used by the genetic
algorithm can be defined simply as an operator that
exchanges data between the encoded chromosomes
of two population members, in emulation of the
biological process (Mitchell, 1998). Typically the
operator randomly selects a set of alleles from one
chromosome to be exchanged with the
corresponding alleles of another. A uniform
recombination operation exchanges each allele of a
simulated chromosome probabilistically and
independently. Although this resembles the manner
in which the typical mutation operator is applied,
exchanging an allele that both parents share changes
nothing, so the range of possible offspring that can
be created through recombination grows with the
genetic difference between the simulated
chromosomes selected to act as parents.
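The dependence of the offspring range on parental difference can be sketched in Python as follows. The sketch assumes chromosomes encoded as equal-length strings and a hypothetical per-allele exchange probability `swap_prob`; the helper `uniform_offspring`, which enumerates every child reachable from a pair of parents, is introduced here purely for illustration.

```python
import random
from itertools import product


def uniform_crossover(p1: str, p2: str, swap_prob: float = 0.5, rng=None):
    """Exchange each allele between the parents independently with swap_prob."""
    if len(p1) != len(p2):
        raise ValueError("parents must have the same length")
    rng = rng or random.Random()
    child1, child2 = [], []
    for a, b in zip(p1, p2):
        if rng.random() < swap_prob:
            a, b = b, a  # swap the alleles at this locus
        child1.append(a)
        child2.append(b)
    return "".join(child1), "".join(child2)


def uniform_offspring(p1: str, p2: str) -> set:
    """Enumerate every distinct child that uniform crossover can produce."""
    # A locus where the parents agree offers one choice; otherwise two.
    choices = [(a,) if a == b else (a, b) for a, b in zip(p1, p2)]
    return {"".join(alleles) for alleles in product(*choices)}
```

For parents that differ at d loci, the enumeration yields 2^d distinct children, so identical parents can only reproduce themselves, while maximally different binary parents can reach every chromosome of that length.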
Although k-point recombination operations,
which randomly select a set number of substrings
from a simulated chromosome for exchange, are also
frequently employed by genetic algorithm
researchers, the set of possible offspring that can be
created through uniform recombination contains
every set of possible offspring that can be created
through k-point recombination for any fixed k. For
the sake of generalizability, all subsequent
references to recombination therefore refer to
uniform recombination.
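The containment claim can be checked by brute force for short chromosomes. The Python sketch below assumes one common reading of k-point recombination, in which k distinct interior cut points are chosen and the segments between them are taken alternately from each parent; both function names are hypothetical and the string encoding is an assumption.

```python
from itertools import combinations, product


def uniform_offspring(p1: str, p2: str) -> set:
    """Every distinct child reachable by uniform recombination."""
    choices = [(a,) if a == b else (a, b) for a, b in zip(p1, p2)]
    return {"".join(alleles) for alleles in product(*choices)}


def k_point_offspring(p1: str, p2: str, k: int) -> set:
    """Every distinct child reachable by k-point recombination.

    Assumption: k distinct interior cut points, with segments copied
    alternately from the two parents.
    """
    n = len(p1)
    children = set()
    for cuts in combinations(range(1, n), k):
        bounds = (0, *cuts, n)
        child1, child2 = [], []
        for i in range(len(bounds) - 1):
            lo, hi = bounds[i], bounds[i + 1]
            # Even-numbered segments keep their parent, odd ones are exchanged.
            first, second = (p1, p2) if i % 2 == 0 else (p2, p1)
            child1.append(first[lo:hi])
            child2.append(second[lo:hi])
        children.add("".join(child1))
        children.add("".join(child2))
    return children
```

For example, with parents "0000" and "1111" the one-point children form a six-element subset of the sixteen uniform children, and a child such as "0101" is reachable by uniform recombination but not by a single cut.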
3 DISTANCE MEASUREMENT
The significance of distance functions to the genetic
algorithm is most apparent when considering a
formal definition of the fitness landscape (Stadler,
2002) that the genetic algorithm traverses in search
of an optimum. Stadler defined a fitness landscape
as having three parts: an evaluation function to be
optimized; the set of possible candidate solutions,
which the genetic algorithm represents as simulated
chromosomes; and a conceptualization of distance or
neighbourhood that induces a topology on the
solution set to create a solution space. Furthermore,
knowing the distance between two chromosomes
that must be traversed by the operators of the genetic
algorithm is a reasonable indicator of the smallest
number of generations it will take before the
transformation of one chromosome to another is
possible. Beyond the evident application of this
information to optimization, computing the distance
between all possible pairs of chromosomes in the
population also gives an impression of the actual
diversity of the population.
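One simple way to turn pairwise distances into a diversity estimate is to average them, as in the following Python sketch. The averaging is an assumption made for illustration rather than a measure prescribed by the text, and Hamming distance appears only as a default placeholder; any other distance function over chromosomes could be passed in through the hypothetical `distance` parameter.

```python
from itertools import combinations


def hamming_distance(a: str, b: str) -> int:
    """Count the positions at which two equal-length chromosomes differ."""
    if len(a) != len(b):
        raise ValueError("chromosomes must have the same length")
    return sum(x != y for x, y in zip(a, b))


def mean_pairwise_distance(population, distance=hamming_distance) -> float:
    """Average distance over all unordered pairs of population members."""
    pairs = list(combinations(population, 2))
    if not pairs:
        return 0.0  # fewer than two members: no pair to compare
    return sum(distance(a, b) for a, b in pairs) / len(pairs)
```

A population of identical chromosomes scores zero under this measure, and the score rises as the members spread further apart under the chosen distance function.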
For a function whose domain is a pair of
simulated chromosomes and whose range is a real
value to be considered a true measure of distance
(or, equivalently, a metric), there are four conditions
that must be satisfied. Firstly, the function must
never report the distance between two elements of
the solution set as a negative value, a condition
known as non-negativity. The function must also
comply with the identity of indiscernibles condition,
which states that the distance between two elements
is zero if and only if the two elements are
identical. The third condition that must
be observed, symmetry, states that the distance to be
traversed from element x to element y must be the
same as the distance that would be traversed from
ICEC 2010 - International Conference on Evolutionary Computation