set when the last training example has been
considered in the fitness function of the algorithm
and not at any earlier point.
Before moving on to the cache invalidation
mechanism, some further details on how cache hits
are implemented in software are necessary.
When the genetic programming algorithm starts
computing the fitness of a solution, it processes each
gene one after the other. Before executing the
corresponding computation, it checks for a valid
cache entry by testing the gene's cache validity
flag. If the flag is set, the computation is skipped
and the values of that gene for all training cases
are retrieved from the cache. The
location of the cache frame is found in the cache
index table, at the entry that corresponds to the
index of the solution being examined. Along with
the values of the training cases, the algorithm
retrieves the location of the gene in the solution
that must be examined next. In Figure 1, assuming
that node (3) fires a cache hit, the returning-gene
information will point to node (5). The evaluation
function resumes its operation from node (5), whose
result may or may not be in the cache. If no cache
hit occurs, the evaluation function performs its
computations as usual. The cache offsets
stored in the gene-index table are simply ascending
integers that keep incrementing until the allocated
cache memory space is exhausted. At that point
three replacement strategies can be
used: least recently used (LRU) replacement,
first-in first-out (FIFO) replacement and the genome
simplification process. LRU requires keeping an aging
variable in memory that is reset every time a hit
occurs, while the FIFO strategy simply resets the cache
allocation index to its zero relative address and starts
storing cache calculations from the beginning. The third and
more appealing strategy is better suited to
genetic programming cache implementations, since it
flushes the cache and builds up an updated one
by applying a computational
simplification process. Complex nested genetic
programs are replaced with shorter blocks that are
mathematically equivalent. This reduces the average
program length in the population and thus accelerates the
search even further. The invalidation of the cache is
discussed in more detail below.
The pseudocode that should be added to
the original fitness function of a genetic
programming algorithm is trivial and is shown in
Figure 2.
function FitnessFunction_CACHED()
  for all Individuals in the population
    for all TrainingCases
      Offset = 0
      while Offset < length(Individual)
        index = AUX_CACHE_INDEXES[Individual,Offset]
        if (AUX_CACHE_VALID[index] == true) // HIT
          result = CACHE[index,TrainingCase]
          Offset = AUX_CACHE_RETURN[Individual,Offset]
        else // No cache hit
          result = . . . . . // Normal computations
          Offset = . . . . . .
          AUX_CACHE_INDEXES[Individual,Offset] = CurrentIndex
          CACHE[CurrentIndex,TrainingCase] = result
          AUX_CACHE_RETURN[Individual,Offset] = Offset
          if TrainingCase == TRAINING_CASES_SIZE
            AUX_CACHE_VALID[CurrentIndex++] = true
Figure 2: The pseudo code of the modified fitness function
that incorporates the genetic programming cache.
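
The hit and miss paths of Figure 2 can be illustrated with a small
runnable sketch. It is not the authors' implementation: it assumes a
prefix-encoded program made of binary operators, the variable "x" and
numeric constants, and it replaces the fixed-size cache tables with
Python dictionaries. The names GeneCache, eval_gene and outputs are
hypothetical. As in the text, an entry becomes valid only after the
last training case has been stored, and a hit returns both the cached
value and the offset of the gene to examine next.

```python
from dataclasses import dataclass, field

# Assumed gene set for illustration only: two binary operators.
OPS = {"+": lambda a, b: a + b, "*": lambda a, b: a * b}


@dataclass
class GeneCache:
    index: dict = field(default_factory=dict)   # (individual, offset) -> frame
    ret: dict = field(default_factory=dict)     # (individual, offset) -> next offset
    values: dict = field(default_factory=dict)  # frame -> results per training case
    valid: dict = field(default_factory=dict)   # frame -> validity flag
    next_frame: int = 0                         # ascending frame allocator
    hits: int = 0


def eval_gene(cache, ind, genes, offset, case, x, n_cases):
    """Evaluate the gene at `offset`; return (value, next offset)."""
    key = (ind, offset)
    frame = cache.index.get(key)
    if frame is not None and cache.valid[frame]:      # cache hit
        cache.hits += 1
        return cache.values[frame][case], cache.ret[key]
    gene = genes[offset]                              # no hit: normal computation
    if gene in OPS:
        a, nxt = eval_gene(cache, ind, genes, offset + 1, case, x, n_cases)
        b, nxt = eval_gene(cache, ind, genes, nxt, case, x, n_cases)
        result = OPS[gene](a, b)
    else:
        result = x if gene == "x" else float(gene)
        nxt = offset + 1
    if frame is None:                                 # allocate a new frame
        frame = cache.next_frame
        cache.next_frame += 1
        cache.index[key] = frame
        cache.ret[key] = nxt
        cache.values[frame] = [None] * n_cases
        cache.valid[frame] = False
    cache.values[frame][case] = result
    if case == n_cases - 1:                           # last training case seen
        cache.valid[frame] = True
    return result, nxt


def outputs(cache, ind, genes, xs):
    """Program output for every training case, using the cache."""
    return [eval_gene(cache, ind, genes, 0, case, x, len(xs))[0]
            for case, x in enumerate(xs)]


if __name__ == "__main__":
    cache = GeneCache()
    program = ["+", "*", "x", "x", "3"]               # x * x + 3 in prefix form
    print(outputs(cache, 0, program, [0.0, 1.0, 2.0]), cache.hits)
    print(outputs(cache, 0, program, [0.0, 1.0, 2.0]), cache.hits)
```

On the first evaluation every gene is computed and stored, so no hit
occurs. When the unchanged individual is evaluated again, the root gene
hits once per training case and the remaining genes are skipped, because
the return offset points past the end of the program. Replacement and
invalidation are deliberately left out of this sketch.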
Besides cache hit detection and data retrieval,
a functional and sound cache
mechanism must enforce data invalidation in a
way that keeps the cached value entries synchronized
with their genes. Cached gene values remain valid as
long as the genes have not been modified
since their initial computation. In a genetic
algorithm the genes are altered through genetic
operations such as crossover and mutation. This means
that when two parents produce an offspring or an
individual is mutated, the cached values
corresponding to the involved individuals must be
checked for validity. Some cached values must be
invalidated, while others are unaffected by the
way a specific genetic operation is performed. To
detect which cache entries are invalid after a genetic
operation and which are valid, the auxiliary cache
table holding the returning nodes must be used. The
cache invalidation is explained using a two-point
crossover scheme, which is slightly more complex
than one-point crossover. The concept can be easily
transferred to multi-point crossover operations and is
very similar to the checks performed to invalidate
cache entries after a mutation. As mentioned before, the
invalidation rule is very simple: when a cached gene
is altered, its cached value is invalidated. Gene
alteration is detected by comparing the gene's
returning node to the crossover point: if the returning
node is smaller, the entry is still valid. Conversely,
if the gene's returning node is higher than the
crossover point, the cached value must be
invalidated, since the cached gene has definitely
been altered. Figure 3 shows the procedure in
more detail. Cached genes and their range (starting
and ending gene offset) are shown in the parents'
chromosomes. The crossover points define the way
ECTA 2014 - International Conference on Evolutionary Computation Theory and Applications