Average Hamming Distance (AHD). Average Hamming
distance over all pairs of generated levels.
Duplication Rate (DR). Percentage of generated
levels that are not unique.
Special Tile Matches (STM). Over all pairs of generated
levels, the percentage of pairs in which the
position of a tile that appears exactly once in a Zelda
level, such as the key, door, or player, matches.
Playability evaluation is necessary because the
level generation model should generate playable levels.
In addition, several indices were used to quantitatively
measure the diversity of the generated levels.
We reasoned that a generator produces more diverse
levels when a larger share of its levels are unique and
when the Hamming distance between generated levels
is larger, so we used the duplication rate and the
average Hamming distance for quantitative evaluation.
Furthermore, a Zelda level contains exactly one tile
each for the key, goal, and player, and we considered
these objects to be important in the game. To evaluate
whether the generated levels differ as a game
experience, we used the percentage of pairs in which the
positions of these objects agree as an evaluation index.
Playability and duplication rate were calculated from
10,000 levels, while average Hamming distance and
special tile matches were calculated from 1,000 levels.
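The three diversity indices can be illustrated with a minimal sketch. It assumes each level is a tuple of equal-length strings, one character per tile, and uses the hypothetical symbols "K", "D", and "P" for the key, door, and player; none of these encodings come from the paper. Two readings are also assumptions: the duplication rate is taken as the share of levels that repeat an earlier level, and special tile matches are computed separately for each special tile.

```python
from itertools import combinations


def hamming(a, b):
    """Number of grid cells at which two equally sized levels differ."""
    return sum(x != y for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


def average_hamming_distance(levels):
    """Mean Hamming distance over all unordered pairs of levels."""
    pairs = list(combinations(levels, 2))
    return sum(hamming(a, b) for a, b in pairs) / len(pairs)


def duplication_rate(levels):
    """Percentage of levels that repeat another level (assumed reading)."""
    return 100.0 * (len(levels) - len(set(levels))) / len(levels)


def tile_position(level, tile):
    """(row, column) of the first occurrence of `tile`, or None if absent."""
    for r, row in enumerate(level):
        c = row.find(tile)
        if c != -1:
            return (r, c)
    return None


def special_tile_matches(levels, tile):
    """Percentage of level pairs whose `tile` sits at the same position."""
    pairs = list(combinations(levels, 2))
    hits = 0
    for a, b in pairs:
        pos_a = tile_position(a, tile)
        if pos_a is not None and pos_a == tile_position(b, tile):
            hits += 1
    return 100.0 * hits / len(pairs)


# Toy 2x3 levels using the hypothetical symbols; "." is an empty tile.
levels = [
    ("P.K", "..D"),
    ("P.K", "..D"),
    ("PK.", "D.."),
]
ahd = average_hamming_distance(levels)
dr = duplication_rate(levels)
stm_player = special_tile_matches(levels, "P")
stm_key = special_tile_matches(levels, "K")
```

Under these assumptions, lower DR and STM and higher AHD indicate a more diverse set of levels, which matches the direction of the comparison described in the text.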
4.2.2 Results and Discussion
The evaluation results for each trained model are
shown in Tables 1 and 2. Examples of the levels
generated by each method are shown in
Figures 4, 5, 8, and 9. For comparison, we use the
conventional bootstrap-only method of
(Rodriguez Torrado et al., 2020) and a simple GAN
trained without data augmentation or the diversity
loss.
Comparing the proposed and conventional methods,
the proposed methods outperform the conventional
bootstrap method and the simple GAN method on the
diversity indices (average Hamming distance,
duplication rate, special tile matches, and the variance
of each tile count), indicating that the proposed
methods can generate diverse levels. With the
conventional bootstrap method, the model used in this
experiment generated only levels close to a few
specific levels in the early stages of training. Because
bootstrapping kept adding only similar levels to the
training data, the dataset became biased, resulting in
small differences between levels and a high
duplication rate. The simple GAN could only generate
levels that were nearly identical to those in the
training dataset.
Comparing the two proposed data augmentation
methods, Method 1 increased the Hamming distance
between generated levels, but the average number of
tiles in its generated levels differed significantly from
the original data. This may be because the distribution
of levels in the dataset gradually became more biased
as generated levels were added to it. In Method 2, the
bias of the tile distribution and the value of special
tile matches are smaller because the data were
augmented in a way that makes level bias within the
dataset less likely. On the other hand, playability
is reduced. Most levels that fail to be generated
correctly violate the constraints on the number of key,
door, and player tiles. The baseline method and the
simple GAN method, in which the positions of these
tiles are almost deterministic, show high playability,
while the proposed methods, in which the positions of
these tiles are diversified, show low playability. The
low playability could be improved by devising a model
structure like CESAGAN. However, because a large
number of levels can be generated in parallel, playable
levels can simply be selected from among the many
generated levels. It is also possible to generate
playable levels with sufficient probability by searching
for latent variables, as described in the next subsection.
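Selecting valid levels from a large generated batch can be sketched as a simple filter. The sketch below checks only the tile count constraints named above (exactly one key, one door, and one player). The symbols "K", "D", and "P" and the tuple-of-strings level encoding are hypothetical. A complete playability test would presumably also verify that the level can actually be completed, which is not attempted here.

```python
from collections import Counter

# Hypothetical tile symbols for key, door, and player.
REQUIRED_ONCE = ("K", "D", "P")


def satisfies_tile_constraints(level):
    """True if every special tile appears exactly once in the level."""
    counts = Counter(ch for row in level for ch in row)
    return all(counts[tile] == 1 for tile in REQUIRED_ONCE)


def select_valid(levels):
    """Keep only the levels that satisfy the tile count constraints."""
    return [level for level in levels if satisfies_tile_constraints(level)]


# Toy batch standing in for generator output.
batch = [
    ("P.K", "..D"),  # valid: one of each special tile
    ("PKK", "..D"),  # invalid: two keys
    ("P.K", "..."),  # invalid: no door
]
valid = select_valid(batch)
valid_rate = 100.0 * len(valid) / len(batch)
```

Because this check is cheap, it can be applied to every level in a batch, so a low per-sample success rate mainly costs extra generation time.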
4.3 The Effectiveness of Evolutionary
Latent Space Search
Since the proposed method yields a generator capable
of diverse outputs, it can make effective use of
latent variable search to generate levels that follow a
given objective. Previous work (Volz et al., 2018)
showed that levels reflecting the creator’s objective
can be generated by optimizing the input latent
variables against an objective function using CMA-ES.
CMA-ES is a black-box optimization algorithm based
on evolutionary computation; it performs continuous
optimization through a multi-point search that samples
candidates from a Gaussian distribution. By optimizing
the generator’s input latent vectors with CMA-ES,
latent variables that generate levels in line with
the objective function can be discovered.
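The search loop can be illustrated with a deliberately simplified evolution strategy. This is not CMA-ES: it samples from an isotropic Gaussian with a fixed step size and only moves the mean toward the best samples, with no covariance or step-size adaptation. The function name, the parameters, and the convention that the objective is maximized are all assumptions made for illustration. In the setting of the paper, the objective would decode the latent vector through the trained generator and score the resulting level; a toy quadratic stands in for that here.

```python
import random


def evolve_latent(objective, dim, generations=50, popsize=16, sigma=0.5, seed=0):
    """Maximize `objective` over latent vectors with a simplified Gaussian ES."""
    rng = random.Random(seed)
    mean = [0.0] * dim
    best_z, best_f = list(mean), objective(mean)
    n_elite = max(1, popsize // 4)
    for _ in range(generations):
        # Multi-point search: sample candidates around the current mean.
        population = [
            [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
            for _ in range(popsize)
        ]
        ranked = sorted(population, key=objective, reverse=True)
        elite = ranked[:n_elite]
        # Move the mean to the average of the best candidates.
        mean = [sum(z[i] for z in elite) / n_elite for i in range(dim)]
        top_f = objective(ranked[0])
        if top_f > best_f:
            best_z, best_f = ranked[0], top_f
    return best_z, best_f


# Toy stand-in objective: closeness to a hypothetical target latent vector.
TARGET = [1.0, -2.0]


def toy_objective(z):
    return -sum((zi - ti) ** 2 for zi, ti in zip(z, TARGET))


best_z, best_f = evolve_latent(toy_objective, dim=2)
```

A full CMA-ES implementation additionally adapts the covariance matrix and step size of the sampling distribution, which matters when the objective is poorly scaled or its variables are strongly correlated.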
In this part, we optimize latent variables for the
following two objective functions $F_1$ and $F_2$ to
examine the extent to which latent variables reflecting the
objective functions can be generated. Note that $P$ is
a variable that takes 1 if the level is playable and 0
otherwise.
ICAART 2023 - 15th International Conference on Agents and Artificial Intelligence