generate the dataset with altered parameters such as new labels or new objects (a hypothetical parameter set is sketched after this list).
• Automatic labelling leads to perfect consistency in the labelling process; human error is confined to the parameter-setting process.
• A single human expert can retain control over the dataset's parameters, rather than an expert having to train a number of less experienced annotation workers.
• The automatic labelling approach allows combinations of assets that would otherwise never be seen together, potentially yielding a dataset that produces a more general model.
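To make the parameter-setting process concrete, the sketch below shows a hypothetical parameter set that a single expert might edit to regenerate the dataset; the names and default values are illustrative assumptions, not the paper's actual interface.

```python
from dataclasses import dataclass, field

@dataclass
class DatasetParams:
    """Hypothetical parameters a single expert might edit to regenerate
    the dataset; names and defaults are illustrative, not the paper's API."""
    labels: dict = field(default_factory=lambda: {"background": 0,
                                                  "player": 1,
                                                  "enemy": 2})
    segmentation_resolution: tuple = (15, 15)  # coarse label grid per frame
    samples: int = 10_000                      # number of frames to generate

# Regenerating the dataset with an altered parameter, e.g. a new object class:
params = DatasetParams()
params.labels["projectile"] = 3
```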
In the experiment against the autoencoder, with the specified latent space size / segmentation resolution of 15x15, the automatic segmentation model outperformed the autoencoder at preserving key features. This result suggests that being able to specify the segmentation resolution of the dataset is a useful tool when seeking to optimize the amount of space used to summarize key features.
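As a sketch of what specifying the segmentation resolution could involve, the following assumes a per-pixel label map and reduces it to a coarse grid such as 15x15 using a majority vote within each cell; the majority-vote rule is an assumption for illustration, not necessarily the procedure used in this work.

```python
import numpy as np

def downsample_labels(label_map: np.ndarray, grid: tuple) -> np.ndarray:
    """Reduce a per-pixel label map to a coarse grid (e.g. 15x15) by
    majority vote within each cell (illustrative rule, an assumption)."""
    h, w = label_map.shape
    gh, gw = grid
    out = np.zeros(grid, dtype=label_map.dtype)
    # Cell boundaries; the last cell absorbs any remainder pixels.
    ys = np.linspace(0, h, gh + 1, dtype=int)
    xs = np.linspace(0, w, gw + 1, dtype=int)
    for i in range(gh):
        for j in range(gw):
            cell = label_map[ys[i]:ys[i + 1], xs[j]:xs[j + 1]]
            values, counts = np.unique(cell, return_counts=True)
            out[i, j] = values[np.argmax(counts)]
    return out

# Example: labels for a 240x256 NES-style frame reduced to a 15x15 grid.
labels = np.random.randint(0, 5, size=(240, 256))
coarse = downsample_labels(labels, (15, 15))
print(coarse.shape)  # (15, 15)
```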
It may be possible to generate a dataset with the proposed algorithm by treating all game objects as dynamic, that is, placing all game assets randomly within a screen with no adherence to game rules. Such an approach was not tested in this paper, as it was assumed that a more realistic dataset would produce more accurate segmentation models; however, it may be worthy of additional experimentation.
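A minimal sketch of this untested variant, assuming hypothetical sprite assets and a simple paste-and-label routine, might look as follows.

```python
import random
import numpy as np

def random_frame(assets, frame_size=(240, 256), n_objects=10):
    """Place sprites at random positions with no adherence to game rules,
    recording a per-pixel label map alongside the frame. 'assets' is a
    hypothetical list of (label_id, sprite_rgb, sprite_mask) tuples."""
    h, w = frame_size
    frame = np.zeros((h, w, 3), dtype=np.uint8)
    labels = np.zeros((h, w), dtype=np.uint8)  # 0 = background
    for _ in range(n_objects):
        label_id, sprite, mask = random.choice(assets)
        sh, sw = mask.shape
        y = random.randint(0, h - sh)
        x = random.randint(0, w - sw)
        # Paste the sprite pixels and write the matching labels.
        frame[y:y + sh, x:x + sw][mask] = sprite[mask]
        labels[y:y + sh, x:x + sw][mask] = label_id
    return frame, labels

# Example with one hypothetical 16x16 asset labelled as class 1.
sprite = np.full((16, 16, 3), 200, dtype=np.uint8)
mask = np.ones((16, 16), dtype=bool)
frame, labels = random_frame([(1, sprite, mask)])
```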
One of the areas deemed most important for further evaluation of the overarching segmentation approach is to create deep reinforcement learning agents that use a segmented encoding as state input, powered by a model trained on a dataset created with the proposed methods. This may reveal whether a segmented state input is a useful component in creating agents capable of exceeding human performance across a broader state space, perhaps one that even spans multiple environments. Using a low segmentation resolution in the dataset could increase the number of samples that can be stored in an experience replay mechanism, as well as the number of samples that can be used per training batch.
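As a rough illustration of the storage argument, the sketch below compares the per-sample footprint of a raw observation with that of a 15x15 label grid; the 84x84 grayscale frame and one-byte-per-cell encoding are assumptions borrowed from common Atari preprocessing, not figures reported in this paper.

```python
# Rough per-sample storage comparison for an experience replay buffer.
# The 84x84 grayscale frame and 1-byte label encoding are illustrative
# assumptions, not measurements from the paper.
RAW_FRAME_BYTES = 84 * 84          # 7,056 bytes per raw observation
SEGMENTED_BYTES = 15 * 15          # 225 bytes for a 15x15 label grid

BUFFER_BYTES = 1_000_000 * RAW_FRAME_BYTES  # capacity sized for raw frames

raw_capacity = BUFFER_BYTES // RAW_FRAME_BYTES        # 1,000,000 samples
segmented_capacity = BUFFER_BYTES // SEGMENTED_BYTES  # 31,360,000 samples

print(raw_capacity, segmented_capacity)
```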
Another potential direction for future work is applying a similar algorithm to a 3D environment, in particular automatically creating a traffic dataset for the purpose of training an autonomous driving agent.
In conclusion, the automatic labelling approach is an effective way of lowering the time cost of dataset generation compared with manual methods. Generating data in this way enables rapid experimentation with image segmentation parameters, and as such it should be used to determine the effectiveness of segmentation as input to deep reinforcement learning agents.