clusters generated by the SOM. Lastly, combining the Frontier method with SOM yields more nuanced clustering behaviour, which translates into a better understanding of the environment and faster completion of the game.
Future work could extend the graph representation to uncover diverse agent behaviours by extracting additional graph statistics, such as cycle counts or frequently visited paths, to better characterise agent competence; a sketch of such an extraction is given below. The representation could also be applied to hierarchical RL to monitor skills and behaviours across the different levels of the hierarchy.
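As an illustration of the kind of graph statistics this future work envisions, the following minimal sketch assumes the abstraction graph is held in a networkx DiGraph whose edges carry a per-transition visit count; the graph, its node labels, and the "visits" attribute are hypothetical placeholders, not part of the system described in this paper.

    import networkx as nx
    from collections import Counter

    # Hypothetical abstraction graph: nodes stand in for SOM clusters
    # (abstract states), edges for observed transitions; "visits" counts
    # how often the agent traversed each edge during training.
    G = nx.DiGraph()
    G.add_edge("c0", "c1", visits=42)
    G.add_edge("c1", "c2", visits=17)
    G.add_edge("c2", "c0", visits=9)   # closes the cycle c0 -> c1 -> c2 -> c0
    G.add_edge("c1", "c3", visits=3)

    # Cycle count: number of simple cycles in the abstraction graph.
    cycles = list(nx.simple_cycles(G))
    print(f"{len(cycles)} simple cycle(s):", cycles)

    # Frequently visited paths: rank edges by traversal count as a
    # proxy for the agent's preferred routes through the environment.
    edge_counts = Counter({(u, v): d["visits"]
                           for u, v, d in G.edges(data=True)})
    for (u, v), n in edge_counts.most_common(3):
        print(f"edge {u} -> {v}: {n} traversals")

Statistics of this kind could be logged per training phase, so that shifts in cycle structure or in the dominant paths signal changes in the agent's behaviour.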