Molecule Builder: Environment for Testing Reinforcement Learning Agents
Petr Hyner 1,2, Jan Hůla 2,3 and Mikoláš Janota 3
1 Department of Informatics and Computers, Faculty of Science, University of Ostrava, Ostrava, Czech Republic
2 Institute for Research and Applications of Fuzzy Modeling, University of Ostrava, Ostrava, Czech Republic
3 Czech Technical University in Prague, Prague, Czech Republic
Keywords:
Reinforcement Learning, Subgoals, Environment, Agent.
Abstract:
We present a reinforcement learning environment designed to test agents’ ability to solve problems that can
be naturally decomposed using subgoals. This environment is built on top of the PyVGDL game engine and
enables the generation of problem instances by specifying the dependency structure of subgoals. Its purpose is to
enable faster development of Reinforcement Learning algorithms that solve problems by proposing subgoals
and then reaching these subgoals.
1 INTRODUCTION
This contribution describes a minimalistic environ-
ment called Molecule Builder whose purpose is to test
the abilities of Reinforcement Learning (RL) agents.
Concretely, this environment tests the agents’ ability
to solve problems that require the completion of many
subgoals before reaching the final state. Reinforce-
ment learning methods can solve problems of surpris-
ing difficulty (Silver et al., 2017; Mirhoseini et al.,
2021). Nevertheless, they are not very effective in
terms of sample complexity. They require an exten-
sive computational budget for trial and error explo-
ration to discover an effective behaviour for a given
problem. If the goal state (or a state with a reward)
first requires the completion of several subgoals that
depend on each other, it is possible that the agent will
never reach such a state and, therefore, will not obtain
any learning signal.
This attribute is present in many real-life scenarios, which justifies the search for novel approaches to sequential decision-making problems. It is a mark of human cognition that we create hierarchical plans when we try to solve a complex and novel task. Additionally, the factorization of complex structures into simpler ones is an essential tool in problem solving.
We believe that RL agents should also be equipped
with this ability. To develop such agents/algorithms,
it is first necessary to have a benchmark environment that naturally contains the concept of subgoals. Our
motivation for the work presented here was to de-
velop such a benchmark that would provide challeng-
ing problems for the current generation of RL algo-
rithms. Such a benchmark should be as minimalistic
as possible, while the environment should be easily
configurable by anyone who considers implementing
an agent to solve it.
The environment presented consists of a random
generator of problem instances in which the goal of
the agent is to build a structure from simpler build-
ing blocks. The structure may contain substructures, and the construction of these substructures represents a natural subgoal. Moreover, the building blocks may be blocked by various obstacles that can be destroyed only by the corresponding structures. Therefore, each problem instance can be created from a dependency graph that reflects the order in which the individual structures can be built.
The text is structured as follows. Section 2 de-
scribes fundamental concepts in reinforcement learn-
ing and provides motivation for developing agents
that explore the state-space by proposing subgoals to
themselves. Section 3 provides a high-level overview
of the Molecule Builder environment. Section 4
describes the generative model for the problem in-
stances, and the last two Sections, 5 and 6, are devoted to related work and the conclusion.
2 SUBGOALS IN REINFORCEMENT LEARNING

The field of Reinforcement Learning is focused on methods for sequential decision-making problems. These problems are formalized by a Markov decision process (MDP), which is a 4-tuple $(S, A, P_a, R_a)$:
- $S$ is a set of states called the state space,
- $A$ is a set of actions called the action space,
- $P_a(s, s') = \Pr(s_{t+1} = s' \mid s_t = s, a_t = a)$ is the probability that action $a$ in state $s$ at time $t$ will lead to state $s'$ at time $t + 1$,
- $R_a(s, s')$ is the immediate reward (or expected immediate reward) received after transitioning from state $s$ to state $s'$, due to action $a$.
Many problems can be formulated as MDPs. Many examples can be found in combinatorial optimization, with the canonical example being the Travelling Salesman Problem (TSP) (Gavish and Graves, 1978). In the TSP, we have a directed (weighted) graph, and the goal is to find the shortest cycle which visits all the vertices. Here, the state $s \in S$ can be represented as a set of already visited vertices, a set of still non-visited vertices, and the vertex the agent is currently in. The set of actions $A$ available in a given state corresponds to the choice over the set of non-visited vertices. The transitions are deterministic and, therefore, $P_a(s, s') = 1$ for exactly one (consistent) triple $(a, s, s')$ and $0$ otherwise. The rewards could be, for example, set in such a way that the agent receives a non-zero reward only for the action which closes the cycle. The value of this reward would be equal to the negative length of the cycle, so that maximizing the reward corresponds to minimizing the tour length.
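As an illustration (our own sketch with hypothetical names, not code from the environment), the TSP formulation above can be written down directly; the reward of the closing action is taken as the negative cycle length so that maximizing reward means minimizing the tour.

```python
# Illustrative sketch: the TSP as a deterministic MDP, mirroring the text above.
# `weights` is a square list of lists giving edge weights of a complete graph.
from typing import Tuple

State = Tuple[Tuple[int, ...], int]  # (ordered visited vertices, current vertex)

def initial_state(start: int = 0) -> State:
    return ((start,), start)

def available_actions(state: State, n_vertices: int):
    """The actions available in a state: the not-yet-visited vertices."""
    visited, _ = state
    return [v for v in range(n_vertices) if v not in visited]

def step(state: State, action: int, weights) -> Tuple[State, float]:
    """Deterministic transition; a non-zero reward only when the cycle closes."""
    visited, _ = state
    next_visited = visited + (action,)
    next_state: State = (next_visited, action)
    if len(next_visited) == len(weights):         # every vertex has been visited
        tour = next_visited + (next_visited[0],)  # return to the starting vertex
        length = sum(weights[a][b] for a, b in zip(tour, tour[1:]))
        return next_state, -float(length)         # negative length: maximize reward
    return next_state, 0.0
```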
To solve a given MDP means to find a policy function $\pi : S \to A$ (which can potentially be probabilistic) that maximizes the expected cumulative reward:

$$\mathbb{E}\left[\sum_{t=0}^{\infty} \gamma^{t} R_{a_t}(s_t, s_{t+1})\right]. \qquad (1)$$
In this expression, $\mathbb{E}$ is the expected value, $\gamma \in [0, 1)$ is the discount factor, $a_t$ is the action sampled according to the policy $\pi$, and $s_{t+1}$ is the state sampled according to $\Pr(s_{t+1} \mid s_t, a_t)$. The expected value $\mathbb{E}$ is taken over the randomness in these two variables.
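For concreteness, the quantity inside the expectation in Equation (1) can be estimated from a single sampled trajectory of rewards; the snippet below is our own illustration of this computation.

```python
# Minimal sketch: empirical discounted return of one trajectory, i.e. a single
# sample of the quantity inside the expectation in Equation (1).
def discounted_return(rewards, gamma: float = 0.99) -> float:
    return sum(gamma ** t * r for t, r in enumerate(rewards))

# Example: a sparse-reward episode where only the final step pays off.
print(discounted_return([0.0, 0.0, 0.0, 1.0], gamma=0.9))  # 0.9**3 = 0.729
```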
In practice, we do not necessarily need to find the optimal policy $\pi^*$, which maximizes this expression. It is often enough to find any policy that is "good enough".
Various methods exist to find such a policy (Mnih et al., 2013; Sutton et al., 1999a), but the general idea
is that the agent usually starts to explore the state
space by choosing actions according to a random pol-
icy, and this policy is repeatedly updated according to
rewards that the agent observes. If the agent does not
observe any reward during its exploration, it cannot
learn anything. This problem of sparse rewards is of-
ten approached by reward shaping, where additional
(intermediate) rewards are given to the agent to guide
it towards a more desirable behaviour. Creating these
additional rewards often requires domain expertise.
The policy function can take various forms. In the
simplest case, when the state space is small, it can
be a lookup table. It can also be determined by another function called the value function, which, in simple
words, measures how valuable it is for the agent to
be in a given state. When the agent has access to this
value function and the (probabilistic) transition func-
tion, it can choose the action which leads to a state
with the highest (expected) value.
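As a sketch of this one-step lookahead (all function names are ours, for illustration only), the agent can pick the action whose expected successor value is highest under the transition model:

```python
# Sketch of acting greedily with respect to a learned value function and a
# (possibly probabilistic) transition model. All names are illustrative.
def greedy_action(state, available_actions, transition_probs, value):
    """
    transition_probs(s, a) -> dict mapping successor state s' to its probability
    value(s')              -> learned estimate of how valuable state s' is
    Returns the action leading to the successor with the highest expected value.
    """
    def expected_successor_value(action):
        return sum(p * value(s_next)
                   for s_next, p in transition_probs(state, action).items())
    return max(available_actions, key=expected_successor_value)
```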
Very often, the policy/value function is repre-
sented by a neural network which is learned dur-
ing the exploration process. Sometimes, the agent
also needs to learn the transition function, sometimes called a world model, because it may not be known (e.g., in robotics).
When the agent makes decisions according to the
learned value function, which may be imprecise, the
quality of each decision is often considerably im-
proved if the agent first simulates many possible fu-
ture trajectories (sequences of actions and states) and
observes which states they lead to before it decides
how to act. This is known as model predictive control (MPC), and it vaguely resembles certain cognitive processes of humans and other animals,
which can run mental simulations before they decide
which actions to take.
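One simple way to realize this idea is random-shooting MPC: simulate a number of candidate action sequences with the (learned) world model, evaluate the states they lead to, and execute only the first action of the best sequence. The sketch below is a generic illustration with assumed `simulate` and `evaluate` interfaces, not a description of a specific algorithm from the literature.

```python
import random

def mpc_random_shooting(state, action_space, simulate, evaluate,
                        horizon=10, n_candidates=100):
    """
    simulate(state, action_sequence) -> final state reached in the world model.
    evaluate(final_state)            -> estimated value of that state.
    Returns the first action of the best simulated trajectory.
    """
    best_action, best_value = None, float("-inf")
    for _ in range(n_candidates):
        candidate = [random.choice(action_space) for _ in range(horizon)]
        final_state = simulate(state, candidate)
        value = evaluate(final_state)
        if value > best_value:
            best_value, best_action = value, candidate[0]
    return best_action
```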
Nevertheless, the crucial difference between MPC
and human planning abilities is that our mental simu-
lations do not necessarily follow the low-level dynam-
ics of the environment. For example, when we want to
safely travel to a different country, we do not imagine
what our body will be doing at every moment of the
journey. Instead, we plan over high-level actions and
states such as "buy a ticket" or "get to the airport".
We believe that this ability to freely move in space
and time in our minds is crucial for our effectiveness
in problem-solving and that figuring out how to en-
rich RL agents with this ability will allow us to solve
much more complicated problems than the ones we
are currently able to solve. This motivated us to cre-
ate a minimalistic environment for testing RL agents,
which contains a clear notion of subgoals and where
the dependency graph of subgoals is easily control-
lable.
3 THE MOLECULE BUILDER ENVIRONMENT

We have built the Molecule Builder environment on top of the Video Game Description Language (concretely, its Python implementation PyVGDL (Schaul, 2013; Vereecken, 2018)), which was designed to allow for quick prototyping of test environments for AI agents.
As the name suggests, the basic principle of this game/environment is to build "molecules" from "atoms". The agent is placed in a simple grid world
that contains rooms and corridors, and the atoms are
scattered in the rooms. In each game, the goal is to as-
semble a particular molecule that may consist of sub-
molecules. Moreover, some corridors may be blocked
by obstacles that may prevent the agent from access-
ing the required atoms needed to construct the tar-
get molecule. The given obstacle can be destroyed
by a different molecule, and the construction of this
molecule naturally represents a subgoal in this game.
There may be several such subgoals that may de-
pend on each other and the dependency graph of these
molecules constitutes a high-level structure of each
game instance. The action space A consists of four
actions: move up, move down, move left and move
right.
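For orientation, interacting with a game instance could look roughly like the random-agent loop below; the `MoleculeBuilderEnv` wrapper and its gym-style reset/step interface are our assumptions for illustration and not the exact API of PyVGDL or of our framework.

```python
import random

# Hypothetical interaction loop; the environment object and its gym-style
# reset/step interface are assumed for illustration only.
ACTIONS = ["UP", "DOWN", "LEFT", "RIGHT"]

def random_rollout(env, max_steps=500):
    state = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = random.choice(ACTIONS)        # a random policy as a baseline
        state, reward, done, info = env.step(action)
        total_reward += reward
        if done:                               # e.g. the target molecule was built
            break
    return total_reward
```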
Our framework enables the generation of games
of desired complexity by specifying this dependency
graph. This control over the complexity of the prob-
lem allows us to create curricula of problems in which
the agent can first learn to assemble simple molecules
and then proceed to more complex games in which
the assembly of these molecules represents a subgoal.
Each game consists of a layout that represents the ini-
tial state of the game (the position of walls, atoms, ob-
stacles, and the agent). It also consists of a definition
file that describes the default behaviour of individ-
ual entities in the game, their interactions, and their
appearance (each entity is represented by an ASCII
symbol). The main part of the definition file is the interaction set, which dictates how the molecules can be built from atoms and sub-molecules.
3.1 The Layout of the Game Instance
Figure 1 shows a randomly sampled layout of a game
with many atoms scattered throughout the rooms.
Each symbol represents a given entity defined inside
the SpriteSet block and mapped in the LevelMapping
block in the definition file shown in Figure 2. In this
layout, the dots represent a floor where the agent can
move freely. The letter w represents a wall through
which the agent cannot move. Other symbols, such as 3, 5, /, ), etc., are the atom symbols. Several other types of symbols (C, o) correspond to obstacles. These symbols are placed inside corridors that connect two rooms, and the agent cannot pass through them. Finally, the symbol A corresponds to the agent.

Figure 1: An example of a simplified layout.
Here, we describe how such a game instance can
be solved. The goal is to build a target molecule con-
sisting of atoms ), (, /, 5, 3. As can be seen in Figure 1,
there are obstacles that prevent the agent from reach-
ing some parts of the environment. In detail, there are
three separate parts of the environment that are inac-
cessible without the removal of obstacles. Part 1 is the part where the agent starts and contains atoms ), (, /, 5, !. Part 2 contains atoms ), (, / and can be
accessed from part 1 after removing the obstacle C.
Finally, part 3 contains atom 3 and can be accessed
from part 1 after removing the obstacle o.
Figure 2: An example of a simplified definition file.
The agent needs to access part 3 to collect the
symbol 3 needed to build the target molecule. The
obstacle o can be destroyed by a molecule that can be
built from atoms ), (, /. If the agent uses the atoms ),
(, / present in part 1 to destroy the obstacle o, then it
will not be able to build the target molecule for which
these atoms are required. Therefore, it also needs to
reach part 2. To destroy the obstacle C, it is necessary
to build a molecule from atoms 1,5,! (the agent must
learn this knowledge by playing different instances of
the game).
Here are the high-level steps by which the agent
might solve the game.
1. The agent starts in part 1 and collects atoms 1, 5,
!.
2. The agent destroys the obstacle C with the created
molecule and gathers atoms /, ), ( in part 2.
3. The agent destroys the obstacle o with the created
molecule /, ), ( and gathers the atom 3.
4. The agent returns to part 1 to collect the atoms /,
), (, 5, and this finishes the game.
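The walkthrough above can be summarized compactly as a mapping from obstacles to the molecules that destroy them, together with an ordered list of subgoals. The encoding below is our own illustration of this particular instance, not a data structure used by the environment.

```python
# Illustrative encoding of the high-level plan for the instance in Figure 1.
# The symbols mirror the walkthrough above; the representation itself is ours.
obstacle_removed_by = {
    "C": ("1", "5", "!"),   # molecule built from atoms 1, 5, !
    "o": (")", "(", "/"),   # molecule built from atoms ), (, /
}

plan = [
    ("collect", ("1", "5", "!"), "part 1"),
    ("destroy", "C"),
    ("collect", (")", "(", "/"), "part 2"),
    ("destroy", "o"),
    ("collect", ("3",), "part 3"),
    ("collect", ("/", ")", "(", "5"), "part 1"),  # assemble the target molecule
]
```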
Figure 3 shows the molecules that the agent builds
in this game instance. Note that the order of collecting
atoms for a given molecule is given by its correspond-
ing tree.
3.2 The Definition File of the Game Instance
Figure 2 shows a (simplified) definition file with only
one atom. The syntax is simple to understand, as
there are only four different types of definitions avail-
able (SpriteSet, LevelMapping, InteractionSet, Termi-
nationSet). The SpriteSet code block defines the en-
tities of the game, their names, properties, and graph-
ics. LevelMapping then assigns a unique symbol to
each entity defined above. The InteractionSet spec-
ifies what happens when two defined entities meet.
Multiple types of interactions are possible, and these
correspond to the event that occurs when the two en-
tities meet. The most commonly used events are the
killSprite and stepBack events. A large number of
(simpler) games can be built using these two events
only. The transformTo interaction is also very helpful in cases where we need to change the affected entity into a different entity. In our case, we use this event to transform multiple atoms into molecules or sub-molecules. Finally, the TerminationSet defines the condition(s) that must be satisfied in the environment for the game to end.
For a detailed description of the syntax of the
definition files, see the original PyVGDL publication
(Schaul, 2013).
Our framework enables us to generate a practi-
cally infinite number of such games that run very fast
and can be used to develop RL agents that can solve
problems by proposing subgoals to themselves. An
example of a specific environment’s definition file and
layout generated by our framework can be found in
Appendix A, where both the definition and layout cor-
respond to one environment instance.
4 GENERATIVE MODEL FOR GAME INSTANCES

In the previous section, we established how the Molecule Builder environment works and what the goal of the agent is. In this section, we describe how each instance of the game is generated; more specifically, how the final layout and definition files mentioned in Section 3 are produced.
4.1 Generation of the Layout
The generation of layouts occurs in several steps. The
first step includes the generation of molecule graphs,
which describe how each molecule can be built. Fig-
ure 3 shows graphs for three different molecules that
are used in the instance depicted in Figure 1.
Figure 3: The structure of the molecules that are built during the game shown in Figure 1. The subtrees correspond to sub-molecules, which need to be built first before merging them with other atoms.
As shown, each molecule corresponds to a tree,
where the leaf nodes contain symbols (these are the
atoms that the agent can collect). The sub-trees corre-
spond to sub-molecules. For example, for the leftmost
graph, the agent first needs to create a sub-molecule
from atoms ),( before it can compose it with atom / or
the second sub-molecule which is built from atoms
5,3. In total, we instantiate 20 unique atoms from
which 20 unique sub-molecules are built. From these
atoms and sub-molecules, 50 unique molecules are
built. These are generated only once and used in each
game instance. One particular molecule is always
used as the target molecule for each game.
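The collection order implied by such a tree is a bottom-up (post-order) traversal: each sub-molecule is completed before it is merged with other atoms or sub-molecules. The snippet below sketches this with a nested-tuple encoding of the leftmost tree in Figure 3; the exact branching is illustrative.

```python
# Sketch: a molecule tree encoded as nested tuples, where leaves are atom
# symbols and internal nodes are compositions. The encoding is our own and
# only approximates the leftmost tree in Figure 3.
LEFT_MOLECULE = (((")", "("), "/"), ("5", "3"))

def assembly_order(node):
    """Post-order traversal: sub-molecules are completed before being merged."""
    if isinstance(node, str):          # a leaf, i.e. a single atom
        return [node]
    order = []
    for child in node:
        order.extend(assembly_order(child))
    return order

print(assembly_order(LEFT_MOLECULE))   # [')', '(', '/', '5', '3']
```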
To generate the concrete game instance, we first
sample the number of molecules n that will be re-
quired to finish the game. Then we sample n random
molecules from the 50 available molecules and create
a dependency graph determining the order in which
these molecules could be built.
Once we have this graph, we generate the layout by creating n parts with a variable number of rooms in each part and connecting these rooms by corridors, where each corridor connecting two different parts is blocked by an obstacle. Finally, we position the atoms and the agent in the rooms in such a way that the molecules can be built in the order dictated by the dependency graph.
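A minimal sketch of this sampling step is given below; it assumes a library of pre-generated molecules and simplifies the dependency graph to a chain, whereas the actual generator also places rooms, corridors, obstacles, and atoms.

```python
import random

def sample_dependency_chain(molecule_library, n_min=2, n_max=5, seed=None):
    """Sketch of the instance-sampling step described above.

    `molecule_library` stands for the 50 pre-generated molecules; the
    chain-shaped dependency graph is a simplification for illustration.
    Returns the sampled molecules and edges (i, i + 1) meaning that molecule i
    must be built before the part holding the atoms of molecule i + 1 opens.
    """
    rng = random.Random(seed)
    n = rng.randint(n_min, n_max)               # number of required molecules
    molecules = rng.sample(molecule_library, n)
    dependency_edges = [(i, i + 1) for i in range(n - 1)]
    return molecules, dependency_edges
```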
4.2 Generation of the Definition File
The generation of the definition file is dependent only
on the dependency graph, which contains a descrip-
tion of the molecules used at its vertices. We first
instantiate the SpriteSet and LevelMapping by creat-
ing an entity represented by a unique symbol for each
atom, sub-molecule and molecule. Then we instanti-
ate the rules of how these atoms interact with each
other. For certain atoms/sub-molecules/molecules,
nothing happens when they touch each other, and
these interactions are handled by the stepBack event.
For each pair of atoms/sub-molecules that can be
combined, we create two rules. One rule uses the
killSprite event to remove one of these entities, and
the second rule uses the transformTo event to trans-
form the second entity into the entity corresponding
to its composition. Additionally, we instantiate rules
determining what happens when these atoms/sub-
molecules/molecules interact with obstacles, walls, or
the agent.
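For one combinable pair, the two generated rules could look roughly as in the sketch below; the entity names are invented, and the emitted lines only approximate the VGDL syntax shown in Figure 2.

```python
def interaction_rules(entity_a: str, entity_b: str, composed: str):
    """Sketch: the two rules created for a combinable pair of entities.

    One rule removes `entity_b`, the other transforms `entity_a` into the
    entity representing their composition. Entity names are illustrative.
    """
    return [
        f"{entity_b} {entity_a} > killSprite",
        f"{entity_a} {entity_b} > transformTo stype={composed}",
    ]

# e.g. combining two atoms into a sub-molecule (hypothetical names):
for rule in interaction_rules("atom_close", "atom_open", "submol_pair"):
    print(rule)
```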
The definition of the game instance is completed by adding the TerminationSet block, which is the same for each game and checks whether the target molecule has been built.
5 RELATED WORK
5.1 Reinforcement Learning with Subgoals
Subgoals were originally investigated in (Sutton et al., 1999b). Many articles have been published on the topic of subgoals or, more generally, on goal-oriented reinforcement learning, such as (Czechowski et al., 2021; Chane-Sane et al., 2021; Nasiriany et al., 2019; Eysenbach et al., 2019; Zawalski et al., 2023). All of these methods have been studied on problems that are either too simplistic (the subgoal can correspond to moving the agent from one room to another) or do not contain a clear notion of a subgoal (e.g., solving a Rubik's cube). A closely related topic,
called Hierarchical Reinforcement Learning, studies
RL algorithms that use the so-called options (Vezhn-
evets et al., 2017; Barto and Mahadevan, 2003; Sut-
ton et al., 1999b; Precup and Sutton, 2000; Aubret
et al., 2019). These are high-level actions that con-
sist of many low-level ones. Most option-based algo-
rithms were developed and tested on very basic envi-
ronments, and therefore our environment can be used
as a challenging benchmark for these algorithms.
5.2 Minimalistic Reinforcement Learning Environments
During the last decade, several frameworks have been
developed for designing RL environments. They
range from very complex ones, which are based on 3D
rendering engines (Beattie et al., 2016; Juliani et al.,
2019), through simulators of Atari games (Bellemare
et al., 2013), to ones with minimalistic ASCII graph-
ics such as MiniHack (Samvelyan et al., 2021) or
PyVGDL (Schaul, 2013). We decided to build our
environment on top of the minimalistic PyVGDL be-
cause we are interested only in testing the agent’s abil-
ity to solve problems that can be decomposed into
subgoals and we view the ability to deal with com-
plex visual patterns as a separate issue.
6 CONCLUSION
We presented a minimalistic RL environment which
is designed to test agents’ ability to solve problems
that can be naturally decomposed into subgoals. We
believe that the ability to solve problems by first set-
ting a subgoal and then achieving this subgoal is cru-
cial for humans and will be crucial for RL agents in
the future. We believe that the environment presented
will provide a shared testing ground for researchers
interested in this topic. In future work, we plan to
release algorithms developed using this environment
that will explore the state space by proposing subgoals
to themselves.
ACKNOWLEDGEMENT
This scientific article is part of the RICAIP project
that has received funding from the European Union’s
Horizon 2020 research and innovation programme
under grant agreement No 857306. The results were
supported by the Ministry of Education, Youth and
Sports within the dedicated program ERC CZ under
the project POSTMAN no. LL1902.
REFERENCES
Aubret, A., Matignon, L., and Hassas, S. (2019). A sur-
vey on intrinsic motivation in reinforcement learning.
CoRR, abs/1908.06976.
Barto, A. G. and Mahadevan, S. (2003). Recent advances
in hierarchical reinforcement learning. Discrete Event
Dynamic Systems, 13(4):341–379.
Beattie, C., Leibo, J. Z., Teplyashin, D., Ward, T., Wainwright, M., Küttler, H., Lefrancq, A., Green, S., Valdés, V., Sadik, A., Schrittwieser, J., Anderson, K., York, S., Cant, M., Cain, A., Bolton, A., Gaffney, S., King, H., Hassabis, D., Legg, S., and Petersen, S. (2016). DeepMind Lab.
Bellemare, M. G., Naddaf, Y., Veness, J., and Bowling, M. (2013). The arcade learning environment: An evaluation platform for general agents. Journal of Artificial Intelligence Research, 47:253–279.
Chane-Sane, E., Schmid, C., and Laptev, I. (2021). Goal-
conditioned reinforcement learning with imagined
subgoals. CoRR, abs/2107.00541.
Czechowski, K., Odrzygóźdź, T., Zbysinski, M., Zawalski, M., Olejnik, K., Wu, Y., Kucinski, L., and Milos, P. (2021). Subgoal search for complex reasoning tasks. CoRR, abs/2108.11204.
Eysenbach, B., Salakhutdinov, R., and Levine, S. (2019).
Search on the replay buffer: Bridging planning and
reinforcement learning.
Gavish, B. and Graves, S. C. (1978). The travelling salesman problem and related problems. Massachusetts Institute of Technology, Operations Research Center.
Juliani, A., Khalifa, A., Berges, V.-P., Harper, J., Teng, E.,
Henry, H., Crespi, A., Togelius, J., and Lange, D.
(2019). Obstacle tower: A generalization challenge
in vision, control, and planning.
Mirhoseini, A., Goldie, A., Yazgan, M., Jiang, J. W.,
Songhori, E., Wang, S., Lee, Y.-J., Johnson, E.,
Pathak, O., Nazi, A., et al. (2021). A graph place-
ment methodology for fast chip design. Nature,
594(7862):207–212.
Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A.,
Antonoglou, I., Wierstra, D., and Riedmiller, M.
(2013). Playing atari with deep reinforcement learn-
ing.
Nasiriany, S., Pong, V. H., Lin, S., and Levine, S. (2019).
Planning with goal-conditioned policies.
Precup, D. and Sutton, R. S. (2000). Temporal Abstraction in Reinforcement Learning. PhD thesis. ISBN: 0599844884.
Samvelyan, M., Kirk, R., Kurin, V., Parker-Holder, J., Jiang, M., Hambro, E., Petroni, F., Küttler, H., Grefenstette, E., and Rocktäschel, T. (2021). MiniHack the Planet: A Sandbox for Open-Ended Reinforcement Learning Research.
Schaul, T. (2013). A video game description language for
model-based or interactive learning. In Proceedings of
the IEEE Conference on Computational Intelligence
in Games, Niagara Falls. IEEE Press.
Silver, D., Hubert, T., Schrittwieser, J., Antonoglou, I., Lai,
M., Guez, A., Lanctot, M., Sifre, L., Kumaran, D.,
Graepel, T., Lillicrap, T., Simonyan, K., and Hass-
abis, D. (2017). Mastering chess and shogi by self-
play with a general reinforcement learning algorithm.
Sutton, R. S., McAllester, D., Singh, S., and Mansour, Y. (1999a). Policy gradient methods for reinforcement learning with function approximation. In Solla, S., Leen, T., and Müller, K., editors, Advances in Neural Information Processing Systems, volume 12. MIT Press.
Sutton, R. S., Precup, D., and Singh, S. (1999b). Between
mdps and semi-mdps: A framework for temporal ab-
straction in reinforcement learning. Artificial Intelli-
gence, 112(1):181–211.
Vereecken, R. (2018). PyVGDL 2.0. GitHub repository.
Vezhnevets, A. S., Osindero, S., Schaul, T., Heess, N.,
Jaderberg, M., Silver, D., and Kavukcuoglu, K.
(2017). Feudal networks for hierarchical reinforce-
ment learning. CoRR, abs/1703.01161.
Zawalski, M., Tyrolski, M., Czechowski, K., Stachura, D., Piekos, P., Odrzygóźdź, T., Wu, Y., Kuciński, L., and Miłoś, P. (2023). Fast and Precise: Adjusting Planning Horizon with Adaptive Subgoal Search.
APPENDIX A
Sampled Environment
BasicGame block_size=10
SpriteSet
background > Immovable randomtiling=0.9 img=oryx/floor3 hidden=True
avatar > MovingAvatar img=oryx/knight1
wall > Immovable autotiling=true img=oryx/wall3
movable >
1 > Passive img=atom_sprites/alienShotgun_0.png
3 > Passive img=atom_sprites/belt1.png
4 > Passive img=atom_sprites/book1.png
5 > Passive img=atom_sprites/bookDown.png
6 > Passive img=atom_sprites/bookUp.png
7 > Passive img=atom_sprites/boots1.png
8 > Passive img=atom_sprites/bow1.png
10 > Passive img=atom_sprites/bullet1.png
16 > Passive img=atom_sprites/butterfly1.png
100113 > Passive img=atom_sprites/candle1.png
100115 > Passive img=atom_sprites/cape1.png
20 > Passive img=atom_sprites/chair1.png
100117 > Passive img=atom_sprites/chicken.png
100118 > Passive img=atom_sprites/city1.png
10040 > Passive img=atom_sprites/cloak1.png
10059 > Passive img=atom_sprites/diamond1.png
10062 > Passive img=atom_sprites/eggcracked.png
10067 > Passive img=atom_sprites/eye1.png
10070 > Passive img=atom_sprites/flag1.png
10072 > Passive img=atom_sprites/goldsack.png
10074 > Passive img=atom_sprites/heart1.png
10075 > Passive img=atom_sprites/helmet1.png
obstacle >
101118 > Immovable img=obstacle_sprites/barrel1.png
11067 > Immovable img=obstacle_sprites/barrel2.png
11075 > Immovable img=obstacle_sprites/block1.png
LevelMapping
A > background avatar
w > wall
. > background
β > background 1
δ > background 3
ε > background 4
ζ > background 5
η > background 6
θ > background 7
ι > background 8
λ > background 10
ρ > background 16
ѥ > background 100113
Ѧ > background 100115
υ > background 20
ѧ > background 100117
Ѩ > background 100118
Ϸ > background 10040
Figure 4: Part 1: Sampled environment definition file.
Figure 5: Part 2: Sampled environment definition file.
Figure 6: Sampled environment layout.