agent could potentially mitigate this issue, allowing parallel execution without interference and maintaining computational efficiency.
Overall, effective memory management strategies are crucial in reinforcement learning tasks: carefully selecting and tuning the memory-resetting strategy can significantly improve the efficiency and effectiveness of TPG in challenging control environments.
6 FUTURE WORK
Future work will scale these experiments to more complex environments, such as Memory Gym (Pleines et al., 2023), in order to validate the methods' robustness and explore their adaptability to tasks with both long and short time dependencies. The current memory strategies help agents quickly build mental models without directly sharing information. However, this may not be suitable for complex tasks where global memory is beneficial (e.g., Smith and Heywood, 2019). For such cases, we envision that a dynamic method, such as resetting memory based on real-time performance metrics (e.g., wiping memory whenever the median score drops below that of the previous generation), could provide a more adaptive approach; a minimal sketch of such a rule is given below. Additionally, investigating other probabilistic memory functions and their combinations could provide further insight into optimizing agents' memory use. For example, rather than manually resetting memory, it may be possible to evolve customized memory-management rules for each agent that automatically minimize negative effects on shared memory. Finally, integrating advanced parallelization techniques could mitigate the runtime overhead introduced by memory resets, improving their practicality in real-world applications. Since the experiments in this paper incurred significant wall-clock run time, faster TPG frameworks, such as that of Djavaherpour et al. (2024), will be considered in future work.
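As one illustration of the dynamic approach envisioned above, the following Python sketch shows a generation-level hook that wipes indexed memory when the population's median score falls below the previous generation's median. The population, agent.score, and memory.reset() names are hypothetical placeholders rather than part of PyTPG's API; this is a minimal sketch under those assumptions, not a tested implementation.

from statistics import median

def maybe_reset_memory(population, memory, prev_median):
    # Hypothetical generation-level hook: `population` is any iterable of
    # agents exposing a numeric `score`, and `memory.reset()` clears the
    # shared indexed-memory bank; neither name comes from PyTPG itself.
    cur_median = median(agent.score for agent in population)
    if prev_median is not None and cur_median < prev_median:
        # Performance regressed relative to the previous generation, so wipe
        # the memory to discard potentially stale or misleading content.
        memory.reset()
    return cur_median  # carried forward as the next generation's baseline

Such a rule would only reset memory in response to an observed drop in performance, rather than on a fixed schedule, at the cost of one extra median computation per generation.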
Overall, studying the long-term evolutionary impacts of different memory strategies could provide deeper insights into the development of more sophisticated and adaptive agents in partially observable environments.
REFERENCES
Amaral, R. (2019). PyTPG: Tangled program graphs in Python. https://github.com/Ryan-Amaral/PyTPG/tree/7295f90ececbfc34fdbc1d73e032a9c2407a182c.
Brameier, M. and Banzhaf, W. (2007). Linear Genetic Programming. Springer.
Brockman, G., Cheung, V., Pettersson, L., Schneider, J., Schulman, J., Tang, J., and Zaremba, W. (2016). OpenAI Gym. arXiv, 1606.01540.
Djavaherpour, T., Naqvi, A., Zhuang, E., and Kelly, S. (2024). Evolving Many-Model Agents with Vector and Matrix Operations in Tangled Program Graphs. In Genetic Programming Theory and Practice XXI. Springer.
Kelly, S. and Heywood, M. I. (2018). Emergent Solutions to High-Dimensional Multitask Reinforcement Learning. Evolutionary Computation, 26(3):347–380.
Kelly, S., Newsted, J., Banzhaf, W., and Gondro, C. (2020). A modular memory framework for time series prediction. In Proceedings of the 2020 Genetic and Evolutionary Computation Conference, GECCO '20, pages 949–957, New York, NY, USA. Association for Computing Machinery.
Kelly, S., Smith, R. J., Heywood, M. I., and Banzhaf, W. (2021). Emergent tangled program graphs in partially observable recursive forecasting and ViZDoom navigation tasks. ACM Trans. Evol. Learn. Optim., 1(3).
Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., Graves, A., Riedmiller, M. A., Fidjeland, A. K., Ostrovski, G., Petersen, S., Beattie, C., Sadik, A., Antonoglou, I., King, H., Kumaran, D., Wierstra, D., Legg, S., and Hassabis, D. (2015). Human-level control through deep reinforcement learning. Nature, 518:529–533.
Pleines, M., Pallasch, M., Zimmer, F., and Preuss, M. (2023). Memory Gym: Partially observable challenges to memory-based agents. In The Eleventh International Conference on Learning Representations.
Smith, R. J. and Heywood, M. I. (2019). A model of external memory for navigation in partially observable visual reinforcement learning tasks. In Genetic Programming: 22nd European Conference, EuroGP 2019, Held as Part of EvoStar 2019, Leipzig, Germany, April 24–26, 2019, Proceedings, pages 162–177, Berlin, Heidelberg. Springer-Verlag.
Smith, R. J. and Heywood, M. I. (2024). Interpreting tangled program graphs under partially observable Dota 2 Invoker tasks. IEEE Transactions on Artificial Intelligence, 5(4):1511–1524.
Spector, L. and Luke, S. (1996a). Cultural transmission of information in genetic programming. In Proceedings of the 1st Annual Conference on Genetic Programming, pages 209–214, Cambridge, MA, USA. MIT Press.
Spector, L. and Luke, S. (1996b). Culture enhances the evolvability of cognition. In Cottrell, G., editor, Cognitive Science (CogSci) 1996 Conference Proceedings, pages 672–677, Mahwah, NJ, USA. Lawrence Erlbaum Associates.
Sutton, R. and Barto, A. (2018). Reinforcement Learning: An Introduction. The MIT Press, Cambridge, MA, 2nd edition.
Teller, A. (1994). The evolution of mental models, pages 199–217. MIT Press, Cambridge, MA, USA.