steps (Küttler et al., 2020; Izumiya and Simo-Serra,
2021). Given the cost of implementing and training
other algorithms, we therefore did not compare our
method with others. In future work, we aim to further
improve training efficiency and speed, train for
1 billion environment steps, and compare our method
with others. Additionally, we plan to include settings
other than “Monk-Human-Neutral-Male” to improve
generalizability in NLE.
This work was supported by JSPS KAKENHI Grant
Numbers JP22K12157, JP23K28377, JP24H00714.
We acknowledge that ChatGPT (GPT-4o and 4o mini)
was used for proofreading; its output was
further reviewed and revised by the au-