Based on these results, we can report that the CSRN is capable of generalizing and scaling not only in the maze domain, which we consider a simple baseline, but also in the Sokoban domain.
7 CONCLUSION
We successfully trained the CSRN architecture on both the maze and Sokoban domains and evaluated its scaling and generalizing ability on unseen problem instances. We also integrated trained CSRNs into a planner and compared their performance with other commonly used heuristic functions. As stated earlier, we use an image-like grid representation of the problems. Thanks to that, we work within a model-free planning framework, because our planner does not require a domain or problem model for its computation.
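To make this setting concrete, the following minimal sketch shows an image-like grid encoding of a tiny maze and a greedy best-first search driven purely by a heuristic over that grid. The cell encoding (0 = free, 1 = wall, 2 = agent, 3 = goal) and the Manhattan-distance heuristic are illustrative assumptions; in our framework, the trained CSRN would take the place of the heuristic function.

```python
import heapq

# Hypothetical image-like grid encoding of a small maze:
# 0 = free cell, 1 = wall, 2 = agent, 3 = goal.
# The planner only sees this grid; no PDDL-style domain model is needed.
MAZE = [
    [2, 0, 1],
    [1, 0, 1],
    [1, 0, 3],
]

def find(grid, value):
    """Locate the first cell holding `value`."""
    for r, row in enumerate(grid):
        for c, cell in enumerate(row):
            if cell == value:
                return (r, c)
    return None

def heuristic(pos, goal):
    """Stand-in for the trained CSRN: a Manhattan-distance estimate."""
    return abs(pos[0] - goal[0]) + abs(pos[1] - goal[1])

def greedy_best_first(grid):
    """Expand states in order of the heuristic estimate only."""
    start, goal = find(grid, 2), find(grid, 3)
    frontier = [(heuristic(start, goal), start, [start])]
    visited = {start}
    while frontier:
        _, pos, path = heapq.heappop(frontier)
        if pos == goal:
            return path
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            r, c = pos[0] + dr, pos[1] + dc
            if (0 <= r < len(grid) and 0 <= c < len(grid[0])
                    and grid[r][c] != 1 and (r, c) not in visited):
                visited.add((r, c))
                heapq.heappush(
                    frontier,
                    (heuristic((r, c), goal), (r, c), path + [(r, c)]))
    return None
```

Because the search operates directly on the grid image, swapping the heuristic for a learned one requires no change to the planner itself.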
The generalizing and scaling experiments showed that the CSRN architecture generalizes very well in the maze domain, where it outperformed the reference solution on all data sets. In the case of the Sokoban domain, we achieved 96% coverage, and even though the CSRN generally found longer solutions, it also reduced the number of expanded states compared to the reference solution.
The planning experiments showed that in the maze domain the CSRN provided results comparable to the classical heuristics; however, it expanded a larger number of states in the process. In the Sokoban domain, we saw high coverage on the 8x8 data set, which contained larger instances than the data sets used in the generalizing and scaling experiments. However, we also saw limitations of the trained configurations, as the results on the other two data sets showed next to no coverage. That is caused by the size of the problem instances, which also influences their complexity. Still, training the network on a single 3x3 sample provided excellent results on the 8x8 data set and a promising direction for follow-up research on the CSRN's ability to generalize.
These results suggest that the CSRN architecture may be the right tool for heuristic computation in grid-based domains. Another research direction would be to make this approach domain-independent. So far, we see two ways of achieving that. One is to create an algorithm that selects appropriate variables in the problem domain, which would allow us to create a 2D projection of the problem. The other is to create an alternative representation of the problem that the CSRN architecture could still process, but without the requirement of a grid structure, similarly to the linear vector representations used in Natural Language Processing, for instance.
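The second direction can be sketched as follows. This is purely an illustrative assumption, not part of our implementation: a 2D grid state is serialized into a flat, NLP-style vector by concatenating a one-hot slot for each cell, using the same hypothetical cell vocabulary as before.

```python
# Hypothetical linear encoding of a grid state, illustrating the
# NLP-style alternative representation. The cell vocabulary
# (0 = free, 1 = wall, 2 = agent, 3 = goal) is an assumption.
CELL_VALUES = (0, 1, 2, 3)

def grid_to_linear(grid):
    """Flatten a 2D grid into a linear one-hot vector, row by row."""
    vector = []
    for row in grid:
        for cell in row:
            vector.extend(1 if cell == v else 0 for v in CELL_VALUES)
    return vector

state = [
    [2, 0],
    [1, 3],
]
encoding = grid_to_linear(state)  # 2x2 cells x 4 values -> 16 entries
```

A network consuming such a vector would no longer depend on the grid structure, at the cost of discarding the spatial locality that the CSRN currently exploits.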
Addressing these ideas could lead to a model-free, scale-free, and domain-independent heuristic function learned by a neural network on small, tractable problem samples. In the future, we would like to focus on these challenges and provide a framework that could process any given problem.
ACKNOWLEDGEMENTS
The work of Michaela Urbanovská was supported by the OP VVV funded project CZ.02.1.01/0.0/0.0/16019/0000765 “Research Center for Informatics” and the work of Antonín Komenda was supported by the Czech Science Foundation (grant no. 21-33041J).
ICAART 2022 - 14th International Conference on Agents and Artificial Intelligence