executions are represented using symbolic representations. However, these approaches did not consider the feasibility of actions.
This study aimed to understand the action possibility, that is, the feasible actions and their positions in real space, and its variations depending on the world state. To achieve this objective, we proposed the WDAG, a knowledge representation based on scene graphs. In particular, we adopted the scene graph to represent the knowledge of actions because it is an environmental representation that is compatible with the world state and contains both geometric and semantic information. In addition, the WDAG represents the mutual interaction between actions and their possibilities in a recursive multilayered graph structure. Accordingly, a method for constructing the action graph was established based on the scene-graph representation of action effects and the recursive multilayered graph structure. This allows the WDAG to capture the action possibilities of agents and their recursive variations depending on the world state. The effectiveness of the proposed method was verified by simulation assuming a coffee shop environment. Specifically, the following two points were verified: 1) the WDAG represents the action possibility in real space based on the world state, and 2) the WDAG represents the variations in the action possibility caused by the agent's actions through its recursive multilayered structure.
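To make the recursive multilayered structure concrete, the following minimal Python sketch illustrates the idea under our own assumptions; the names (SceneGraph, Action, feasible_actions, expand) are hypothetical and not prescribed by the paper. Each layer pairs a scene-graph world state with the actions feasible in it, and applying an action's effect yields the scene graph of the next layer, so the action possibility is recomputed recursively as the world state changes.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass(frozen=True)
class SceneGraph:
    # Nodes carry semantic and geometric information (object name and
    # position); edges are semantic relations between objects.
    nodes: frozenset   # e.g. {("cup", (1, 2)), ("table", (1, 1))}
    edges: frozenset   # e.g. {("cup", "on", "table")}

@dataclass
class Action:
    name: str
    position: Tuple[int, int]                   # where the action is feasible
    precondition: Callable[[SceneGraph], bool]
    effect: Callable[[SceneGraph], SceneGraph]

def feasible_actions(scene: SceneGraph, actions: List[Action]) -> List[Action]:
    # The action possibility in a given world state: all actions whose
    # preconditions hold in the current scene graph.
    return [a for a in actions if a.precondition(scene)]

def expand(scene: SceneGraph, actions: List[Action], depth: int) -> dict:
    # Builds the recursive multilayered structure: applying the effect of
    # each feasible action to the layer-k scene graph yields the scene
    # graph of layer k+1, where the action possibility is computed again.
    layer = {"state": scene, "children": []}
    if depth > 0:
        for a in feasible_actions(scene, actions):
            child = expand(a.effect(scene), actions, depth - 1)
            layer["children"].append((a.name, a.position, child))
    return layer

# Toy example: "pick_cup" is feasible only while the cup is on the table;
# after its effect removes that relation, the next layer offers no actions,
# i.e., the action possibility varies with the world state (point 2 above).
table = SceneGraph(
    nodes=frozenset({("cup", (1, 2)), ("table", (1, 1))}),
    edges=frozenset({("cup", "on", "table")}),
)
pick = Action(
    name="pick_cup",
    position=(1, 2),
    precondition=lambda s: ("cup", "on", "table") in s.edges,
    effect=lambda s: SceneGraph(s.nodes, s.edges - {("cup", "on", "table")}),
)
graph = expand(table, [pick], depth=2)
```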
In future work, we will validate the effectiveness of the WDAG in practice by implementing an action-sequence planning method based on the WDAG and applying it to task planning in real space. Task planning based on the WDAG is expected to yield more efficient plans, such as plans with shorter movement distances, by considering geometric information such as object placement.
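As a rough illustration of why the geometric information matters for this planned extension, the short sketch below (hypothetical names, not the paper's method) scores candidate action sequences by total travel distance, the kind of criterion a WDAG-based planner could minimize using the action positions stored in the graph.

```python
import math

def movement_cost(positions):
    # Total Euclidean travel distance along a candidate action sequence,
    # using the positions stored with each action in the graph.
    return sum(math.dist(p, q) for p, q in zip(positions, positions[1:]))

# Two hypothetical plans reaching the same goal; the planner prefers
# the one with the shorter movement distance.
plan_a = [(0, 0), (1, 1), (1, 2)]   # serve nearby tables first
plan_b = [(0, 0), (4, 0), (1, 2)]   # detour across the shop
best = min([plan_a, plan_b], key=movement_cost)
print(best is plan_a)  # True: plan_a travels less
```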
ACKNOWLEDGMENTS
This study was supported by the Core Research for
Evolutional Science and Technology (CREST) of the
Japan Science and Technology Agency (JST) under
Grant Number JPMJCR19A1.