features of a node, or add new ones to it, in order to reflect how much influence a feature has on that node globally, even if the feature is not actually present in it. We will do this by creating virtual relations between nodes and influence areas for the features. We also want to improve our method with the ability to work with training data from different maps, by calculating an estimated value for the map nodes in real time.
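As a rough illustration of this idea, the following Python sketch propagates a feature's value from the nodes that actually contain it to nearby nodes through a distance-decayed influence area. This is a minimal sketch under our own assumptions, not the implementation described here: the graph library, the hop-based radius, and the exponential decay are illustrative choices.

```python
# Hedged sketch: give every map node a value for a feature it may not
# contain, based on its distance to nodes that do contain it.
# Radius and decay are illustrative assumptions, not values from the paper.
import networkx as nx

def influence_features(graph, feature, radius=3, decay=0.5):
    """Return {node: value}: nodes owning `feature` score 1.0, other nodes
    within `radius` hops score decay ** hop_distance (strongest source wins)."""
    sources = [n for n, data in graph.nodes(data=True)
               if feature in data.get("features", set())]
    values = {n: 0.0 for n in graph.nodes}
    for src in sources:
        # shortest hop distances from this feature source, cut off at `radius`
        dists = nx.single_source_shortest_path_length(graph, src, cutoff=radius)
        for node, d in dists.items():
            values[node] = max(values[node], decay ** d)
    return values

# Toy usage: a corridor of map nodes where only node "c" has an exit sign.
G = nx.path_graph(["a", "b", "c", "d", "e"])
G.nodes["c"]["features"] = {"exit_sign"}
print(influence_features(G, "exit_sign"))
# -> "c" gets 1.0, "b" and "d" get 0.5, "a" and "e" get 0.25
```

With such virtual relations, a node's feature vector can encode nearby exits, signage, or hazards without those elements being physically located at the node itself.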
Finally, we plan to contrast the behavior of our agents with real pedestrian data in further experiments. In order to do this, and given the legal and logistical difficulties of tracking crowds effectively, we want to apply our system to more manageable domains, such as public events (specifically fireworks festivals, which have previously been used to collect data for CrowdWalk), where a certain degree of control and surveillance over the crowd can be enacted, or customer behavior inside department stores or supermarkets.
REFERENCES
Abbeel, P. and Ng, A. Y. (2004). Apprenticeship learning via inverse reinforcement learning. In Proceedings of the Twenty-First International Conference on Machine Learning, page 1. ACM.
Choi, J. and Kim, K.-E. (2012). Nonparametric Bayesian inverse reinforcement learning for multiple reward functions. In Advances in Neural Information Processing Systems, pages 305–313.
Crociani, L., Vizzari, G., Yanagisawa, D., Nishinari, K.,
and Bandini, S. (2016). Route choice in pedestrian
simulation: Design and evaluation of a model based
on empirical observations. Intelligenza Artificiale,
10(2):163–182.
Dvijotham, K. and Todorov, E. (2010). Inverse optimal control with linearly-solvable MDPs. In Proceedings of the 27th International Conference on Machine Learning (ICML-10), pages 335–342.
Faccin, J., Nunes, I., and Bazzan, A. (2017). Understand-
ing the Behaviour of Learning-Based BDI Agents in
the Braess’ Paradox, pages 187–204. Springer Inter-
national Publishing.
Kohjima, M., Matsubayashi, T., and Sawada, H. (2017).
What-if prediction via inverse reinforcement learning.
In Proceedings of the Thirtieth International Florida
Artificial Intelligence Research Society Conference,
FLAIRS 2017, Marco Island, Florida, USA, May 22-
24, 2017., pages 74–79.
Krishnan, S., Garg, A., Liaw, R., Miller, L., Pokorny, F. T., and Goldberg, K. (2016). HIRL: Hierarchical inverse reinforcement learning for long-horizon tasks with delayed rewards. arXiv preprint arXiv:1604.06508.
Lämmel, G., Grether, D., and Nagel, K. (2010). The representation and implementation of time-dependent inundation in large-scale microscopic evacuation simulations. Transportation Research Part C: Emerging Technologies, 18(1):84–98.
Levine, S., Popovic, Z., and Koltun, V. (2011). Nonlinear inverse reinforcement learning with Gaussian processes. In Advances in Neural Information Processing Systems, pages 19–27.
Luo, L., Zhou, S., Cai, W., Low, M. Y. H., Tian, F., Wang,
Y., Xiao, X., and Chen, D. (2008). Agent-based hu-
man behavior modeling for crowd simulation. Com-
puter Animation and Virtual Worlds, 19(3-4):271–
281.
Martinez-Gil, F., Lozano, M., and Fernández, F. (2017). Emergent behaviors and scalability for multi-agent reinforcement learning-based pedestrian models. Simulation Modelling Practice and Theory, 74:117–133.
Michini, B. and How, J. P. (2012). Bayesian nonparamet-
ric inverse reinforcement learning. In Joint European
Conference on Machine Learning and Knowledge
Discovery in Databases, pages 148–163. Springer.
Natarajan, S., Kunapuli, G., Judah, K., Tadepalli, P., Ker-
sting, K., and Shavlik, J. (2010). Multi-agent inverse
reinforcement learning. In 2010 Ninth International
Conference on Machine Learning and Applications,
pages 395–400. IEEE.
Neal, R. M. (2000). Markov chain sampling methods for Dirichlet process mixture models. Journal of Computational and Graphical Statistics, 9(2):249–265.
Ng, A. Y., Russell, S. J., et al. (2000). Algorithms for inverse reinforcement learning. In ICML, pages 663–670.
Siebra, C. d. A. and Neto, G. P. B. (2014). Evolving the behavior of autonomous agents in strategic combat scenarios via SARSA reinforcement learning. In Proceedings of the 2014 Brazilian Symposium on Computer Games and Digital Entertainment, SBGAMES ’14, pages 115–122, Washington, DC, USA. IEEE Computer Society.
Surana, A. and Srivastava, K. (2014). Bayesian nonparametric inverse reinforcement learning for switched Markov decision processes. In 2014 13th International Conference on Machine Learning and Applications (ICMLA), pages 47–54. IEEE.
Svetlik, M., Leonetti, M., Sinapov, J., Shah, R., Walker,
N., and Stone, P. (2016). Automatic curriculum graph
generation for reinforcement learning agents.
Torrens, P. M., Nara, A., Li, X., Zhu, H., Griffin, W. A., and
Brown, S. B. (2012). An extensible simulation en-
vironment and movement metrics for testing walking
behavior in agent-based models. Computers, Environ-
ment and Urban Systems, 36(1):1–17.
Yamashita, T., Soeda, S., and Noda, I. (2009). Evac-
uation planning assist system with network model-
based pedestrian simulator. In PRIMA, pages 649–
656. Springer.
Zhifei, S. and Meng Joo, E. (2012). A survey of in-
verse reinforcement learning techniques. Interna-
tional Journal of Intelligent Computing and Cybernet-
ics, 5(3):293–311.
Ziebart, B. D., Maas, A. L., Bagnell, J. A., and Dey,
A. K. (2008). Maximum entropy inverse reinforce-
ment learning. In AAAI, volume 8, pages 1433–1438.
Chicago, IL, USA.