plification, analyzing the scalability of these methods
in larger domains will also be important. Note that
most dimensionality reduction approaches substitute the state set or the evaluation functions in (PO)MDPs and are therefore subject to different convergence and optimality criteria. Our proposal is similar, assuming that complex problems in POMDP form (P) have a simpler underlying representation (P') from which solutions may be extracted. These solutions should be near-optimal within provable limits for P', so an additional challenge is finding what form of relevance functions and operators preserves these properties when transferring policies back to P. We expect that the relevance thresholds introduced earlier will allow us to estimate the approximation error.
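To make the intended guarantee concrete, here is a purely illustrative sketch under assumed notation (the operator ρ, the bound δ, and the constants below are placeholders of this sketch, not results of this work): let ρ be a relevance-based aggregation operator mapping beliefs of P to beliefs of P', let π' be an ε'-optimal policy for P', and let δ bound the value distortion introduced by aggregating under the relevance threshold. The lifted policy and the general form of bound we would hope to establish, analogous to bounds known for soft state aggregation (Singh et al., 1995), are

\[
  \pi(b) = \pi'\!\bigl(\rho(b)\bigr), \qquad
  \bigl| V^{*}_{P}(b) - V^{\pi}_{P}(b) \bigr|
  \;\le\; \epsilon' + \frac{2\gamma\,\delta}{(1-\gamma)^{2}}
  \quad \text{for all beliefs } b,
\]

where γ is the discount factor. The exact constants are placeholders; only the qualitative dependence of the error on the relevance threshold is what the preceding paragraph anticipates.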
Since this paper outlines a research project, there
is much work yet to be done. At the core of this methodology are the context-sensitive relevance functions and operators. A fully functional system will require state, observation, and belief estimation and aggregation. Efficient action selection through simulation techniques might be a key step in avoiding irrelevant transitions. Finally, a domain model binds these modules together and supports the practical assumptions. Putting it all together is a challenge in its own right, but using context-sensitive criteria is the main innovation of our proposal.
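As a purely illustrative sketch of the simulation-based action selection mentioned above (not an implementation of our system), the following Python fragment prunes actions whose context-sensitive relevance falls below a threshold before estimating action values with Monte-Carlo rollouts. The interfaces relevance(belief, action), simulate(state, action), and sample_state(belief), as well as all constants, are hypothetical placeholders supplied by the caller.

# Illustrative sketch only: relevance-filtered Monte-Carlo action selection.
# The callables relevance, simulate, and sample_state are hypothetical
# placeholders, not components of an existing system.
import random

GAMMA = 0.95        # discount factor (assumed)
THRESHOLD = 0.1     # relevance threshold (assumed)
N_ROLLOUTS = 50     # simulations per candidate action
HORIZON = 20        # rollout depth

def rollout_value(simulate, sample_state, belief, first_action, actions):
    """Estimate the value of taking first_action in belief via one random rollout."""
    state = sample_state(belief)
    total, discount, action = 0.0, 1.0, first_action
    for _ in range(HORIZON):
        state, _observation, reward = simulate(state, action)
        total += discount * reward
        discount *= GAMMA
        action = random.choice(actions)   # uniform random rollout policy
    return total

def select_action(belief, actions, relevance, simulate, sample_state):
    """Return the candidate with the best Monte-Carlo estimate,
    considering only actions deemed relevant in the current belief."""
    candidates = [a for a in actions if relevance(belief, a) >= THRESHOLD]
    if not candidates:                    # fall back if everything was pruned
        candidates = list(actions)
    best_action, best_value = None, float("-inf")
    for a in candidates:
        estimate = sum(rollout_value(simulate, sample_state, belief, a, actions)
                       for _ in range(N_ROLLOUTS)) / N_ROLLOUTS
        if estimate > best_value:
            best_action, best_value = a, estimate
    return best_action

Pruning by relevance before simulation is what would keep the rollouts from exploring transitions the context deems irrelevant; a full system would replace the uniform rollout policy and the fixed threshold with the context-sensitive machinery described above.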
ACKNOWLEDGEMENTS
This work is supported by a DAAD research grant.
REFERENCES
Boutilier, C., Dean, T., and Hanks, S. (1996). Planning under uncertainty: Structural assumptions and computational leverage. In Proc. 2nd Eur. Worksh. Planning, pages 157–171. IOS Press.
Ghallab, M., Nau, D., and Traverso, P. (2016). Automated Planning and Acting. Cambridge University Press.
Grzes, M. and Kudenko, D. (2008). Plan-based reward shaping for reinforcement learning. In 4th Intl. IEEE Conf. Intelligent Systems (IS '08), volume 2, pages 10-22–10-29.
Hanheide, M., Göbelbecker, M., Horn, G. S., Pronobis, A., Sjöö, K., Aydemir, A., Jensfelt, P., Gretton, C., Dearden, R., Janicek, M., Zender, H., Kruijff, G.-J., Hawes, N., and Wyatt, J. L. (2015). Robot task planning and explanation in open and uncertain worlds. Artificial Intelligence.
Hertzberg, J., Zhang, J., Zhang, L., Rockel, S., Neumann, B., Lehmann, J., Dubba, K. S. R., Cohn, A. G., Saffiotti, A., Pecora, F., Mansouri, M., Konečný, Š., Günther, M., Stock, S., Lopes, L. S., Oliveira, M., Lim, G. H., Kasaei, H., Mokhtari, V., Hotz, L., and Bohlken, W. (2014). The RACE project. KI - Künstliche Intelligenz, 28(4):297–304.
Hester, T. and Stone, P. (2013). TEXPLORE: Real-time
sample-efficient reinforcement learning for robots.
Machine Learning, 90(3).
Kearns, M., Mansour, Y., and Ng, A. Y. (2002). A Sparse
Sampling Algorithm for Near-Optimal Planning in
Large Markov Decision Processes. Mach. Learn.,
49(2-3):193–208.
Kocsis, L. and Szepesvári, C. (2006). Bandit based Monte-Carlo Planning. In ECML-06, pages 282–293. Springer.
Kushmerick, N., Hanks, S., and Weld, D. (1994). An algo-
rithm for probabilistic least-commitment planning. In
AAAI-94, pages 1073–1078.
Ng, A. Y., Harada, D., and Russell, S. (1999). Policy invariance under reward transformations: Theory and application to reward shaping. In Proc. 16th Intl. Conf. Mach. Learn., pages 278–287. Morgan Kaufmann.
Pineau, J., Gordon, G., and Thrun, S. (2003). Policy-
contingent abstraction for robust robot control. In
Proc. 19th Conf. on Uncertainty in Artificial Intelli-
gence, UAI’03, pages 477–484, San Francisco, CA,
USA. Morgan Kaufmann Publishers Inc.
Pineau, J., Gordon, G. J., and Thrun, S. (2006). Anytime
point-based approximations for large POMDPs. Jour-
nal of Artificial Intelligence Research, 27:335–380.
Silver, D. and Veness, J. (2010). Monte-Carlo Planning in Large POMDPs. In Advances in Neural Information Processing Systems 23, pages 2164–2172.
Singh, S. P., Jaakkola, T., and Jordan, M. I. (1995). Re-
inforcement learning with soft state aggregation. In
Tesauro, G., Touretzky, D. S., and Leen, T. K., editors,
Advances in Neural Information Processing Systems
7, pages 361–368. MIT Press.
Smith, T. and Simmons, R. (2004). Heuristic Search Value
Iteration for POMDPs. In Proc. 20th Conf. on Uncer-
tainty in Artificial Intelligence, UAI ’04, pages 520–
527, Arlington, Virginia, United States. AUAI Press.
Sperber, D. and Wilson, D. (1995). Relevance: Communica-
tion and Cognition. Blackwell Publishers, Cambridge,
MA, USA, 2nd edition.
Sutton, R. S. and Barto, A. G. (2012). Reinforcement Learn-
ing: An Introduction. MIT Press, Cambridge, MA,
USA, 2nd edition. (to be published).
Thiébaux, S. and Hertzberg, J. (1992). A semi-reactive planner based on a possible models action formalization. In Artificial Intelligence Planning Systems: Proc. 1st Intl. Conf. (AIPS92), pages 228–235. Morgan Kaufmann.
Vien, N. A. and Toussaint, M. (2015). Hierarchical Monte-
Carlo Planning. In AAAI-15.
von Neumann, J. and Morgenstern, O. (1944). Theory of
Games and Economic Behavior. Princeton University
Press.