Figure 2: Cliff-walk gridworld. The goal of a moving agent
is to reach the green square, starting from the red one.
Entering the dark squares (representing a cliff) results in a high
negative reward. Superimposed is the learned path after 100
episodes. The path color indicates the expected reward by
displaying the value of $\min\left(1, \frac{\text{expected reward}}{\text{goal reward}}\right)$
using the displayed color key (range 0 to 1).
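The color mapping in the caption of Figure 2 can be illustrated with a minimal sketch. The function name `path_color_value` and the numeric rewards in the usage lines are assumptions made for illustration only; the caption specifies only the upper clamp at 1, so no lower clamp is applied here.

```python
def path_color_value(expected_reward: float, goal_reward: float) -> float:
    """Return min(1, expected_reward / goal_reward), the value shown by the color key.

    Both the function name and the requirement of a positive goal reward are
    assumptions of this sketch, not details taken from the paper.
    """
    if goal_reward <= 0:
        raise ValueError("goal_reward must be positive")
    # Only the upper bound is clamped, as in the caption.
    return min(1.0, expected_reward / goal_reward)


# Hypothetical values: a goal reward of 100 and two expected rewards.
half_way = path_color_value(50.0, 100.0)    # 0.5
saturated = path_color_value(150.0, 100.0)  # 1.0
```

With this mapping, every state whose expected reward reaches or exceeds the goal reward is drawn with the color at the top of the key.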
Figure 3: Results. The diagrams show the rewards (vertical axis)
over a series of 300 episodes (horizontal axis). The top diagram
(curves new, old, plain) depicts the results with futile
information, the bottom diagram (curves newn, oldn, plainn) the
results without futile information. plain/plainn show results of
a plain Q-learner, old/oldn show results of revisions with
Equation 6, and new/newn show results of revisions with
Equation 13. For the OCF-augmented learners the values are
averages of 1000 runs. Since the plain Q-learner exhibits large
variations, its values have been averaged over 2000 runs.
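The averaging over independent runs described in the caption of Figure 3 can be sketched as follows. This is only an illustration of element-wise averaging of per-episode rewards; the helper `average_reward_curve` and the stand-in learner `toy_run` are hypothetical and do not reproduce the learners evaluated in the paper.

```python
from typing import Callable, List


def average_reward_curve(run_once: Callable[[int], List[float]],
                         n_runs: int) -> List[float]:
    """Average per-episode rewards element-wise over n_runs independent runs.

    run_once(seed) is assumed to return one reward per episode for a single run.
    """
    if n_runs < 1:
        raise ValueError("n_runs must be at least 1")
    totals: List[float] = []
    for seed in range(n_runs):
        rewards = run_once(seed)
        if not totals:
            totals = [0.0] * len(rewards)
        if len(rewards) != len(totals):
            raise ValueError("all runs must cover the same number of episodes")
        for episode, reward in enumerate(rewards):
            totals[episode] += reward
    return [total / n_runs for total in totals]


def toy_run(seed: int) -> List[float]:
    """Hypothetical stand-in for one learner run with two episodes."""
    return [float(seed), 2.0 * seed]


# Seeds 0, 1, 2 give episode rewards [0, 0], [1, 2], [2, 4].
curve = average_reward_curve(toy_run, n_runs=3)  # [1.0, 2.0]
```

A learner with larger run-to-run variation needs more runs for a curve of comparable smoothness, which matches the caption's use of 2000 runs for the plain Q-learner and 1000 for the OCF-augmented learners.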
needs to be examined in more detail. In particular, the
analysis of the symbolic belief representation in
different contexts is on our agenda. First experiments
indicate the accumulation of a proper symbolic
description of favorable state-action pairs.
The symbolic representation also allows for
symbolic reasoning to incorporate a top-down path of
learning. The combination of these techniques is
of interest and needs to be addressed in future
publications.
ACKNOWLEDGEMENTS
This research was funded by the German Research
Association (DFG) under Grant PE 887/3-3.
IMPROVED REVISION OF RANKING FUNCTIONS FOR THE GENERALIZATION OF BELIEF IN THE CONTEXT
OF UNOBSERVED VARIABLES