Authors: Raphael Fonteneau 1; Susan A. Murphy 2; Louis Wehenkel 1 and Damien Ernst 3
Affiliations: 1 University of Liège, Belgium; 2 University of Michigan, United States; 3 University of Liège, Belgium
Keyword(s):
Reinforcement learning, Prior knowledge, Cautious generalization.
Related Ontology Subjects/Areas/Topics: Artificial Intelligence; Computational Intelligence; Evolutionary Computing; Knowledge Discovery and Information Retrieval; Knowledge-Based Systems; Machine Learning; Soft Computing; Symbolic Systems; Uncertainty in AI
Abstract:
In the context of a deterministic Lipschitz-continuous environment with continuous state spaces, finite action spaces, and a finite optimization horizon, we propose an algorithm of polynomial complexity that exploits weak prior knowledge about its environment to compute, from a given sample of trajectories and for a given initial state, a sequence of actions. The proposed Viterbi-like algorithm maximizes a recently proposed lower bound on the return that depends on the initial state, and to this end uses prior knowledge about the environment provided in the form of upper bounds on its Lipschitz constants. It thereby avoids, in a way depending on the initial state and on the prior knowledge, those regions of the state space where the sample is too sparse to allow safe generalization. Our experiments show that it can lead to more cautious policies than algorithms combining dynamic programming with function approximators. We also give a condition on the sample sparsity ensuring that, for a given initial state, the proposed algorithm produces an optimal open-loop sequence of actions.
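The Viterbi-like maximization described in the abstract can be sketched as a dynamic program over the sampled one-step transitions. The sketch below is an illustrative assumption, not the authors' exact formulation: it uses a hypothetical Lipschitz-constant form `L_rho * sum(L_f**i)`, a 1-D state space with absolute-value distance, and made-up function names (`cautious_open_loop`, `lipschitz_constants`).

```python
import math

def lipschitz_constants(L_f, L_rho, T):
    # Assumed form of the horizon-dependent constants:
    # L_{Q_N} = L_rho * sum_{i=0}^{N-1} L_f^i, for N = 1..T.
    return [L_rho * sum(L_f ** i for i in range(N)) for N in range(1, T + 1)]

def cautious_open_loop(x0, sample, T, L_f=1.0, L_rho=1.0):
    """Maximize a Lipschitz lower bound on the T-step return from x0
    over sequences of sampled transitions (x, u, r, y), Viterbi-style.
    Returns (lower_bound, open_loop_action_sequence)."""
    LQ = lipschitz_constants(L_f, L_rho, T)  # LQ[N-1] plays the role of L_{Q_N}
    n = len(sample)
    # B[t][l]: best bound on the return of steps t..T-1 when step t is
    # emulated by sample transition l; arg stores the chosen successor.
    B = [[0.0] * n for _ in range(T)]
    arg = [[0] * n for _ in range(T)]
    for t in range(T - 1, -1, -1):
        for l, (x, u, r, y) in enumerate(sample):
            if t == T - 1:
                B[t][l] = r
            else:
                # Penalize the jump from this transition's end state y to
                # the next transition's start state (sparser sample -> bigger
                # penalty -> the bound steers away from that region).
                best, best_l = -math.inf, 0
                for l2, (x2, _, _, _) in enumerate(sample):
                    v = B[t + 1][l2] - LQ[T - t - 2] * abs(y - x2)
                    if v > best:
                        best, best_l = v, l2
                B[t][l] = r + best
                arg[t][l] = best_l
    # Initial step: penalize the distance from x0 to the first transition.
    best, l0 = -math.inf, 0
    for l, (x, _, _, _) in enumerate(sample):
        v = B[0][l] - LQ[T - 1] * abs(x0 - x)
        if v > best:
            best, l0 = v, l
    # Read off the open-loop action sequence along the optimal path.
    seq, l = [], l0
    for t in range(T):
        seq.append(sample[l][1])
        if t < T - 1:
            l = arg[t][l]
    return best, seq

# Toy usage: two transitions in a 1-D state space, horizon 2.
sample = [(0.0, 'a', 1.0, 0.1), (0.1, 'b', 0.5, 0.2)]
bound, actions = cautious_open_loop(0.0, sample, T=2)
```

Each stage costs O(n^2) over the n sampled transitions, so the whole pass is O(T * n^2), consistent with the polynomial complexity claimed in the abstract.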