Online Learning of non-Markovian Reward Models

Gavin Rens; Jean-François Raskin; Raphaël Reynouard; Giuseppe Marra

Topics: Machine Learning; Model-Based Reasoning; Planning and Scheduling

In Proceedings of the 13th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART, 74-86, 2021

Authors: Gavin Rens ¹ ; Jean-François Raskin ² ; Raphaël Reynouard ² and Giuseppe Marra ¹

Affiliations: ¹ DTAI Group, KU Leuven, Belgium ; ² Université Libre de Bruxelles, Belgium

Keyword(s): non-Markovian Rewards, Learning Mealy Machines, Angluin’s Algorithm.

Abstract: There are situations in which an agent should receive rewards only after having accomplished a series of previous tasks, that is, rewards are non-Markovian. One natural and quite general way to represent history-dependent rewards is via a Mealy machine. In our formal setting, we consider a Markov decision process (MDP) that models the dynamics of the environment in which the agent evolves and a Mealy machine synchronized with this MDP to formalize the non-Markovian reward function. While the MDP is known by the agent, the reward function is unknown to the agent and must be learned. Our approach to overcome this challenge is to use Angluin’s L∗ active learning algorithm to learn a Mealy machine representing the underlying non-Markovian reward machine (MRM). Formal methods are used to determine the optimal strategy for answering so-called membership queries posed by L∗. Moreover, we prove that the expected reward achieved will eventually be at least as much as a given, reasonable value provided by a domain expert. We evaluate our framework on two problems. The results show that using L∗ to learn an MRM in a non-Markovian reward decision process is effective.
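As an illustration of how a Mealy machine can encode a history-dependent reward, here is a minimal Python sketch. It is not the authors' implementation: the class name `MealyRewardMachine`, the node names, the observation labels (`"key"`, `"door"`, `"none"`), the reward values, and the default of "stay in place with reward 0" for unlisted inputs are all assumptions made for the example. The machine reads one observation per step and emits one reward per step, so the reward for reaching the door depends on whether the key was seen earlier.

```python
from dataclasses import dataclass, field


@dataclass
class MealyRewardMachine:
    """Hypothetical Mealy machine: maps observation sequences to reward sequences."""

    initial: str
    # (node, observation) -> (next node, reward)
    transitions: dict
    node: str = field(init=False)

    def __post_init__(self):
        self.node = self.initial

    def reset(self):
        """Return to the initial node, e.g. before replaying a new input sequence."""
        self.node = self.initial

    def step(self, observation):
        """Consume one observation, move to the next node, and return the reward."""
        # Assumption: unlisted (node, observation) pairs keep the node and pay 0.
        next_node, reward = self.transitions.get(
            (self.node, observation), (self.node, 0.0)
        )
        self.node = next_node
        return reward

    def rewards_for(self, observations):
        """Reward sequence produced by an observation sequence from the initial node."""
        self.reset()
        return [self.step(o) for o in observations]


# Toy task: the door pays 10 only if the key was observed beforehand.
machine = MealyRewardMachine(
    initial="start",
    transitions={
        ("start", "key"): ("has_key", 0.0),
        ("has_key", "door"): ("start", 10.0),
    },
)

rewards = machine.rewards_for(["door", "key", "none", "door", "door"])
print(rewards)  # [0.0, 0.0, 0.0, 10.0, 0.0]
```

The first visit to the door pays nothing because no key has been observed yet, while the second pays 10. A purely state-based (Markovian) reward function could not distinguish these two visits. The `rewards_for` method plays the role of an output oracle: given an input sequence, it returns the reward sequence, which is the kind of information a learner for Mealy machines asks for.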

CC BY-NC-ND 4.0

Paper citation in several formats:

Rens, G., Raskin, J.-F., Reynouard, R. and Marra, G. (2021). Online Learning of non-Markovian Reward Models. In Proceedings of the 13th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART; ISBN 978-989-758-484-8; ISSN 2184-433X, SciTePress, pages 74-86. DOI: 10.5220/0010212000740086

@conference{icaart21,
author={Gavin Rens and Jean{-}Fran\c{c}ois Raskin and Rapha{\"e}l Reynouard and Giuseppe Marra},
title={Online Learning of non-Markovian Reward Models},
booktitle={Proceedings of the 13th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART},
year={2021},
pages={74-86},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0010212000740086},
isbn={978-989-758-484-8},
issn={2184-433X},
}

TY - CONF
JO - Proceedings of the 13th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART
TI - Online Learning of non-Markovian Reward Models
SN - 978-989-758-484-8
IS - 2184-433X
AU - Rens, G.
AU - Raskin, J.
AU - Reynouard, R.
AU - Marra, G.
PY - 2021
SP - 74
EP - 86
DO - 10.5220/0010212000740086
PB - SciTePress