Improving Reward Estimation in Goal-Conditioned Imitation Learning with Counterfactual Data and Structural Causal Models

Mohamed Jabri; Panagiotis Papadakis; Ehsan Abbasnejad; Gilles Coppin; Javen Shi

doi:10.5220/0012268200003543

2023

Abstract

Imitation learning has emerged as a pragmatic alternative to reinforcement learning for teaching agents to execute specific tasks, mitigating the complexity associated with reward engineering. However, the deployment of imitation learning in real-world scenarios is hampered by numerous challenges. Often, the scarcity and expense of demonstration data hinder the effectiveness of imitation learning algorithms. In this paper, we present a novel approach to enhance the sample efficiency of goal-conditioned imitation learning. Leveraging the principles of causality, we harness structural causal models as a formalism to generate counterfactual data. These counterfactual instances are used as additional training data, effectively improving the learning process. By incorporating causal insights, our method demonstrates its ability to improve imitation learning efficiency by capitalizing on generated counterfactual data. Through experiments on simulated robotic manipulation tasks, such as pushing, moving, and sliding objects, we showcase how our approach allows for the learning of better reward functions resulting in improved performance with a limited number of demonstrations, paving the way for a more practical and effective implementation of imitation learning in real-world scenarios.

Paper Citation

in Harvard Style

Jabri M., Papadakis P., Abbasnejad E., Coppin G. and Shi J. (2023). Improving Reward Estimation in Goal-Conditioned Imitation Learning with Counterfactual Data and Structural Causal Models. In Proceedings of the 20th International Conference on Informatics in Control, Automation and Robotics - Volume 2: LICARSA; ISBN 978-989-758-670-5, SciTePress, pages 329-337. DOI: 10.5220/0012268200003543

in BibTeX Style

@conference{licarsa23,
author={Mohamed Jabri and Panagiotis Papadakis and Ehsan Abbasnejad and Gilles Coppin and Javen Shi},
title={Improving Reward Estimation in Goal-Conditioned Imitation Learning with Counterfactual Data and Structural Causal Models},
booktitle={Proceedings of the 20th International Conference on Informatics in Control, Automation and Robotics - Volume 2: LICARSA},
year={2023},
pages={329--337},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0012268200003543},
isbn={978-989-758-670-5},
}

in EndNote Style

TY - CONF
JO - Proceedings of the 20th International Conference on Informatics in Control, Automation and Robotics - Volume 2: LICARSA
TI - Improving Reward Estimation in Goal-Conditioned Imitation Learning with Counterfactual Data and Structural Causal Models
SN - 978-989-758-670-5
AU - Jabri M.
AU - Papadakis P.
AU - Abbasnejad E.
AU - Coppin G.
AU - Shi J.
PY - 2023
SP - 329
EP - 337
DO - 10.5220/0012268200003543
PB - SciTePress