Risk-averse Distributional Reinforcement Learning: A CVaR Optimization Approach

Silvestr Stanko; Karel Macek

Topics: Deep Learning; Reinforcement Learning

In Proceedings of the 11th International Joint Conference on Computational Intelligence - Volume 1: NCTA, 412-423, 2019, Vienna, Austria

Authors: Silvestr Stanko and Karel Macek

Affiliation: DHL ITS Digital Lab, Czech Republic

Keyword(s): Reinforcement Learning, Distributional Reinforcement Learning, Risk, AI Safety, Conditional Value-at-Risk, CVaR, Value Iteration, Q-learning, Deep Learning, Deep Q-learning.

Related Ontology Subjects/Areas/Topics: Pattern Recognition; Reinforcement Learning; Theory and Methods

Abstract: Conditional Value-at-Risk (CVaR) is a well-known measure of risk that has been directly equated to robustness, an important component of Artificial Intelligence (AI) safety. In this paper we focus on optimizing CVaR in the context of Reinforcement Learning (RL), as opposed to the usual risk-neutral expectation. As a first original contribution, we improve the CVaR Value Iteration algorithm (Chow et al., 2015) in a way that reduces the computational complexity of the original algorithm from polynomial to linear time. Secondly, we propose a sampling version of CVaR Value Iteration, which we call CVaR Q-learning. We also derive a distributional policy improvement algorithm and later use it as a heuristic for extracting the optimal policy from the converged CVaR Q-learning algorithm. Finally, to show the scalability of our method, we propose an approximate Q-learning algorithm by reformulating the CVaR Temporal Difference update rule as a loss function, which we later use in a deep learning context. All proposed methods are experimentally analyzed, including the Deep CVaR Q-learning agent, which learns how to avoid risk from raw pixels.
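
To make the contrast between CVaR and the risk-neutral expectation concrete, the following is a minimal sketch of an empirical CVaR estimate over sampled returns. It is not the paper's algorithm: the function name `empirical_cvar`, the sample values, and the convention that CVaR at level alpha is the mean of the worst alpha-fraction of returns (lower tail, with the tail size rounded up) are assumptions made for illustration only.

```python
import numpy as np


def empirical_cvar(returns, alpha):
    """Mean of the worst alpha-fraction of sampled returns (lower tail).

    Hypothetical helper for illustration; assumes higher returns are better
    and 0 < alpha <= 1.
    """
    x = np.sort(np.asarray(returns, dtype=float))  # ascending: worst returns first
    k = max(1, int(np.ceil(alpha * x.size)))       # number of tail samples to average
    return float(x[:k].mean())


# Made-up sample of episode returns: mostly good, with two rare bad outcomes.
returns = [10.0, 8.0, 9.0, -20.0, 7.0, 11.0, 9.0, 8.0, 10.0, -5.0]

mean_return = float(np.mean(returns))          # risk-neutral objective
cvar_20 = empirical_cvar(returns, alpha=0.2)   # average of the worst 20% of returns
cvar_100 = empirical_cvar(returns, alpha=1.0)  # alpha = 1 averages every sample

print(mean_return, cvar_20, cvar_100)
```

Under this convention, a small alpha concentrates on the rare bad outcomes that the plain mean averages away, while alpha = 1 recovers the ordinary expectation.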

CC BY-NC-ND 4.0

Paper citation in several formats:

Stanko, S. and Macek, K. (2019). Risk-averse Distributional Reinforcement Learning: A CVaR Optimization Approach. In Proceedings of the 11th International Joint Conference on Computational Intelligence (IJCCI 2019) - NCTA; ISBN 978-989-758-384-1; ISSN 2184-3236, SciTePress, pages 412-423. DOI: 10.5220/0008175604120423

@conference{ncta19,
author={Silvestr Stanko and Karel Macek},
title={Risk-averse Distributional Reinforcement Learning: A CVaR Optimization Approach},
booktitle={Proceedings of the 11th International Joint Conference on Computational Intelligence (IJCCI 2019) - NCTA},
year={2019},
pages={412-423},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0008175604120423},
isbn={978-989-758-384-1},
issn={2184-3236},
}

TY - CONF

JO - Proceedings of the 11th International Joint Conference on Computational Intelligence (IJCCI 2019) - NCTA
TI - Risk-averse Distributional Reinforcement Learning: A CVaR Optimization Approach
SN - 978-989-758-384-1
IS - 2184-3236
AU - Stanko, S.
AU - Macek, K.
PY - 2019
SP - 412
EP - 423
DO - 10.5220/0008175604120423
PB - SciTePress