Mean-Variance Policy Iteration for Risk-Averse Reinforcement Learning

22 Apr 2020 · Shangtong Zhang, Bo Liu, Shimon Whiteson

We present a mean-variance policy iteration (MVPI) framework for risk-averse control in a discounted infinite horizon MDP optimizing the variance of a per-step reward random variable. MVPI enjoys great flexibility in that any policy evaluation method and risk-neutral control method can be dropped in for risk-averse control off the shelf, in both on- and off-policy settings...
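The abstract describes a framework in which any policy evaluation method and any risk-neutral control method can be dropped in for risk-averse control. As a rough illustration of how such an alternating scheme could look, the sketch below assumes a Fenchel-duality style reward augmentation for the mean-variance objective E[R] - lam * Var(R); the helper names `evaluate_mean_reward` and `risk_neutral_control` are placeholders, not interfaces from the paper.

```python
# Minimal sketch of a mean-variance policy iteration loop, assuming the
# objective E[R] - lam * Var(R) over the per-step reward R is handled by
# alternating (a) a closed-form update of a dual variable y (the average
# per-step reward under the current policy) and (b) one run of ANY
# risk-neutral control method on an augmented reward
#   r_hat(s, a) = r(s, a) - lam * r(s, a)**2 + 2 * lam * y * r(s, a).
# `evaluate_mean_reward` and `risk_neutral_control` are hypothetical
# placeholders for whatever evaluation and control methods are dropped in.

def mvpi(env, policy, lam, num_iterations,
         evaluate_mean_reward, risk_neutral_control):
    """Illustrative mean-variance policy iteration loop.

    evaluate_mean_reward(env, policy) -> float
        Any policy-evaluation routine returning the average per-step reward.
    risk_neutral_control(env, policy, reward_fn) -> policy
        Any risk-neutral control algorithm (e.g. TD3 or PPO) trained on the
        transformed reward reward_fn(r) instead of the raw reward r.
    """
    for _ in range(num_iterations):
        # (a) dual-variable update: y <- mean per-step reward under the policy
        y = evaluate_mean_reward(env, policy)

        # (b) augmented reward derived from the mean-variance objective
        def augmented(r, y=y):
            return r - lam * r * r + 2.0 * lam * y * r

        # (c) any off-the-shelf risk-neutral control method improves the
        #     policy with respect to the augmented reward
        policy = risk_neutral_control(env, policy, augmented)

    return policy
```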



Methods used in the Paper


METHOD                       TYPE
Target Policy Smoothing      Regularization
Adam                         Stochastic Optimization
ReLU                         Activation Functions
Experience Replay            Replay Memory
Dense Connections            Feedforward Networks
Clipped Double Q-learning    Off-Policy TD Control
TD3                          Policy Gradient Methods
Entropy Regularization       Regularization
PPO                          Policy Gradient Methods
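The table above lists TD3 and PPO, which in an MVPI-style setup would act as the drop-in risk-neutral control algorithms. One way to reuse such off-the-shelf agents unchanged is to expose the augmented reward through an environment wrapper; the sketch below assumes a Gymnasium-style environment, uses the same augmentation formula assumed in the earlier sketch, and `MVPIRewardWrapper` is a hypothetical name, not code from the paper.

```python
import gymnasium as gym  # assumed available; the classic `gym` API is analogous


class MVPIRewardWrapper(gym.RewardWrapper):
    """Train any off-the-shelf risk-neutral agent (e.g. TD3 or PPO) on the
    augmented reward instead of the raw reward.

    lam is the risk-aversion coefficient; y is the current estimate of the
    mean per-step reward, which an outer MVPI-style loop would refresh
    between training runs.
    """

    def __init__(self, env, lam, y=0.0):
        super().__init__(env)
        self.lam = lam
        self.y = y

    def reward(self, r):
        # r - lam * r^2 + 2 * lam * y * r
        return r - self.lam * r * r + 2.0 * self.lam * self.y * r


# Example usage (hypothetical environment id, requires the matching extras):
# env = MVPIRewardWrapper(gym.make("HalfCheetah-v4"), lam=0.5, y=0.0)
```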