A Reduction Approach to Constrained Reinforcement Learning

1 Jan 2021 · Tianchi Cai, Wenjie Shi, Lihong Gu, Xiaodong Zeng, Jinjie Gu

Many applications of reinforcement learning (RL) optimize a long-term reward subject to risk, safety, budget, diversity, or other constraints. Although the constrained RL problem has been studied with various types of constraints, existing methods require randomization among infinitely many policies to approach a feasible solution, which is impractical. In this paper, we present a reduction approach that finds sparse policies, i.e., policies that randomize among only a constant number of individual policies, for the constrained RL problem. The key idea is to reduce the constrained RL problem to a distance minimization problem, for which we propose a novel variant of the Frank-Wolfe algorithm. Throughout the learning process, our method maintains a sparse combination of individual policies, and we show that the number of policies stored is worst-case optimal for working with any RL algorithm. Using any off-the-shelf RL algorithm as an oracle, our method strictly reduces the approximation error between consecutive oracle calls, and we establish an improved convergence rate over previous results. Experiments on a grid-world navigation task demonstrate that our method stores fewer policies and outperforms previous methods.
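To make the reduction idea concrete, below is a minimal sketch (not the paper's exact algorithm) of a Frank-Wolfe style loop that maintains a sparse mixture of policies. It assumes each candidate policy is summarized by a value vector of expected reward and cost returns, that the feasible target set is a simple box, and that the RL oracle is approximated by an argmax over a small hypothetical set of candidate policies; all names and numbers here are illustrative assumptions.

```python
import numpy as np

# Hypothetical candidate policies, each summarized by (expected reward, expected cost).
policy_values = np.array([
    [1.0, 0.9],   # high reward, high cost
    [0.6, 0.3],   # medium reward, low cost
    [0.2, 0.1],   # low reward, very low cost
])

def project_to_target(v, reward_goal=0.8, cost_limit=0.5):
    """Euclidean projection onto the (assumed) feasible target set
    C = {(r, c) : r >= reward_goal, c <= cost_limit}, a simple box."""
    return np.array([max(v[0], reward_goal), min(v[1], cost_limit)])

def rl_oracle(direction):
    """Stand-in for an off-the-shelf RL algorithm: returns the index of the
    single policy maximizing the scalarized return  direction @ value."""
    return int(np.argmax(policy_values @ direction))

def frank_wolfe_mixture(num_iters=50):
    """Minimize 0.5 * dist(v, C)^2 over convex combinations of policy values,
    keeping the mixture weights sparse."""
    weights = {0: 1.0}                      # sparse mixture: policy index -> weight
    for t in range(num_iters):
        v = sum(w * policy_values[i] for i, w in weights.items())
        grad = v - project_to_target(v)     # gradient of the squared distance
        k = rl_oracle(-grad)                # linear minimization via the RL oracle
        gamma = 2.0 / (t + 2)               # standard Frank-Wolfe step size
        weights = {i: (1 - gamma) * w for i, w in weights.items()}
        weights[k] = weights.get(k, 0.0) + gamma
        weights = {i: w for i, w in weights.items() if w > 1e-8}  # keep it sparse
    return weights

if __name__ == "__main__":
    print("mixture weights over policies:", frank_wolfe_mixture())
```

In this sketch the mixture's value vector is pulled toward the feasible set while each iteration adds at most one new policy, which is the mechanism behind keeping only a constant number of stored policies; the paper's variant further controls this number and the convergence rate.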
