Guided Exploration in Deep Reinforcement Learning

27 Sep 2018 · Sahisnu Mazumder, Bing Liu, Shuai Wang, Yingxuan Zhu, Xiaotian Yin, Lifeng Liu, Jian Li, Yongbing Huang ·

This paper proposes a new method to drastically speed up deep reinforcement learning (deep RL) training for problems that have the property of \textit{state-action permissibility} (SAP). Two types of permissibility are defined under SAP. The first type says that after an action $a_t$ is performed in a state $s_t$ and the agent reaches the new state $s_{t+1}$, the agent can decide whether the action $a_t$ is \textit{permissible} or \textit{not permissible} in state $s_t$. The second type says that even without performing the action $a_t$ in state $s_t$, the agent can already decide whether $a_t$ is permissible or not in $s_t$. An action is not permissible in a state if the action can never lead to an optimal solution and thus should not be tried. We incorporate the proposed SAP property into two state-of-the-art deep RL algorithms to guide their state-action exploration. Results show that the SAP guidance can markedly speed up training.

PDF Abstract