Towards Unknown-aware Deep Q-Learning

29 Sep 2021 · Ying Fan, Sharon Li

Deep reinforcement learning (RL) has achieved remarkable success in the known environments where agents are trained, yet agents do not necessarily know what they don't know. In particular, RL agents deployed in the open world are naturally subject to environmental shifts and encounter unknown out-of-distribution (OOD) states---i.e., states from outside the training environment. To date, the handling of OOD states in RL remains underexplored. This paper bridges this critical gap by proposing an unknown-aware RL framework that improves the safety and reliability of deep Q-learning. Our key idea is to regularize the training of Q-learning so that OOD states receive higher OOD uncertainty while in-distribution states receive lower OOD uncertainty, thereby making the two distinguishable. This contrasts with vanilla Q-learning, which does not account for unknowns during training. Furthermore, we provide theoretical guarantees that our method improves OOD uncertainty estimation while preserving convergence performance in the in-distribution environment. Empirically, we demonstrate state-of-the-art performance on six diverse environments, achieving near-optimal OOD detection performance.
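The key idea above can be sketched as a regularized Q-learning objective. The following is a minimal illustration, not the paper's implementation: it assumes a linear Q-function as a stand-in for a deep network, uses the negative free energy `-logsumexp(Q(s))` as a hypothetical OOD uncertainty score, and adds a margin regularizer (with assumed names `margin`, `lam`, `s_ood`) that pushes the uncertainty of auxiliary outlier states above that of in-distribution states.

```python
import numpy as np

def q_values(W, s):
    # Linear Q-function: one row of W per action (stand-in for a deep Q-network).
    return W @ s

def ood_score(W, s):
    # Negative free energy as an OOD uncertainty score: higher => more OOD-like.
    # (Hypothetical choice of score; the paper's exact score may differ.)
    q = q_values(W, s)
    return -np.log(np.sum(np.exp(q)))

def unknown_aware_loss(W, s, a, r, s_next, s_ood, gamma=0.99, margin=1.0, lam=0.1):
    # Standard TD loss on an in-distribution transition (s, a, r, s_next).
    td_target = r + gamma * np.max(q_values(W, s_next))
    td_loss = (q_values(W, s)[a] - td_target) ** 2
    # Margin regularizer: encourage the OOD state's uncertainty to exceed the
    # in-distribution state's uncertainty by at least `margin`.
    reg = max(0.0, margin + ood_score(W, s) - ood_score(W, s_ood))
    return td_loss + lam * reg
```

Once the two scores are separated by more than the margin, the regularizer vanishes and training reduces to ordinary TD learning on in-distribution data, which is consistent with the convergence guarantee sketched in the abstract.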
