Resolving Causal Confusion in Reinforcement Learning via Robust Exploration

A reinforcement learning agent must distinguish between spurious correlations and causal relationships in its environment in order to robustly achieve its goals. Contrary to popular belief, such causal confusion can indeed occur in online reinforcement learning (RL) settings. We demonstrate this, and show how causal confusion can lead to catastrophic failure under even mild forms of distribution shift. We formalize the problem of identifying causal structure in a Markov Decision Process, and highlight the central role played by the data collection policy in identifying and avoiding spurious correlations. We find that many RL algorithms, including those with PAC-MDP guarantees, fall prey to causal confusion under insufficiently exploratory data-collection policies. To address this, we present a robust exploration strategy that enables causal hypothesis testing through interaction with the environment. Our method outperforms existing state-of-the-art approaches at avoiding causal confusion, improving robustness and generalization across a range of tasks.
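To make the phenomenon concrete, below is a minimal, hypothetical sketch (not the paper's algorithm) of the failure mode the abstract describes: a one-step MDP in which a nuisance feature tracks the true cause only under a narrow data-collection policy, so a naive learner latches onto the nuisance and collapses under distribution shift, while randomized (interventional) exploration decorrelates the features and recovers the causal cue. The environment, feature names, and toy learner are all illustrative assumptions, not the authors' method.

```python
# Hypothetical toy example of causal confusion under insufficient exploration.
# Everything here (environment, features, learner) is an illustrative assumption.
import numpy as np

rng = np.random.default_rng(0)

def generate_episodes(n, correlated):
    """One-step MDP. A hidden `cause` determines which of two actions is rewarded.
    The agent observes a noisy copy of the cause plus a nuisance feature. Under a
    narrow data-collection policy (correlated=True) the nuisance tracks the cause
    exactly; under exploratory/interventional collection it is randomized."""
    cause = rng.integers(0, 2, size=n)
    obs_cause = np.where(rng.random(n) < 0.9, cause, 1 - cause)   # 90%-accurate observation
    nuisance = cause.copy() if correlated else rng.integers(0, 2, size=n)
    action = rng.integers(0, 2, size=n)                           # uniform actions in the log
    reward = (action == cause).astype(float)                      # reward depends only on the cause
    return np.stack([obs_cause, nuisance], axis=1), action, reward, cause

def fit_single_feature_policy(X, action, reward):
    """Toy learner: pick the single observed feature whose value best predicts reward
    when imitated as the action. Under confounded data the nuisance looks *more*
    predictive than the noisy causal observation, so the learner latches onto it."""
    scores = [reward[X[:, j] == action].mean() for j in range(X.shape[1])]
    return int(np.argmax(scores))

def evaluate(feature_idx, n=100_000):
    """Expected reward under distribution shift: nuisance decorrelated from the cause."""
    X, _, _, cause = generate_episodes(n, correlated=False)
    return float((X[:, feature_idx] == cause).mean())

for label, correlated in [("narrow data-collection policy", True),
                          ("randomized exploratory policy", False)]:
    X, a, r, _ = generate_episodes(20_000, correlated)
    chosen = fit_single_feature_policy(X, a, r)
    print(f"{label}: acts on feature {chosen} "
          f"({'nuisance' if chosen == 1 else 'observed cause'}), "
          f"reward after shift ~ {evaluate(chosen):.2f}")
```

Under the confounded log the learner keys on the nuisance feature and its post-shift reward drops to roughly chance (~0.5); with exploratory data collection it keys on the noisy causal observation and retains most of its performance (~0.9), mirroring the role the abstract assigns to the data collection policy.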
