Solipsistic Reinforcement Learning
We introduce a new model-based reinforcement learning framework that aims to tackle environments with high-dimensional state spaces. In contrast to existing approaches, agents under our framework learn a low-dimensional internal representation of the environment while avoiding the need to learn a generative model of the environment itself. This solipsistic representation is trained to encode a belief that is consistent with the dynamics of the environment and is then exploited for effective planning. We present specific instantiations of our framework, with corresponding model choices and planning algorithms, that can handle both discrete and continuous state environments. We empirically demonstrate gains in efficiency over existing model-free methods when learning directly from pixels, and we analyze the properties of the learned representations.
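The core idea of training a latent representation for dynamical consistency, without a generative (reconstruction) model, can be illustrated with a minimal sketch. Everything below is an assumption for illustration only and not the authors' actual method: the linear encoder `encode`, the latent dynamics model `predict_next`, the weight matrices, and all dimensions are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions, chosen only for illustration.
obs_dim, latent_dim, act_dim = 16, 4, 2

# A linear encoder f(o) -> z and a linear latent dynamics model
# g(z, a) -> z'. In practice these would be neural networks.
W_enc = rng.normal(scale=0.1, size=(latent_dim, obs_dim))
W_dyn = rng.normal(scale=0.1, size=(latent_dim, latent_dim + act_dim))

def encode(obs):
    """Map a raw observation to the low-dimensional internal state."""
    return W_enc @ obs

def predict_next(z, act):
    """Predict the next internal state from the current one and an action."""
    return W_dyn @ np.concatenate([z, act])

def consistency_loss(obs, act, next_obs):
    """Squared error between the predicted next latent and the encoding
    of the actually observed next state. Note that no decoder back to
    observation space is ever learned: the representation is trained
    purely for internal consistency with the environment's dynamics."""
    z_pred = predict_next(encode(obs), act)
    z_next = encode(next_obs)
    return float(np.mean((z_pred - z_next) ** 2))
```

A planner would then roll `predict_next` forward in the latent space to evaluate candidate action sequences, never needing to reconstruct high-dimensional observations such as pixels.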