1 code implementation • 11 Oct 2018 • Navneet Madhu Kumar
However, many of the state of the art deep reinforcement learning algorithms, that rely on epsilon-greedy, fail on these environments.
Montezuma's Revenge Mutual Information Estimation +2