1 code implementation • 8 Jun 2016 • Tejas D. Kulkarni, Ardavan Saeedi, Simanta Gautam, Samuel J. Gershman
The successor map represents the expected future state occupancy from any given state, and the reward predictor maps states to scalar rewards.
Game of Doom • reinforcement-learning
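To make the two components in the summary concrete, here is a minimal tabular sketch. It is not the authors' deep-network implementation. It assumes a small, known transition matrix `P`, a discount factor `gamma` (the usual convention for successor representations, not stated in the summary above), and a hand-written reward vector `r` standing in for the reward predictor. The names `successor_map` and `state_values` are hypothetical and chosen for illustration only.

```python
def successor_map(P, gamma, iters=200):
    """Approximate M = sum_t gamma^t P^t by fixed-point iteration.

    M[s][s2] is the expected discounted number of visits to s2
    when starting from s. P and gamma are illustrative assumptions.
    """
    n = len(P)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    M = [row[:] for row in identity]
    for _ in range(iters):
        # One step of M <- I + gamma * P @ M
        M = [
            [
                identity[i][j] + gamma * sum(P[i][k] * M[k][j] for k in range(n))
                for j in range(n)
            ]
            for i in range(n)
        ]
    return M


def state_values(M, r):
    """Combine the successor map with per-state rewards: V = M @ r."""
    return [sum(m * reward for m, reward in zip(row, r)) for row in M]


# Toy 3-state chain: 0 -> 1 -> 2, and state 2 loops to itself.
P = [
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.0, 0.0, 1.0],
]
r = [0.0, 0.0, 1.0]  # stand-in for a learned reward predictor

M = successor_map(P, gamma=0.5)
V = state_values(M, r)
```

In this sketch the value of each state is the inner product of its row of the successor map with the reward vector, so changing `r` alone updates the values without recomputing `M`.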