2 code implementations • 24 Nov 2020 • Soham Gadgil, Yunfeng Xin, Chengzhe Xu
With our best models, we achieve average rewards of 170+ with the Sarsa agent and 200+ with the Deep Q-Learning agent on the original problem.