no code implementations • 10 May 2017 • Yan Li, Zhaohan Sun
In the most common settings of a Markov Decision Process (MDP), an agent evaluates a policy based on the expectation of the (discounted) sum of rewards.
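As a rough illustration of that evaluation criterion, the sketch below computes the discounted sum of rewards for a single trajectory and averages it over sampled trajectories as a Monte Carlo estimate of the expectation. The function names, the discount factor `gamma`, and the reward lists are hypothetical and chosen only for this example; they are not taken from the paper.

```python
from typing import Sequence


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Return sum_t gamma**t * rewards[t] for one trajectory."""
    total = 0.0
    # Accumulate from the last reward backwards: G_t = r_t + gamma * G_{t+1}.
    for r in reversed(rewards):
        total = r + gamma * total
    return total


def estimate_policy_value(
    trajectories: Sequence[Sequence[float]], gamma: float
) -> float:
    """Monte Carlo estimate of the expected discounted return.

    Assumes every trajectory was sampled by following the same policy
    from the same start-state distribution (an assumption of this sketch).
    """
    returns = [discounted_return(rewards, gamma) for rewards in trajectories]
    return sum(returns) / len(returns)


if __name__ == "__main__":
    # Two made-up reward sequences, for illustration only.
    sampled = [[1.0, 1.0, 1.0], [0.0, 0.0, 4.0]]
    print(estimate_policy_value(sampled, gamma=0.5))
```

Averaging sampled returns keeps only the mean of the return distribution, which is exactly what the expectation-based criterion measures.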
Autonomous Driving