no code implementations • 20 Apr 2013 • Boris Lesner, Bruno Scherrer
For this algorithm we provide an error propagation analysis in the form of a performance bound on the resulting policies; this bound can improve the usual performance bound by a factor $O(1-\gamma)$, which is significant when the discount factor $\gamma$ is close to 1.
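To give a sense of scale for the $O(1-\gamma)$ factor, here is a rough worked example. It assumes the "usual" bound has the classical form $\frac{2\gamma}{(1-\gamma)^2}\epsilon$ known for approximate value and policy iteration, where $\epsilon$ is a per-iteration error level. Both the form of the bound and the symbol $\epsilon$ are assumptions made for illustration; the excerpt does not state them.

$$
\frac{2\gamma}{(1-\gamma)^2}\,\epsilon \;\times\; (1-\gamma) \;=\; \frac{2\gamma}{1-\gamma}\,\epsilon,
\qquad
\gamma = 0.99:\quad \frac{2 \cdot 0.99}{(0.01)^2}\,\epsilon = 19800\,\epsilon
\;\longrightarrow\;
\frac{2 \cdot 0.99}{0.01}\,\epsilon = 198\,\epsilon .
$$

Under these assumptions the guarantee tightens by a factor of 100 at $\gamma = 0.99$, and the gain grows as $\gamma$ approaches 1.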
no code implementations • NeurIPS 2012 • Bruno Scherrer, Boris Lesner
We consider infinite-horizon stationary $\gamma$-discounted Markov Decision Processes, for which it is known that there exists a stationary optimal policy.
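As a small illustration of the setting, the sketch below runs plain value iteration on a hypothetical two-state, two-action MDP and reads off a stationary greedy policy. The toy transition and reward arrays, the function name `value_iteration`, and the tolerance are all assumptions made for this example; it is not the algorithm studied in the paper.

```python
import numpy as np


def value_iteration(P, R, gamma, tol=1e-10, max_iter=100_000):
    """Value iteration for a finite gamma-discounted MDP.

    P[a, s, t] is the probability of moving from state s to state t
    under action a; R[s, a] is the immediate reward.
    Returns the (approximately) optimal value function and a greedy
    stationary policy (one action index per state).
    """
    n_states = P.shape[1]
    V = np.zeros(n_states)
    for _ in range(max_iter):
        # Q[s, a] = R[s, a] + gamma * sum_t P[a, s, t] * V[t]
        Q = R + gamma * np.einsum("ast,t->sa", P, V)
        V_new = Q.max(axis=1)
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    # Greedy policy with respect to the final value estimate.
    Q = R + gamma * np.einsum("ast,t->sa", P, V)
    return V, Q.argmax(axis=1)


# Hypothetical toy MDP: action 0 stays in place, action 1 switches state.
P = np.array([
    [[1.0, 0.0], [0.0, 1.0]],  # action 0: stay
    [[0.0, 1.0], [1.0, 0.0]],  # action 1: switch
])
# Reward 1 only for staying in state 1.
R = np.array([
    [0.0, 0.0],
    [1.0, 0.0],
])
gamma = 0.9

V, policy = value_iteration(P, R, gamma)
print("V* ~", V)
print("stationary greedy policy:", policy)
```

In this toy problem the greedy policy switches out of state 0 and stays in state 1, and the same state-to-action map is applied at every time step, which is what "stationary" means here.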