15 Sep 2018 • Anton Orell Wiehe, Nil Stolt Ansó, Madalina M. Drugan, Marco A. Wiering
In this paper, a new offline actor-critic learning algorithm is introduced: Sampled Policy Gradient (SPG).