14 Apr 2024 • Simon Eisenmann, Daniel Hein, Steffen Udluft, Thomas A. Runkler
The policy is optimized with a gradient-free optimization scheme using the return estimate given by the model as the fitness function.
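The excerpt does not name the model class, the policy class, or the specific optimizer. The sketch below is a rough illustration only, built on assumptions that are not taken from the paper. It uses a hypothetical hand-written 1-D dynamics function `toy_model` as a stand-in for a learned model, and a hypothetical two-parameter linear policy. For the gradient-free scheme it uses the cross-entropy method (CEM), chosen here purely for illustration. What it shows is the loop the sentence describes: each candidate policy's fitness is its discounted return estimated from model rollouts, and the optimizer uses only those fitness values, never gradients.

```python
import random

# Hypothetical stand-in for a learned dynamics model (NOT from the paper):
# a 1-D linear system with a quadratic cost, returning (next_state, reward).
def toy_model(state, action):
    next_state = 0.9 * state + 0.5 * action
    reward = -(state ** 2) - 0.1 * action ** 2
    return next_state, reward

# Hypothetical linear policy: a = clip(theta[0] * s + theta[1], -1, 1).
def linear_policy(theta, state):
    return max(-1.0, min(1.0, theta[0] * state + theta[1]))

START_STATES = [-2.0, -1.0, 1.0, 2.0]  # assumed fixed start states

def estimate_return(theta, model, start_states, horizon=20, gamma=0.95):
    """Fitness: mean discounted return of the policy, rolled out in the model only."""
    total = 0.0
    for state in start_states:
        ret, discount = 0.0, 1.0
        for _ in range(horizon):
            action = linear_policy(theta, state)
            state, reward = model(state, action)
            ret += discount * reward
            discount *= gamma
        total += ret
    return total / len(start_states)

def cross_entropy_method(fitness, dim, iters=30, pop=50, elite_frac=0.2, seed=0):
    """Gradient-free search (CEM): only fitness values are used, never gradients."""
    rng = random.Random(seed)
    mean, std = [0.0] * dim, [1.0] * dim
    n_elite = max(1, int(pop * elite_frac))
    best_theta, best_fit = list(mean), fitness(mean)
    for _ in range(iters):
        # Sample candidate parameter vectors from a diagonal Gaussian.
        samples = [[rng.gauss(m, s) for m, s in zip(mean, std)] for _ in range(pop)]
        # Score every candidate and sort by fitness, best first.
        scored = sorted(((fitness(x), x) for x in samples),
                        key=lambda p: p[0], reverse=True)
        elite = [x for _, x in scored[:n_elite]]
        # Refit the sampling distribution to the elite candidates.
        mean = [sum(x[i] for x in elite) / n_elite for i in range(dim)]
        std = [max(1e-3, (sum((x[i] - mean[i]) ** 2 for x in elite) / n_elite) ** 0.5)
               for i in range(dim)]
        # Keep the best candidate seen so far.
        if scored[0][0] > best_fit:
            best_fit, best_theta = scored[0]
    return best_theta, best_fit

best_theta, best_fit = cross_entropy_method(
    lambda th: estimate_return(th, toy_model, START_STATES), dim=2)
print("theta:", best_theta, "model-estimated return:", best_fit)
```

In this sketch the fitness is computed entirely from model rollouts, so the search itself never queries a real system. Replacing CEM with another gradient-free method would only change `cross_entropy_method`. Examples include a genetic algorithm or particle swarm optimization.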
reinforcement-learning