no code implementations • 5 Nov 2023 • Siqiao Mu, Diego Klabjan
Since the objective functions of reinforcement learning problems are typically highly nonconvex, it is desirable that policy gradient, the most popular algorithm, escapes saddle points and arrives at second-order stationary points.