no code implementations • 4 Apr 2024 • Jiacai Liu, Wenye Li, Ke Wei
Projected policy gradient under the simplex parameterization, and both policy gradient and natural policy gradient under the softmax parameterization, are fundamental algorithms in reinforcement learning.
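The listing notes that no code implementations exist, so the block below is not the authors' code. It is a minimal sketch of one of the algorithms named above, natural policy gradient (NPG) under the softmax parameterization, in a small tabular MDP with exact policy evaluation. Several choices are assumptions made for illustration: the step size `eta` absorbs the usual $1/(1-\gamma)$ factor, the initial policy is uniform, and the helper names (`evaluate`, `softmax`, `npg_softmax`, `value_iteration`) are hypothetical. Under these assumptions the update is $\theta_{k+1} = \theta_k + \eta A^{\pi_k}$, which is equivalent to $\pi_{k+1}(a|s) \propto \pi_k(a|s)\exp(\eta A^{\pi_k}(s,a))$.

```python
import numpy as np


def evaluate(P, r, pi, gamma):
    """Exact evaluation of policy pi in a tabular MDP.

    P: (S, A, S) transition probabilities, r: (S, A) rewards, pi: (S, A) policy.
    Returns V of shape (S,) and Q of shape (S, A).
    """
    S = r.shape[0]
    P_pi = np.einsum("sa,sat->st", pi, P)  # state-to-state transitions under pi
    r_pi = (pi * r).sum(axis=1)            # expected one-step reward under pi
    V = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)
    Q = r + gamma * P @ V                  # Q(s,a) = r(s,a) + gamma * E[V(s')]
    return V, Q


def softmax(theta):
    z = theta - theta.max(axis=1, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def npg_softmax(P, r, gamma=0.9, eta=10.0, iters=500):
    """Tabular NPG under the softmax parameterization with exact advantages.

    Assumed update: theta <- theta + eta * A^{pi_theta}, with the 1/(1-gamma)
    factor absorbed into eta.
    """
    S, A = r.shape
    theta = np.zeros((S, A))  # theta = 0 gives the uniform initial policy
    for _ in range(iters):
        pi = softmax(theta)
        V, Q = evaluate(P, r, pi, gamma)
        theta = theta + eta * (Q - V[:, None])  # advantage A = Q - V
    pi = softmax(theta)
    return pi, evaluate(P, r, pi, gamma)[0]


def value_iteration(P, r, gamma, iters=2000):
    """Reference optimal values V* for checking the NPG iterate."""
    V = np.zeros(r.shape[0])
    for _ in range(iters):
        V = (r + gamma * P @ V).max(axis=1)
    return V


# Usage on a random MDP (synthetic data, not from the paper).
rng = np.random.default_rng(0)
S, A = 5, 3
P = rng.random((S, A, S))
P /= P.sum(axis=2, keepdims=True)  # normalize into transition distributions
r = rng.random((S, A))
pi, V = npg_softmax(P, r, gamma=0.9)
V_star = value_iteration(P, r, gamma=0.9)
print(np.max(V_star - V))  # worst-case gap to optimality over states
```

Vanilla softmax policy gradient would instead step along $\partial V(\mu)/\partial\theta(s,a) = \frac{1}{1-\gamma} d_\mu^{\pi}(s)\,\pi(a|s)\,A^{\pi}(s,a)$, and projected policy gradient would project a gradient step on $\pi(\cdot|s)$ back onto the probability simplex; those variants are not implemented here.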
no code implementations • 31 May 2023 • Jiacai Liu, Jinchi Chen, Ke Wei
To show the local linear convergence of the algorithm, we establish the contraction of the sub-optimal probability $b_s^k$ (i.e., the probability that the output policy $\pi^k$ assigns to non-optimal actions at state $s$) for all $k \ge k_0$.
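As a rough illustration of the quantity in this sentence, the sketch below computes $b_s^k = \sum_{a \notin \mathcal{A}_s^*} \pi^k(a|s)$, where $\mathcal{A}_s^*$ denotes the optimal actions at state $s$, and the per-state ratios $b_s^{k+1}/b_s^k$ whose staying below some $\rho < 1$ from $k_0$ on is what contraction means. It is algorithm-agnostic and not the authors' code. The function names, the tolerance-based rule for deciding which actions count as optimal, and the convention of returning ratio 0 when $b_s^k = 0$ are all assumptions.

```python
import numpy as np


def suboptimal_probability(pi, Q_star, tol=1e-9):
    """Return b_s = sum of pi(a|s) over non-optimal actions a, for every state s.

    pi, Q_star: arrays of shape (S, A). An action counts as optimal at s if its
    Q*-value is within `tol` of max_a Q*(s, a) (assumed tie-handling rule).
    """
    optimal = Q_star >= Q_star.max(axis=1, keepdims=True) - tol
    return np.where(optimal, 0.0, pi).sum(axis=1)


def contraction_ratios(b_history):
    """Per-state ratios b_s^{k+1} / b_s^k from an array of shape (K, S).

    Contraction from k_0 on means these ratios stay below some rho < 1 from
    row k_0 onward. States with b_s^k == 0 get ratio 0 to avoid dividing by zero.
    """
    b = np.asarray(b_history, dtype=float)
    ratios = np.zeros_like(b[1:])
    np.divide(b[1:], b[:-1], out=ratios, where=b[:-1] > 0)
    return ratios


# Usage on hand-made numbers (not results from the paper).
Q_star = np.array([[1.0, 0.5, 0.2],
                   [0.3, 0.9, 0.9]])  # state 1 has two optimal actions
pi = np.array([[0.7, 0.2, 0.1],
               [0.2, 0.4, 0.4]])
print(suboptimal_probability(pi, Q_star))  # b_0 = 0.2 + 0.1, b_1 = 0.2
print(contraction_ratios([[0.4, 0.2], [0.2, 0.1], [0.1, 0.05]]))  # every ratio is 0.5
```

Treating near-ties as optimal matters because, when a state has several optimal actions, probability mass may legitimately spread across them without counting toward $b_s^k$.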