no code implementations • 10 Jun 2023 • Kexuan Wang, An Liu, Baishuo Liu
Despite the biased policy gradient estimation incurred by the single-loop design and observation reuse, we prove that SLDAC, started from a feasible initial point, converges almost surely to a Karush-Kuhn-Tucker (KKT) point of the original problem.