no code implementations • 16 Jul 2021 • J. G. Dai, Mark Gluzman
The existing bound becomes degenerate as the discount factor approaches one, which calls into question the applicability of TRPO and related algorithms in that regime.
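To make the degeneracy concrete, the sketch below assumes that the bound in question has the form stated in Theorem 1 of the original TRPO paper (Schulman et al., 2015): η(π_new) ≥ L_old(π_new) − C·α², with C = 4εγ/(1−γ)², where ε = max over (s, a) of |A(s, a)| and α is the maximum total-variation distance between the old and new policies. This is an illustration of that one published form, not the refined bound these authors propose. The helper names `trpo_penalty_coefficient` and `horizon` are hypothetical.

```python
def trpo_penalty_coefficient(gamma: float, eps: float = 1.0) -> float:
    """Penalty coefficient C = 4 * eps * gamma / (1 - gamma)**2.

    Assumes the TRPO-style bound eta(new) >= L_old(new) - C * alpha**2,
    where eps = max |A(s, a)| and alpha is the max total-variation distance.
    """
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must lie in [0, 1)")
    return 4.0 * eps * gamma / (1.0 - gamma) ** 2


def horizon(gamma: float) -> float:
    """Effective horizon 1 / (1 - gamma).

    The unnormalized discounted state-visitation frequencies sum to this
    value, so the first-order surrogate term scales at most like it.
    """
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must lie in [0, 1)")
    return 1.0 / (1.0 - gamma)


if __name__ == "__main__":
    # Compare the penalty coefficient with the scale of the surrogate term.
    for gamma in (0.9, 0.99, 0.999):
        c = trpo_penalty_coefficient(gamma)
        ratio = c / horizon(gamma)  # equals 4 * eps * gamma / (1 - gamma)
        print(f"gamma={gamma}: C={c:.4g}, C/horizon={ratio:.4g}")
```

Because the penalty grows like 1/(1−γ)² while the surrogate improvement term grows only like 1/(1−γ), their ratio 4εγ/(1−γ) diverges as γ → 1, so for any fixed policy change the lower bound eventually becomes vacuous.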
no code implementations • 12 Apr 2021 • Yihan Pan, Zhenghang Xu, Jin Guang, Jingjing Sun, Chengwenjian Wang, Xuanming Zhang, Xinyun Chen, J. G. Dai, Yichuan Ding, Pengyi Shi, Hongxin Pan, Kai Yang, Song Wu
To address this issue, we propose adding a novel two-level routing component to the queueing network model.
no code implementations • 31 Jul 2020 • J. G. Dai, Mark Gluzman
A key to the success of our PPO algorithm is its use of three variance reduction techniques when estimating the relative value function via sampling.