no code implementations • 7 Mar 2024 • Long-Fei Li, Peng Zhao, Zhi-Hua Zhou
We study reinforcement learning with linear function approximation, unknown transitions, and adversarial losses in the bandit feedback setting.
no code implementations • 26 Aug 2022 • Peng Zhao, Long-Fei Li, Zhi-Hua Zhou
For these three models, we propose novel online ensemble algorithms and establish dynamic regret guarantees for each; the results for episodic (loop-free) SSP are provably minimax optimal in terms of the time horizon and a certain non-stationarity measure.