no code implementations • 7 Feb 2024 • Kihyuk Hong, Ambuj Tewari
Our algorithm is the first computationally efficient algorithm in this setting that achieves a sample complexity of $O(\epsilon^{-2})$ under a partial data coverage assumption.
no code implementations • 13 Jun 2023 • Kihyuk Hong, Yuhang Li, Ambuj Tewari
Offline constrained reinforcement learning (RL) aims to learn a policy that maximizes the expected cumulative reward subject to constraints on expected cumulative cost using an existing dataset.
no code implementations • 29 May 2022 • Kihyuk Hong, Yuhang Li, Ambuj Tewari
Moreover, when applied to the non-stationary linear bandit setting by using a linear kernel, our algorithm is nearly minimax optimal, solving an open problem in the non-stationary linear bandit literature.