18 May 2023 • Duksang Lee, William Overman, Dabeen Lee
For the observe-then-decide regime, we prove that the expected regret against the dynamic clairvoyant optimal policy is bounded by $\tilde O(\rho^{-1}{H^{3/2}}S\sqrt{AT})$ where $\rho\in(0, 1)$ is the budget parameter, $H$ is the length of the horizon, $S$ and $A$ are the numbers of states and actions, and $T$ is the number of episodes.
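To make the scaling of this bound concrete, the sketch below evaluates only its leading-order term $\rho^{-1}H^{3/2}S\sqrt{AT}$. The constants and the logarithmic factors hidden by $\tilde O$ are dropped, so the numbers are a scaling guide rather than actual regret values. The helper names `regret_bound_leading_term` and `average_regret_term` are hypothetical and are not taken from the paper.

```python
import math


def regret_bound_leading_term(rho: float, H: int, S: int, A: int, T: int) -> float:
    """Leading-order term rho^{-1} * H^{3/2} * S * sqrt(A * T) of the stated bound.

    Hypothetical helper: constants and the log factors hidden by tilde-O are
    omitted, so the result only indicates how the bound scales.
    """
    if not 0.0 < rho < 1.0:
        raise ValueError("budget parameter rho must lie in (0, 1)")
    return (1.0 / rho) * H ** 1.5 * S * math.sqrt(A * T)


def average_regret_term(rho: float, H: int, S: int, A: int, T: int) -> float:
    # Dividing by the number of episodes T shows the per-episode term decays like 1/sqrt(T).
    return regret_bound_leading_term(rho, H, S, A, T) / T
```

With the other parameters fixed, quadrupling the number of episodes $T$ doubles the leading term and halves the per-episode average. This is the usual sublinear-regret behaviour of an $\tilde O(\sqrt{T})$ bound.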
2 May 2023 • Duksang Lee, Nam Ho-Nguyen, Dabeen Lee
This paper develops projection-free algorithms for online convex optimization with stochastic constraints.
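As background for the term "projection-free", the sketch below shows a generic online conditional-gradient (Frank-Wolfe) loop. It replaces the projection step with a linear minimization oracle, which is cheap over the probability simplex used here. This is not the algorithm from the paper. It ignores the stochastic constraints entirely and uses only the current round's gradient, whereas published online Frank-Wolfe variants work with aggregated surrogate objectives. The function names, the simplex feasible set, and the $1/\sqrt{t+1}$ step size are all hypothetical choices made for illustration.

```python
import math
from typing import List

Vector = List[float]


def simplex_lmo(g: Vector) -> Vector:
    """Linear minimization oracle over the probability simplex (hypothetical helper).

    argmin_{v in simplex} <g, v> is attained at the vertex e_i with i = argmin_i g_i.
    """
    i = min(range(len(g)), key=lambda k: g[k])
    return [1.0 if k == i else 0.0 for k in range(len(g))]


def online_frank_wolfe(grads: List[Vector], x0: Vector) -> List[Vector]:
    """Simplified online conditional-gradient loop (illustrative, not the paper's method).

    grads[t] is the gradient revealed in round t. Each update is a convex
    combination of the current point and an LMO vertex, so iterates stay in
    the simplex without any projection.
    """
    x = list(x0)
    iterates = [list(x)]
    for t, g in enumerate(grads, start=1):
        v = simplex_lmo(g)
        step = 1.0 / math.sqrt(t + 1)  # hypothetical step-size schedule
        x = [(1.0 - step) * xi + step * vi for xi, vi in zip(x, v)]
        iterates.append(list(x))
    return iterates
```

Feasibility follows from convexity alone: if $x_t$ lies in the simplex and $v_t$ is a vertex, then $(1-\sigma)x_t + \sigma v_t$ does too. The only per-round cost is one linear minimization, which is the practical appeal of projection-free methods over sets where Euclidean projection is expensive.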