Regret Guarantees for Online Receding Horizon Learning Control
In this paper we provide provable regret guarantees for an online learning, receding-horizon control policy in a setting where the system to be controlled is an unknown linear dynamical system, the controller's cost is a general additive function over a finite period $T$, and control input constraints incur an additional cost when violated. We show that the learning-based receding horizon control policy achieves a regret of $\tilde{O}(T^{3/4})$ for both the controller's cost and the cumulative constraint violation, measured with respect to a baseline receding horizon control policy that has full knowledge of the system.
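To make the two quantities in the guarantee concrete, one common way to write them is sketched below. The notation is an assumption introduced here for illustration and is not taken from the paper: $c_t$ denotes the stage cost, $(x_t, u_t)$ the state and input under the learning policy, $(x_t^{b}, u_t^{b})$ those under the full-knowledge baseline, and $v(\cdot) \ge 0$ a hypothetical measure of input constraint violation that is zero whenever the constraints hold. The paper's exact definitions may differ, for instance in whether the violation is measured relative to the baseline.

$$
R_T = \sum_{t=1}^{T} c_t(x_t, u_t) - \sum_{t=1}^{T} c_t(x_t^{b}, u_t^{b}),
\qquad
V_T = \sum_{t=1}^{T} v(u_t),
\qquad
R_T = \tilde{O}(T^{3/4}), \quad V_T = \tilde{O}(T^{3/4}).
$$

Here $\tilde{O}(\cdot)$ is used in its conventional sense of hiding factors polylogarithmic in $T$. Under this reading both quantities grow sublinearly, so the average per-step regret $R_T / T$ and the average per-step violation $V_T / T$ shrink on the order of $T^{-1/4}$, up to logarithmic factors, as $T$ grows.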