Sample-Optimal Zero-Violation Safety For Continuous Control
In this paper, we study the problem of ensuring safety with a few shots of samples for partially unknown systems. We first characterize a fundamental limit when producing safe actions is not possible due to insufficient information or samples. Then, we develop a technique that can generate provably safe actions and recovery behaviors using a minimum number of samples. In the performance analysis, we also establish Nagumos theorem - like results with relaxed assumptions, which is potentially useful in other contexts. Finally, we discuss how the proposed method can be integrated into a policy gradient algorithm to assure safety and stability with a handful of samples without stabilizing initial policies or generative models to probe safe actions.
PDF Abstract