no code implementations • 5 May 2024 • Siow Meng Low, Akshat Kumar
This safety model is trained using a labeled safety dataset.
1 code implementation • 6 Apr 2023 • Siow Meng Low, Akshat Kumar, Scott Sanner
In safe MDP planning, a cost function based on the current state and action is often used to specify safety aspects.
no code implementations • 23 Mar 2022 • Siow Meng Low, Akshat Kumar, Scott Sanner
This novel formulation of deep reactive policy (DRP) learning as iterative lower bound optimization (ILBO) is particularly appealing because (i) each step is structurally easier to optimize than the overall objective, (ii) it guarantees a monotonically improving objective under certain theoretical conditions, and (iii) it reuses samples between iterations, thus lowering sample complexity.