1 code implementation • 31 May 2023 • Geraud Nangue Tasse, Tamlin Love, Mark Nemecek, Steven James, Benjamin Rosman
A common solution is for a human expert to define either a penalty in the reward function or a cost to be minimised, incurred whenever the agent reaches an unsafe state.