1 code implementation • 5 Mar 2024 • Cassidy Laidlaw, Shivam Singhal, Anca Dragan
Thus, we propose regularizing based on the occupancy measure (OM) divergence between policies, rather than the action distribution (AD) divergence, to prevent reward hacking.
no code implementations • 3 Aug 2021 • Ryan Rowe, Shivam Singhal, Daqing Yi, Tapomayukh Bhattacharjee, Siddhartha S. Srinivasa
We examine the problem of desk organization: learning how humans spatially position different objects on a planar surface according to organizational "preference".