23 Mar 2024 • Abhijit Mazumdar, Rafal Wisniewski, Manuela L. Bujorianu
In this paper, we present an online reinforcement learning algorithm for constrained Markov decision processes with a safety constraint.
8 Dec 2023 • Abhijit Mazumdar, Rafal Wisniewski, Manuela L. Bujorianu
We then use an off-policy temporal difference learning method with importance sampling to learn the safety function corresponding to the given policy.
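To illustrate the general technique of off-policy TD learning with importance sampling, here is a minimal sketch of tabular off-policy TD(0) with a per-step importance-sampling ratio. It is not the authors' algorithm. Several things are assumptions made only for illustration: the toy chain MDP, the names `learn_safety_function` and `exact_safety`, the specific target and behavior policies, and the choice to define the "safety function" as the probability, under the target policy, of reaching a safe absorbing goal before an unsafe state. Data are generated by a uniform behavior policy `b`. Each TD error is reweighted by `rho = pi(a|s) / b(a|s)`, so the expected update matches the on-policy update for the target policy `pi`.

```python
import random

# Hypothetical toy chain: states 0..N, state 0 is unsafe (absorbing),
# state N is a safe absorbing goal. Actions move one step left or right.
N = 4
UNSAFE, GOAL = 0, N
ACTIONS = (-1, +1)  # move left / move right


def target_policy(a):
    """pi(a|s): the policy whose safety we evaluate (state-independent here)."""
    return 0.9 if a == +1 else 0.1


def behavior_policy(a):
    """b(a|s): the exploratory policy that actually generates the data (uniform)."""
    return 0.5


def learn_safety_function(episodes=50_000, alpha=0.002, start=2, seed=0):
    """Off-policy TD(0) with per-step importance sampling.

    S[s] estimates P_pi(reach GOAL before UNSAFE | start in s).
    Terminal values are fixed: 0 for the unsafe state, 1 for the goal.
    """
    rng = random.Random(seed)
    S = [0.5] * (N + 1)
    S[UNSAFE], S[GOAL] = 0.0, 1.0
    for _ in range(episodes):
        s = start
        while s not in (UNSAFE, GOAL):
            a = rng.choice(ACTIONS)          # act with the behavior policy b
            s_next = s + a                   # deterministic transition
            rho = target_policy(a) / behavior_policy(a)  # importance-sampling ratio
            td_error = S[s_next] - S[s]      # no reward, no discounting
            S[s] += alpha * rho * td_error   # reweighted TD(0) update
            s = s_next
    return S


def exact_safety(i, p=0.9):
    """Closed-form gambler's-ruin probability of hitting N before 0 under pi."""
    r = (1 - p) / p
    return (1 - r**i) / (1 - r**N)


if __name__ == "__main__":
    S = learn_safety_function()
    for s in range(1, N):
        print(s, round(S[s], 3), round(exact_safety(s), 3))
```

The update stays unbiased because, for each state, averaging `rho * td_error` over the behavior policy equals averaging `td_error` over the target policy. The fixed point is therefore the target policy's safety function, even though the data never come from that policy. A small constant step size is used because the ratios (0.2 and 1.8 here) add variance to each update.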