no code implementations • 4 Mar 2024 • Théo Vincent, Daniel Palenicek, Boris Belousov, Jan Peters, Carlo D'Eramo
Value-based Reinforcement Learning (RL) methods rely on the application of the Bellman operator, which needs to be approximated from samples.
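As a rough illustration of what "approximated from samples" can mean, the sketch below applies a sample-based estimate of the Bellman optimality operator to a tabular Q-function. It is not taken from the paper: the function name `empirical_bellman_operator`, the transition tuple layout, and the tabular setting are all assumptions chosen for brevity.

```python
import numpy as np


def empirical_bellman_operator(q, transitions, gamma=0.99):
    """Sample-based estimate of the Bellman optimality operator (illustrative only).

    q: array of shape (n_states, n_actions) holding the current Q-values.
    transitions: iterable of (state, action, reward, next_state, done) tuples
        (hypothetical layout, assumed for this sketch).
    Returns a new array; entries with no samples keep their old value.
    """
    target_sum = np.zeros_like(q, dtype=float)
    counts = np.zeros_like(q, dtype=float)
    for s, a, r, s_next, done in transitions:
        # One-step target: reward plus discounted greedy value of the next state.
        bootstrap = 0.0 if done else gamma * np.max(q[s_next])
        target_sum[s, a] += r + bootstrap
        counts[s, a] += 1
    new_q = np.array(q, dtype=float)
    seen = counts > 0
    # Average the sampled targets for each visited (state, action) pair.
    new_q[seen] = target_sum[seen] / counts[seen]
    return new_q
```

Calling the function repeatedly on its own output gives a tabular form of sample-based value iteration. With function approximation, each application would instead be followed by a regression step onto the chosen function class.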
1 code implementation • 20 Dec 2023 • Théo Vincent, Alberto Maria Metelli, Boris Belousov, Jan Peters, Marcello Restelli, Carlo D'Eramo
We formulate an optimization problem to learn the projected Bellman operator (PBO) for generic sequential decision-making problems, and we theoretically analyze its properties in two representative classes of RL problems.