no code implementations • 3 May 2024 • Alessandro Montenegro, Marco Mussi, Alberto Maria Metelli, Matteo Papini
After introducing a novel framework for modeling this scenario, we study the global convergence to the best deterministic policy, under (weak) gradient domination assumptions.
1 code implementation • 15 Feb 2023 • Marco Mussi, Alessandro Montenegro, Francesco Trovó, Marcello Restelli, Alberto Maria Metelli
Then, we prove that, with a sufficiently large budget, they provide guarantees on the probability of properly identifying the optimal option at the end of the learning process.