no code implementations • 7 Feb 2024 • Carlo Alfano, Sebastian Towers, Silvia Sapora, Chris Lu, Patrick Rebeschini
Policy Mirror Descent (PMD) is a popular framework in reinforcement learning, serving as a unifying perspective that encompasses numerous algorithms.
1 code implementation • NeurIPS 2023 • Carlo Alfano, Rui Yuan, Patrick Rebeschini
Modern policy optimization methods in reinforcement learning, such as TRPO and PPO, owe their success to the use of parameterized policies.
no code implementations • 30 Sep 2022 • Carlo Alfano, Patrick Rebeschini
We analyze the convergence rate of the unregularized natural policy gradient algorithm with log-linear policy parametrizations in infinite-horizon discounted Markov decision processes.
no code implementations • 23 Sep 2021 • Carlo Alfano, Patrick Rebeschini
Cooperative multi-agent reinforcement learning is a decentralized paradigm in sequential decision making where agents distributed over a network iteratively collaborate with neighbors to maximize global (network-wide) notions of rewards.