13 Jan 2023 • Romain Cravic, Nicolas Gast, Bruno Gaujal
We propose the first model-free algorithm that achieves low regret for decentralized learning in two-player zero-sum tabular stochastic games with an infinite-horizon average-reward objective.