no code implementations • 25 Apr 2023 • Samuel Sokota, Gabriele Farina, David J. Wu, Hengyuan Hu, Kevin A. Wang, J. Zico Kolter, Noam Brown
Using this framework, we derive a provably sound search algorithm for fully cooperative games based on mirror descent and a search algorithm for adversarial games based on magnetic mirror descent.
no code implementations • 22 Jan 2023 • Samuel Sokota, Ryan D'Orazio, Chun Kai Ling, David J. Wu, J. Zico Kolter, Noam Brown
Because these regularized equilibria can be made arbitrarily close to Nash equilibria, our result opens the door to a new perspective on solving two-player zero-sum games and yields a simplified framework for decision-time planning in such games, free of the unappealing properties that plague existing decision-time planning approaches.
1 code implementation • Science 2022 • Anton Bakhtin, Noam Brown, Emily Dinan, Gabriele Farina, Colin Flaherty, Daniel Fried, Andrew Goff, Jonathan Gray, Hengyuan Hu, Athul Paul Jacob, Mojtaba Komeili, Karthik Konath, Minae Kwon, Adam Lerer, Mike Lewis, Alexander H. Miller, Sash Mitts, Aditya Renduchintala, Stephen Roller, Dirk Rowe, Weiyan Shi, Joe Spisak, Alexander Wei, David Wu, Hugh Zhang, Markus Zijlstra
Despite much progress in training AI systems to imitate human language, building agents that use language to communicate intentionally with humans in interactive environments remains a major challenge.
no code implementations • 11 Oct 2022 • Hengyuan Hu, David J Wu, Adam Lerer, Jakob Foerster, Noam Brown
First, we show that our method outperforms experts when playing with a group of diverse human players in ad-hoc teams.
1 code implementation • 11 Oct 2022 • Anton Bakhtin, David J Wu, Adam Lerer, Jonathan Gray, Athul Paul Jacob, Gabriele Farina, Alexander H Miller, Noam Brown
We then show that DiL-piKL can be extended into a self-play reinforcement learning algorithm we call RL-DiL-piKL that provides a model of human play while simultaneously training an agent that responds well to this human model.
3 code implementations • 12 Jun 2022 • Samuel Sokota, Ryan D'Orazio, J. Zico Kolter, Nicolas Loizou, Marc Lanctot, Ioannis Mitliagkas, Noam Brown, Christian Kroer
This work studies an algorithm, which we call magnetic mirror descent, that is inspired by mirror descent and the non-Euclidean proximal gradient algorithm.
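The magnetic mirror descent update can be sketched in a few lines. The snippet below is a minimal illustration, assuming a negative-entropy mirror map over the probability simplex (so the proximal step has a closed form in log space) and a uniform magnet; the game, step sizes, and variable names are illustrative choices, not taken from the paper.

```python
import numpy as np

def mmd_step(policy, grad, magnet, eta, alpha):
    """One magnetic mirror descent step under the negative-entropy mirror map:
    argmin_z  eta*<grad, z> + eta*alpha*KL(z, magnet) + KL(z, policy),
    which has the closed-form log-space update below."""
    logits = (np.log(policy) - eta * grad + eta * alpha * np.log(magnet)) / (1.0 + eta * alpha)
    z = np.exp(logits - logits.max())  # subtract max for numerical stability
    return z / z.sum()

# Illustrative simultaneous self-play on matching pennies.
A = np.array([[1.0, -1.0], [-1.0, 1.0]])  # row player's loss matrix
x = np.array([0.9, 0.1])                  # row player's policy
y = np.array([0.2, 0.8])                  # column player's policy
magnet = np.array([0.5, 0.5])             # uniform magnet
eta, alpha = 0.02, 0.5
for _ in range(2000):
    gx = A @ y        # row player minimizes x^T A y
    gy = -A.T @ x     # column player maximizes it, i.e. minimizes the negation
    x, y = mmd_step(x, gx, magnet, eta, alpha), mmd_step(y, gy, magnet, eta, alpha)
# The last iterates approach the regularized equilibrium, which for this
# symmetric game is uniform play.
```

The magnet term is what distinguishes this from plain mirror descent: it pulls each iterate toward a fixed reference policy, which is what yields last-iterate (rather than only average-iterate) convergence in this setting.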
no code implementations • 14 Dec 2021 • Athul Paul Jacob, David J. Wu, Gabriele Farina, Adam Lerer, Hengyuan Hu, Anton Bakhtin, Jacob Andreas, Noam Brown
We consider the task of building strong but human-like policies in multi-agent decision-making problems, given examples of human behavior.
1 code implementation • NeurIPS 2021 • Anton Bakhtin, David Wu, Adam Lerer, Noam Brown
Additionally, we extend our methods to full-scale no-press Diplomacy and for the first time train an agent from scratch with no human data.
1 code implementation • NeurIPS 2021 • Arnaud Fickinger, Hengyuan Hu, Brandon Amos, Stuart Russell, Noam Brown
Lookahead search has been a critical component of recent AI successes, such as in the games of chess, go, and poker.
no code implementations • ICLR 2022 • Samuel Sokota, Hengyuan Hu, David J Wu, J Zico Kolter, Jakob Nicolaus Foerster, Noam Brown
Furthermore, because this specialization occurs after the action or policy has already been decided, BFT does not require the belief model to process it as input.
no code implementations • 16 Jun 2021 • Hengyuan Hu, Adam Lerer, Noam Brown, Jakob Foerster
Search is an important tool for computing effective policies in single- and multi-agent environments, and has been crucial for achieving superhuman performance in several fully and partially observable benchmark games.
5 code implementations • 6 Mar 2021 • Hengyuan Hu, Adam Lerer, Brandon Cui, David Wu, Luis Pineda, Noam Brown, Jakob Foerster
Policies learned through self-play may adopt arbitrary conventions and implicitly rely on multi-step reasoning built on fragile assumptions about other agents' actions; they thus fail when paired with humans or independently trained agents at test time.
no code implementations • 2 Feb 2021 • Chun Kai Ling, Noam Brown
Stackelberg equilibrium is a solution concept in two-player games where the leader has commitment rights over the follower.
no code implementations • ICLR 2021 • Jonathan Gray, Adam Lerer, Anton Bakhtin, Noam Brown
Prior AI breakthroughs in complex games have focused on either the purely adversarial or purely cooperative settings.
1 code implementation • NeurIPS 2020 • Noam Brown, Anton Bakhtin, Adam Lerer, Qucheng Gong
This paper presents ReBeL, a general framework for self-play reinforcement learning and search that provably converges to a Nash equilibrium in any two-player zero-sum game.
no code implementations • 20 Jul 2020 • Ryan Zarick, Bryan Pellegrino, Noam Brown, Caleb Banister
Deep counterfactual value networks combined with continual resolving provide a way to conduct depth-limited search in imperfect-information games.
1 code implementation • 18 Jun 2020 • Eric Steinberger, Adam Lerer, Noam Brown
We introduce DREAM, a deep reinforcement learning algorithm that finds optimal strategies in imperfect-information games with multiple agents.
10 code implementations • 5 Dec 2019 • Adam Lerer, Hengyuan Hu, Jakob Foerster, Noam Brown
The first, single-agent search, effectively converts the problem into a single-agent setting by making all but one of the agents play according to the agreed-upon policy.
no code implementations • 13 Feb 2019 • Gabriele Farina, Christian Kroer, Noam Brown, Tuomas Sandholm
The CFR framework has been a powerful tool for solving large-scale extensive-form games in practice.
4 code implementations • 1 Nov 2018 • Noam Brown, Adam Lerer, Sam Gross, Tuomas Sandholm
This paper introduces Deep Counterfactual Regret Minimization, a form of CFR that obviates the need for abstraction by instead using deep neural networks to approximate the behavior of CFR in the full game.
3 code implementations • 11 Sep 2018 • Noam Brown, Tuomas Sandholm
Counterfactual regret minimization (CFR) is a family of iterative algorithms that constitute the most popular and, in practice, fastest approach to approximately solving large imperfect-information games.
no code implementations • NeurIPS 2018 • Noam Brown, Tuomas Sandholm, Brandon Amos
This paper introduces a principled way to conduct depth-limited solving in imperfect-information games by allowing the opponent to choose among a number of strategies for the remainder of the game at the depth limit.
no code implementations • ICML 2017 • Noam Brown, Tuomas Sandholm
Iterative algorithms such as Counterfactual Regret Minimization (CFR) are the most popular way to solve large zero-sum imperfect-information games.
no code implementations • NeurIPS 2017 • Noam Brown, Tuomas Sandholm
Thus, unlike in perfect-information games, a subgame cannot be solved in isolation; its solution must instead account for the strategy in the rest of the game.
no code implementations • ICML 2017 • Noam Brown, Tuomas Sandholm
Counterfactual Regret Minimization (CFR) is the most popular iterative algorithm for solving zero-sum imperfect-information games.
no code implementations • NeurIPS 2015 • Noam Brown, Tuomas Sandholm
CFR is an iterative algorithm that repeatedly traverses the game tree, updating regrets at each information set. We introduce an improvement to CFR that prunes any path of play in the tree that has negative regret, along with its descendants.
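The regret-matching rule at the heart of CFR can be sketched briefly. The snippet below is a minimal illustration on one-shot rock-paper-scissors, not the paper's tree-based CFR or its pruning machinery; the payoff matrix and variable names are illustrative. It does show the property the pruning above exploits: regret matching assigns zero probability to actions whose cumulative regret is negative.

```python
import numpy as np

# Row player's payoff matrix for rock-paper-scissors (zero-sum).
A = np.array([[ 0, -1,  1],
              [ 1,  0, -1],
              [-1,  1,  0]], dtype=float)

def rm_policy(regrets):
    """Play proportionally to positive cumulative regret; actions with
    negative regret get probability zero. Uniform if no regret is positive."""
    pos = np.maximum(regrets, 0.0)
    total = pos.sum()
    return pos / total if total > 0 else np.full(len(regrets), 1.0 / len(regrets))

T = 50000
r1 = np.zeros(3); r2 = np.zeros(3)    # cumulative regrets
avg1 = np.zeros(3); avg2 = np.zeros(3)  # running sums of strategies
for _ in range(T):
    s1, s2 = rm_policy(r1), rm_policy(r2)
    u1 = A @ s2        # row player's expected payoff per pure action
    u2 = -A.T @ s1     # column player's expected payoff per pure action
    r1 += u1 - s1 @ u1  # regret for not having played each action
    r2 += u2 - s2 @ u2
    avg1 += s1; avg2 += s2
avg1 /= T; avg2 /= T
# In two-player zero-sum games, the average strategies of no-regret
# self-play converge to a Nash equilibrium: here, uniform play.
```

Because the current strategy never plays negative-regret actions, subtrees reachable only through such actions contribute nothing to the current iterate, which is the intuition behind pruning them during traversal.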