Search Results for author: Gabriele Farina

Found 39 papers, 3 papers with code

Optimistic Policy Gradient in Multi-Player Markov Games with a Single Controller: Convergence Beyond the Minty Property

no code implementations • 19 Dec 2023 • Ioannis Anagnostides, Ioannis Panageas, Gabriele Farina, Tuomas Sandholm

Policy gradient methods enjoy strong practical performance in numerous tasks in reinforcement learning.

Regularized Conventions: Equilibrium Computation as a Model of Pragmatic Reasoning

no code implementations • 16 Nov 2023 • Athul Paul Jacob, Gabriele Farina, Jacob Andreas

We present a model of pragmatic language understanding, where utterances are produced and understood by searching for regularized equilibria of signaling games.

Implicatures

Last-Iterate Convergence Properties of Regret-Matching Algorithms in Games

no code implementations • 1 Nov 2023 • Yang Cai, Gabriele Farina, Julien Grand-Clément, Christian Kroer, Chung-Wei Lee, Haipeng Luo, Weiqiang Zheng

Algorithms based on regret matching, specifically regret matching$^+$ (RM$^+$), and its variants are the most popular approaches for solving large-scale two-player zero-sum games in practice.

The Consensus Game: Language Model Generation via Equilibrium Search

no code implementations • 13 Oct 2023 • Athul Paul Jacob, Yikang Shen, Gabriele Farina, Jacob Andreas

When applied to question answering and other text generation tasks, language models (LMs) may be queried generatively (by sampling answers from their output distribution) or discriminatively (by using them to score or rank a set of candidate outputs).

Language Modelling Question Answering +2

The Update-Equivalence Framework for Decision-Time Planning

no code implementations • 25 Apr 2023 • Samuel Sokota, Gabriele Farina, David J. Wu, Hengyuan Hu, Kevin A. Wang, J. Zico Kolter, Noam Brown

Using this framework, we derive a provably sound search algorithm for fully cooperative games based on mirror descent and a search algorithm for adversarial games based on magnetic mirror descent.

Human-level play in the game of Diplomacy by combining language models with strategic reasoning

1 code implementation • Science 2022 • Anton Bakhtin, Noam Brown, Emily Dinan, Gabriele Farina, Colin Flaherty, Daniel Fried, Andrew Goff, Jonathan Gray, Hengyuan Hu, Athul Paul Jacob, Mojtaba Komeili, Karthik Konath, Minae Kwon, Adam Lerer, Mike Lewis, Alexander H. Miller, Sasha Mitts, Aditya Renduchintala, Stephen Roller, Dirk Rowe, Weiyan Shi, Joe Spisak, Alexander Wei, David Wu, Hugh Zhang, Markus Zijlstra

Despite much progress in training AI systems to imitate human language, building agents that use language to communicate intentionally with humans in interactive environments remains a major challenge.

Mastering the Game of No-Press Diplomacy via Human-Regularized Reinforcement Learning and Planning

1 code implementation • 11 Oct 2022 • Anton Bakhtin, David J Wu, Adam Lerer, Jonathan Gray, Athul Paul Jacob, Gabriele Farina, Alexander H Miller, Noam Brown

We then show that DiL-piKL can be extended into a self-play reinforcement learning algorithm we call RL-DiL-piKL that provides a model of human play while simultaneously training an agent that responds well to this human model.

Reinforcement Learning (RL)

Near-Optimal $\Phi$-Regret Learning in Extensive-Form Games

no code implementations • 20 Aug 2022 • Ioannis Anagnostides, Gabriele Farina, Tuomas Sandholm

In this paper, we establish efficient and uncoupled learning dynamics so that, when employed by all players in multiplayer perfect-recall imperfect-information extensive-form games, the trigger regret of each player grows as $O(\log T)$ after $T$ repetitions of play.

Open-Ended Question Answering

Near-Optimal No-Regret Learning Dynamics for General Convex Games

no code implementations • 17 Jun 2022 • Gabriele Farina, Ioannis Anagnostides, Haipeng Luo, Chung-Wei Lee, Christian Kroer, Tuomas Sandholm

In this paper, we answer this in the positive by establishing the first uncoupled learning algorithm with $O(\log T)$ per-player regret in general convex games, that is, games with concave utility functions supported on arbitrary convex and compact strategy sets.

ESCHER: Eschewing Importance Sampling in Games by Computing a History Value Function to Estimate Regret

1 code implementation • 8 Jun 2022 • Stephen McAleer, Gabriele Farina, Marc Lanctot, Tuomas Sandholm

DREAM, the only current CFR-based neural method that is model free and therefore scalable to very large games, trains a neural network on an estimated regret target that can have extremely high variance due to an importance sampling term inherited from Monte Carlo CFR (MCCFR).

counterfactual

Uncoupled Learning Dynamics with $O(\log T)$ Swap Regret in Multiplayer Games

no code implementations • 25 Apr 2022 • Ioannis Anagnostides, Gabriele Farina, Christian Kroer, Chung-Wei Lee, Haipeng Luo, Tuomas Sandholm

In this paper we establish efficient and uncoupled learning dynamics so that, when employed by all players in a general-sum multiplayer game, the swap regret of each player after $T$ repetitions of the game is bounded by $O(\log T)$, improving over the prior best bounds of $O(\log^4 (T))$.

Optimal Correlated Equilibria in General-Sum Extensive-Form Games: Fixed-Parameter Algorithms, Hardness, and Two-Sided Column-Generation

no code implementations • 14 Mar 2022 • Brian Zhang, Gabriele Farina, Andrea Celli, Tuomas Sandholm

For team games, the two-sided column generation approach vastly outperforms standard column generation approaches, making it the state-of-the-art algorithm when the parameter is large.

Kernelized Multiplicative Weights for 0/1-Polyhedral Games: Bridging the Gap Between Learning in Extensive-Form and Normal-Form Games

no code implementations • 1 Feb 2022 • Gabriele Farina, Chung-Wei Lee, Haipeng Luo, Christian Kroer

In this paper we show that the Optimistic Multiplicative Weights Update (OMWU) algorithm -- the premier learning algorithm for NFGs -- can be simulated on the normal-form equivalent of an EFG in linear time per iteration in the game tree size using a kernel trick.

Modeling Strong and Human-Like Gameplay with KL-Regularized Search

no code implementations • 14 Dec 2021 • Athul Paul Jacob, David J. Wu, Gabriele Farina, Adam Lerer, Hengyuan Hu, Anton Bakhtin, Jacob Andreas, Noam Brown

We consider the task of building strong but human-like policies in multi-agent decision-making problems, given examples of human behavior.

Imitation Learning

Equilibrium Refinement for the Age of Machines: The One-Sided Quasi-Perfect Equilibrium

no code implementations • NeurIPS 2021 • Gabriele Farina, Tuomas Sandholm

In this paper, we initiate the study of equilibrium refinements for settings where one of the players is perfectly rational (the "machine") and the other may make mistakes.

Near-Optimal No-Regret Learning for Correlated Equilibria in Multi-Player General-Sum Games

no code implementations • 11 Nov 2021 • Ioannis Anagnostides, Constantinos Daskalakis, Gabriele Farina, Maxwell Fishelson, Noah Golowich, Tuomas Sandholm

Recently, Daskalakis, Fishelson, and Golowich (DFG) (NeurIPS'21) showed that if all agents in a multi-player general-sum normal-form game employ Optimistic Multiplicative Weights Update (OMWU), the external regret of every player is $O(\textrm{polylog}(T))$ after $T$ repetitions of the game.

Faster No-Regret Learning Dynamics for Extensive-Form Correlated Equilibrium

no code implementations • 29 Sep 2021 • Ioannis Anagnostides, Gabriele Farina, Christian Kroer, Tuomas Sandholm

A recent emerging trend in the literature on learning in games has been concerned with providing accelerated learning dynamics for correlated and coarse correlated equilibria in normal-form games.

Better Regularization for Sequential Decision Spaces: Fast Convergence Rates for Nash, Correlated, and Team Equilibria

no code implementations • 27 May 2021 • Gabriele Farina, Christian Kroer, Tuomas Sandholm

The scaled extension operator is a way to recursively construct convex sets, which generalizes the decision polytope of extensive-form games, as well as the convex polytopes corresponding to correlated and team equilibria.

Simple Uncoupled No-Regret Learning Dynamics for Extensive-Form Correlated Equilibrium

no code implementations • 4 Apr 2021 • Gabriele Farina, Andrea Celli, Alberto Marchesi, Nicola Gatti

The existence of simple uncoupled no-regret learning dynamics that converge to correlated equilibria in normal-form games is a celebrated result in the theory of multi-agent systems.

Model-Free Online Learning in Unknown Sequential Decision Making Problems and Games

no code implementations • 8 Mar 2021 • Gabriele Farina, Tuomas Sandholm

We give an efficient algorithm that achieves $O(T^{3/4})$ regret with high probability for that setting, even when the agent faces an adversarial environment.

counterfactual Decision Making

Bandit Linear Optimization for Sequential Decision Making and Extensive-Form Games

no code implementations • 8 Mar 2021 • Gabriele Farina, Robin Schmucker, Tuomas Sandholm

Tree-form sequential decision making (TFSDM) extends classical one-shot decision making by modeling tree-form interactions between an agent and a potentially adversarial environment.

counterfactual Decision Making

Faster Algorithms for Optimal Ex-Ante Coordinated Collusive Strategies in Extensive-Form Zero-Sum Games

no code implementations • 21 Sep 2020 • Gabriele Farina, Andrea Celli, Nicola Gatti, Tuomas Sandholm

Second, we provide an algorithm that computes such an optimal distribution by only using profiles where only one of the team members gets to randomize in each profile.

Polynomial-Time Computation of Optimal Correlated Equilibria in Two-Player Extensive-Form Games with Public Chance Moves and Beyond

no code implementations • NeurIPS 2020 • Gabriele Farina, Tuomas Sandholm

As of today, it is known that finding an optimal extensive-form correlated equilibrium (EFCE), extensive-form coarse correlated equilibrium (EFCCE), or normal-form coarse correlated equilibrium (NFCCE) in a two-player extensive-form game is computationally tractable when the game does not include chance moves, and intractable when the game involves chance moves.

Faster Game Solving via Predictive Blackwell Approachability: Connecting Regret Matching and Mirror Descent

no code implementations • 28 Jul 2020 • Gabriele Farina, Christian Kroer, Tuomas Sandholm

In spite of this prevalence, the regret matching (RM) and regret matching+ (RM+) algorithms have been preferred in the practice of solving large-scale games (as the local regret minimizers within the counterfactual regret minimization framework).

counterfactual

No-Regret Learning Dynamics for Extensive-Form Correlated Equilibrium

no code implementations • NeurIPS 2020 • Andrea Celli, Alberto Marchesi, Gabriele Farina, Nicola Gatti

When each player has low trigger regret, the empirical frequency of play is close to an EFCE.

Stochastic Regret Minimization in Extensive-Form Games

no code implementations • ICML 2020 • Gabriele Farina, Christian Kroer, Tuomas Sandholm

Our framework allows us to instantiate several new stochastic methods for solving sequential games.

counterfactual

Efficient Regret Minimization Algorithm for Extensive-Form Correlated Equilibrium

no code implementations • NeurIPS 2019 • Gabriele Farina, Chun Kai Ling, Fei Fang, Tuomas Sandholm

We show that a regret minimizer can be designed for a scaled extension of any two convex sets, and that from the decomposition we then obtain a global regret minimizer.

Optimistic Regret Minimization for Extensive-Form Games via Dilated Distance-Generating Functions

no code implementations • NeurIPS 2019 • Gabriele Farina, Christian Kroer, Tuomas Sandholm

Our algorithms provably converge at a rate of $T^{-1}$, which is superior to prior counterfactual regret minimization algorithms.

counterfactual

Coarse Correlation in Extensive-Form Games

no code implementations • 26 Aug 2019 • Gabriele Farina, Tommaso Bianchi, Tuomas Sandholm

Coarse correlation models strategic interactions of rational agents complemented by a correlation device, that is a mediator that can recommend behavior but not enforce it.

Stable-Predictive Optimistic Counterfactual Regret Minimization

no code implementations • 13 Feb 2019 • Gabriele Farina, Christian Kroer, Noam Brown, Tuomas Sandholm

The CFR framework has been a powerful tool for solving large-scale extensive-form games in practice.

counterfactual

Practical exact algorithm for trembling-hand equilibrium refinements in games

no code implementations • NeurIPS 2018 • Gabriele Farina, Nicola Gatti, Tuomas Sandholm

Nash equilibrium strategies have the known weakness that they do not prescribe rational play in situations that are reached with zero probability according to the strategies themselves, for example, if players have made mistakes.

Ex ante coordination and collusion in zero-sum multi-player extensive-form games

no code implementations • NeurIPS 2018 • Gabriele Farina, Andrea Celli, Nicola Gatti, Tuomas Sandholm

This paper focuses on zero-sum games where a team of players faces an opponent, as is the case, for example, in Bridge, collusion in poker, and many non-recreational applications such as war, where the colluders do not have time or means of communicating during battle, collusion in bidding, where communication during the auction is illegal, and coordinated swindling in public.

Regret Circuits: Composability of Regret Minimizers

no code implementations • 6 Nov 2018 • Gabriele Farina, Christian Kroer, Tuomas Sandholm

We show that local regret minimizers for the simpler sets can be combined with additional regret minimizers into an aggregate regret minimizer for the composite set.

Solving Large Sequential Games with the Excessive Gap Technique

no code implementations • NeurIPS 2018 • Christian Kroer, Gabriele Farina, Tuomas Sandholm

We present, to our knowledge, the first GPU implementation of a first-order method for extensive-form games.

counterfactual

Online Convex Optimization for Sequential Decision Processes and Extensive-Form Games

no code implementations • 10 Sep 2018 • Gabriele Farina, Christian Kroer, Tuomas Sandholm

Experiments show that our framework leads to algorithms that scale at a rate comparable to the fastest variants of counterfactual regret minimization for computing Nash equilibrium, and therefore our approach leads to the first algorithm for computing quantal response equilibria in extremely large games.

counterfactual Decision Making

Robust Stackelberg Equilibria in Extensive-Form Games and Extension to Limited Lookahead

no code implementations • 21 Nov 2017 • Christian Kroer, Gabriele Farina, Tuomas Sandholm

We then extend the program to the robust setting for Stackelberg equilibrium under unlimited and under limited lookahead by the opponent.

Regret Minimization in Behaviorally-Constrained Zero-Sum Games

no code implementations • ICML 2017 • Gabriele Farina, Christian Kroer, Tuomas Sandholm

We use an instantiation of the CFR framework to develop algorithms for solving behaviorally-constrained (and, as a special case, perturbed in the Selten sense) extensive-form games, which allows us to compute approximate Nash equilibrium refinements.

counterfactual

Operation Frames and Clubs in Kidney Exchange

no code implementations • 25 May 2017 • Gabriele Farina, John P. Dickerson, Tuomas Sandholm

A kidney exchange is a centrally-administered barter market where patients swap their willing yet incompatible donors.

Decoding Hidden Markov Models Faster Than Viterbi Via Online Matrix-Vector (max, +)-Multiplication

no code implementations • 30 Nov 2015 • Massimo Cairo, Gabriele Farina, Romeo Rizzi

In this paper, we present a novel algorithm for the maximum a posteriori decoding (MAPD) of time-homogeneous Hidden Markov Models (HMM), improving the worst-case running time of the classical Viterbi algorithm by a logarithmic factor.
