Search Results for author: Gabriele Farina

Found 39 papers, 3 papers with code

Regularized Conventions: Equilibrium Computation as a Model of Pragmatic Reasoning

no code implementations 16 Nov 2023 Athul Paul Jacob, Gabriele Farina, Jacob Andreas

We present a model of pragmatic language understanding, where utterances are produced and understood by searching for regularized equilibria of signaling games.

Implicatures

Last-Iterate Convergence Properties of Regret-Matching Algorithms in Games

no code implementations 1 Nov 2023 Yang Cai, Gabriele Farina, Julien Grand-Clément, Christian Kroer, Chung-Wei Lee, Haipeng Luo, Weiqiang Zheng

Algorithms based on regret matching, specifically regret matching$^+$ (RM$^+$), and its variants are the most popular approaches for solving large-scale two-player zero-sum games in practice.

The Consensus Game: Language Model Generation via Equilibrium Search

no code implementations 13 Oct 2023 Athul Paul Jacob, Yikang Shen, Gabriele Farina, Jacob Andreas

When applied to question answering and other text generation tasks, language models (LMs) may be queried generatively (by sampling answers from their output distribution) or discriminatively (by using them to score or rank a set of candidate outputs).

Language Modelling Question Answering

The Update-Equivalence Framework for Decision-Time Planning

no code implementations 25 Apr 2023 Samuel Sokota, Gabriele Farina, David J. Wu, Hengyuan Hu, Kevin A. Wang, J. Zico Kolter, Noam Brown

Using this framework, we derive a provably sound search algorithm for fully cooperative games based on mirror descent and a search algorithm for adversarial games based on magnetic mirror descent.

Mastering the Game of No-Press Diplomacy via Human-Regularized Reinforcement Learning and Planning

1 code implementation 11 Oct 2022 Anton Bakhtin, David J Wu, Adam Lerer, Jonathan Gray, Athul Paul Jacob, Gabriele Farina, Alexander H Miller, Noam Brown

We then show that DiL-piKL can be extended into a self-play reinforcement learning algorithm we call RL-DiL-piKL that provides a model of human play while simultaneously training an agent that responds well to this human model.

reinforcement-learning Reinforcement Learning (RL)

Near-Optimal $Φ$-Regret Learning in Extensive-Form Games

no code implementations 20 Aug 2022 Ioannis Anagnostides, Gabriele Farina, Tuomas Sandholm

In this paper, we establish efficient and uncoupled learning dynamics so that, when employed by all players in multiplayer perfect-recall imperfect-information extensive-form games, the trigger regret of each player grows as $O(\log T)$ after $T$ repetitions of play.

Open-Ended Question Answering

Near-Optimal No-Regret Learning Dynamics for General Convex Games

no code implementations 17 Jun 2022 Gabriele Farina, Ioannis Anagnostides, Haipeng Luo, Chung-Wei Lee, Christian Kroer, Tuomas Sandholm

In this paper, we answer this in the positive by establishing the first uncoupled learning algorithm with $O(\log T)$ per-player regret in general convex games, that is, games with concave utility functions supported on arbitrary convex and compact strategy sets.

ESCHER: Eschewing Importance Sampling in Games by Computing a History Value Function to Estimate Regret

1 code implementation 8 Jun 2022 Stephen McAleer, Gabriele Farina, Marc Lanctot, Tuomas Sandholm

DREAM, the only current CFR-based neural method that is model free and therefore scalable to very large games, trains a neural network on an estimated regret target that can have extremely high variance due to an importance sampling term inherited from Monte Carlo CFR (MCCFR).

counterfactual

Uncoupled Learning Dynamics with $O(\log T)$ Swap Regret in Multiplayer Games

no code implementations 25 Apr 2022 Ioannis Anagnostides, Gabriele Farina, Christian Kroer, Chung-Wei Lee, Haipeng Luo, Tuomas Sandholm

In this paper we establish efficient and uncoupled learning dynamics so that, when employed by all players in a general-sum multiplayer game, the swap regret of each player after $T$ repetitions of the game is bounded by $O(\log T)$, improving over the prior best bounds of $O(\log^4 (T))$.

Optimal Correlated Equilibria in General-Sum Extensive-Form Games: Fixed-Parameter Algorithms, Hardness, and Two-Sided Column-Generation

no code implementations 14 Mar 2022 Brian Zhang, Gabriele Farina, Andrea Celli, Tuomas Sandholm

For team games, the two-sided column generation approach vastly outperforms standard column generation approaches, making it the state-of-the-art algorithm when the parameter is large.

Kernelized Multiplicative Weights for 0/1-Polyhedral Games: Bridging the Gap Between Learning in Extensive-Form and Normal-Form Games

no code implementations 1 Feb 2022 Gabriele Farina, Chung-Wei Lee, Haipeng Luo, Christian Kroer

In this paper we show that the Optimistic Multiplicative Weights Update (OMWU) algorithm -- the premier learning algorithm for NFGs -- can be simulated on the normal-form equivalent of an EFG in linear time per iteration in the game tree size using a kernel trick.

Modeling Strong and Human-Like Gameplay with KL-Regularized Search

no code implementations 14 Dec 2021 Athul Paul Jacob, David J. Wu, Gabriele Farina, Adam Lerer, Hengyuan Hu, Anton Bakhtin, Jacob Andreas, Noam Brown

We consider the task of building strong but human-like policies in multi-agent decision-making problems, given examples of human behavior.

Imitation Learning

Equilibrium Refinement for the Age of Machines: The One-Sided Quasi-Perfect Equilibrium

no code implementations NeurIPS 2021 Gabriele Farina, Tuomas Sandholm

In this paper, we initiate the study of equilibrium refinements for settings where one of the players is perfectly rational (the "machine") and the other may make mistakes.

Near-Optimal No-Regret Learning for Correlated Equilibria in Multi-Player General-Sum Games

no code implementations 11 Nov 2021 Ioannis Anagnostides, Constantinos Daskalakis, Gabriele Farina, Maxwell Fishelson, Noah Golowich, Tuomas Sandholm

Recently, Daskalakis, Fishelson, and Golowich (DFG) (NeurIPS'21) showed that if all agents in a multi-player general-sum normal-form game employ Optimistic Multiplicative Weights Update (OMWU), the external regret of every player is $O(\textrm{polylog}(T))$ after $T$ repetitions of the game.

Faster No-Regret Learning Dynamics for Extensive-Form Correlated Equilibrium

no code implementations 29 Sep 2021 Ioannis Anagnostides, Gabriele Farina, Christian Kroer, Tuomas Sandholm

A recent emerging trend in the literature on learning in games has been concerned with providing accelerated learning dynamics for correlated and coarse correlated equilibria in normal-form games.

Better Regularization for Sequential Decision Spaces: Fast Convergence Rates for Nash, Correlated, and Team Equilibria

no code implementations 27 May 2021 Gabriele Farina, Christian Kroer, Tuomas Sandholm

The scaled extension operator is a way to recursively construct convex sets, which generalizes the decision polytope of extensive-form games, as well as the convex polytopes corresponding to correlated and team equilibria.

Simple Uncoupled No-Regret Learning Dynamics for Extensive-Form Correlated Equilibrium

no code implementations 4 Apr 2021 Gabriele Farina, Andrea Celli, Alberto Marchesi, Nicola Gatti

The existence of simple uncoupled no-regret learning dynamics that converge to correlated equilibria in normal-form games is a celebrated result in the theory of multi-agent systems.

Model-Free Online Learning in Unknown Sequential Decision Making Problems and Games

no code implementations 8 Mar 2021 Gabriele Farina, Tuomas Sandholm

We give an efficient algorithm that achieves $O(T^{3/4})$ regret with high probability for that setting, even when the agent faces an adversarial environment.

counterfactual Decision Making

Bandit Linear Optimization for Sequential Decision Making and Extensive-Form Games

no code implementations 8 Mar 2021 Gabriele Farina, Robin Schmucker, Tuomas Sandholm

Tree-form sequential decision making (TFSDM) extends classical one-shot decision making by modeling tree-form interactions between an agent and a potentially adversarial environment.

counterfactual Decision Making

Faster Algorithms for Optimal Ex-Ante Coordinated Collusive Strategies in Extensive-Form Zero-Sum Games

no code implementations 21 Sep 2020 Gabriele Farina, Andrea Celli, Nicola Gatti, Tuomas Sandholm

Second, we provide an algorithm that computes such an optimal distribution by only using profiles where only one of the team members gets to randomize in each profile.

Polynomial-Time Computation of Optimal Correlated Equilibria in Two-Player Extensive-Form Games with Public Chance Moves and Beyond

no code implementations NeurIPS 2020 Gabriele Farina, Tuomas Sandholm

As of today, it is known that finding an optimal extensive-form correlated equilibrium (EFCE), extensive-form coarse correlated equilibrium (EFCCE), or normal-form coarse correlated equilibrium (NFCCE) in a two-player extensive-form game is computationally tractable when the game does not include chance moves, and intractable when the game involves chance moves.

Faster Game Solving via Predictive Blackwell Approachability: Connecting Regret Matching and Mirror Descent

no code implementations 28 Jul 2020 Gabriele Farina, Christian Kroer, Tuomas Sandholm

In spite of this prevalence, the regret matching (RM) and regret matching+ (RM+) algorithms have been preferred in the practice of solving large-scale games (as the local regret minimizers within the counterfactual regret minimization framework).

counterfactual

Stochastic Regret Minimization in Extensive-Form Games

no code implementations ICML 2020 Gabriele Farina, Christian Kroer, Tuomas Sandholm

Our framework allows us to instantiate several new stochastic methods for solving sequential games.

counterfactual

Efficient Regret Minimization Algorithm for Extensive-Form Correlated Equilibrium

no code implementations NeurIPS 2019 Gabriele Farina, Chun Kai Ling, Fei Fang, Tuomas Sandholm

We show that a regret minimizer can be designed for a scaled extension of any two convex sets, and that from the decomposition we then obtain a global regret minimizer.

Optimistic Regret Minimization for Extensive-Form Games via Dilated Distance-Generating Functions

no code implementations NeurIPS 2019 Gabriele Farina, Christian Kroer, Tuomas Sandholm

Our algorithms provably converge at a rate of $T^{-1}$, which is superior to prior counterfactual regret minimization algorithms.

counterfactual

Coarse Correlation in Extensive-Form Games

no code implementations 26 Aug 2019 Gabriele Farina, Tommaso Bianchi, Tuomas Sandholm

Coarse correlation models strategic interactions of rational agents complemented by a correlation device, that is, a mediator that can recommend behavior but not enforce it.

Stable-Predictive Optimistic Counterfactual Regret Minimization

no code implementations 13 Feb 2019 Gabriele Farina, Christian Kroer, Noam Brown, Tuomas Sandholm

The CFR framework has been a powerful tool for solving large-scale extensive-form games in practice.

counterfactual

Practical exact algorithm for trembling-hand equilibrium refinements in games

no code implementations NeurIPS 2018 Gabriele Farina, Nicola Gatti, Tuomas Sandholm

Nash equilibrium strategies have the known weakness that they do not prescribe rational play in situations that are reached with zero probability according to the strategies themselves, for example, if players have made mistakes.

Ex ante coordination and collusion in zero-sum multi-player extensive-form games

no code implementations NeurIPS 2018 Gabriele Farina, Andrea Celli, Nicola Gatti, Tuomas Sandholm

This paper focuses on zero-sum games where a team of players faces an opponent, as is the case, for example, in Bridge, collusion in poker, and many non-recreational applications such as war, where the colluders do not have time or means of communicating during battle, collusion in bidding, where communication during the auction is illegal, and coordinated swindling in public.

Regret Circuits: Composability of Regret Minimizers

no code implementations 6 Nov 2018 Gabriele Farina, Christian Kroer, Tuomas Sandholm

We show that local regret minimizers for the simpler sets can be combined with additional regret minimizers into an aggregate regret minimizer for the composite set.

Solving Large Sequential Games with the Excessive Gap Technique

no code implementations NeurIPS 2018 Christian Kroer, Gabriele Farina, Tuomas Sandholm

We present, to our knowledge, the first GPU implementation of a first-order method for extensive-form games.

counterfactual

Online Convex Optimization for Sequential Decision Processes and Extensive-Form Games

no code implementations 10 Sep 2018 Gabriele Farina, Christian Kroer, Tuomas Sandholm

Experiments show that our framework leads to algorithms that scale at a rate comparable to the fastest variants of counterfactual regret minimization for computing Nash equilibrium, and therefore our approach leads to the first algorithm for computing quantal response equilibria in extremely large games.

counterfactual Decision Making

Robust Stackelberg Equilibria in Extensive-Form Games and Extension to Limited Lookahead

no code implementations 21 Nov 2017 Christian Kroer, Gabriele Farina, Tuomas Sandholm

We then extend the program to the robust setting for Stackelberg equilibrium under unlimited and under limited lookahead by the opponent.

Regret Minimization in Behaviorally-Constrained Zero-Sum Games

no code implementations ICML 2017 Gabriele Farina, Christian Kroer, Tuomas Sandholm

We use an instantiation of the CFR framework to develop algorithms for solving behaviorally-constrained (and, as a special case, perturbed in the Selten sense) extensive-form games, which allows us to compute approximate Nash equilibrium refinements.

counterfactual

Operation Frames and Clubs in Kidney Exchange

no code implementations 25 May 2017 Gabriele Farina, John P. Dickerson, Tuomas Sandholm

A kidney exchange is a centrally-administered barter market where patients swap their willing yet incompatible donors.

Decoding Hidden Markov Models Faster Than Viterbi Via Online Matrix-Vector (max, +)-Multiplication

no code implementations 30 Nov 2015 Massimo Cairo, Gabriele Farina, Romeo Rizzi

In this paper, we present a novel algorithm for the maximum a posteriori decoding (MAPD) of time-homogeneous Hidden Markov Models (HMM), improving the worst-case running time of the classical Viterbi algorithm by a logarithmic factor.
