Search Results for author: Kaiqing Zhang

Found 56 papers, 7 papers with code

Do LLM Agents Have Regret? A Case Study in Online Learning and Games

no code implementations25 Mar 2024 Chanwoo Park, Xiangyu Liu, Asuman Ozdaglar, Kaiqing Zhang

To better understand the limits of LLM agents in these interactive environments, we propose to study their interactions in benchmark decision-making settings in online learning and game theory, through the performance metric of \emph{regret}.

Decision Making
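The regret metric named in the abstract above can be illustrated with a toy computation. This is a generic sketch of external regret against the best fixed action in hindsight, not the paper's evaluation code; the reward table and action sequence are made up.

```python
import numpy as np

def external_regret(rewards, actions):
    """External regret of an action sequence: the reward of the best fixed
    action in hindsight minus the reward actually realized. `rewards[t, a]`
    is the reward action `a` would have earned at round t."""
    realized = sum(rewards[t, a] for t, a in enumerate(actions))
    best_fixed = rewards.sum(axis=0).max()
    return best_fixed - realized

# Two actions over three rounds; the agent always plays action 0.
rewards = np.array([[0.0, 1.0], [0.0, 1.0], [1.0, 0.0]])
print(external_regret(rewards, [0, 0, 0]))  # best fixed action earns 2, agent earns 1 -> 1.0
```

A no-regret learner is one whose average regret vanishes as the number of rounds grows; the paper asks whether LLM agents exhibit this property in such settings.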

Two-Timescale Q-Learning with Function Approximation in Zero-Sum Stochastic Games

no code implementations8 Dec 2023 Zaiwei Chen, Kaiqing Zhang, Eric Mazumdar, Asuman Ozdaglar, Adam Wierman

Specifically, through a change of variable, we show that the update equation of the slow-timescale iterates resembles the classical smoothed best-response dynamics, where the regularized Nash gap serves as a valid Lyapunov function.

Q-Learning
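The smoothed best-response dynamics mentioned above can be sketched in their simplest form: an entropy-regularized (softmax) best response that replaces the argmax over expected payoffs with a Boltzmann distribution. This is a minimal illustration, not the paper's two-timescale algorithm; the payoff matrix and temperature are invented.

```python
import numpy as np

def smoothed_best_response(payoffs, opponent_strategy, tau=0.1):
    """Entropy-regularized best response: play a softmax over expected
    payoffs at temperature tau instead of a hard argmax."""
    q = payoffs @ opponent_strategy          # expected payoff of each action
    z = np.exp((q - q.max()) / tau)          # numerically stabilized softmax
    return z / z.sum()

# Matching pennies: the row player's smoothed response to a biased opponent
# concentrates on the better action but keeps both actions in its support.
A = np.array([[1.0, -1.0], [-1.0, 1.0]])
br = smoothed_best_response(A, np.array([0.6, 0.4]))
```

As tau grows the response approaches uniform, and as tau shrinks it approaches the hard best response, which is the smoothing knob the regularized analysis exploits.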

Robot Fleet Learning via Policy Merging

1 code implementation2 Oct 2023 Lirui Wang, Kaiqing Zhang, Allan Zhou, Max Simchowitz, Russ Tedrake

We show that FLEET-MERGE consolidates the behavior of policies trained on 50 tasks in the Meta-World environment, with good performance on nearly all training tasks at test time.

Robot Manipulation

Partially Observable Multi-agent RL with (Quasi-)Efficiency: The Blessing of Information Sharing

no code implementations16 Aug 2023 Xiangyu Liu, Kaiqing Zhang

Furthermore, we develop a partially observable MARL algorithm that is both statistically and computationally quasi-efficient.

Computational Efficiency Multi-agent Reinforcement Learning

Multi-Player Zero-Sum Markov Games with Networked Separable Interactions

no code implementations NeurIPS 2023 Chanwoo Park, Kaiqing Zhang, Asuman Ozdaglar

We study a new class of Markov games, \emph{(multi-player) zero-sum Markov games} with \emph{networked separable interactions} (zero-sum NMGs), to model the local interaction structure in non-cooperative multi-agent sequential decision-making.

Decision Making

Tackling Combinatorial Distribution Shift: A Matrix Completion Perspective

no code implementations12 Jul 2023 Max Simchowitz, Abhishek Gupta, Kaiqing Zhang

Focusing on the special case where the labels are given by bilinear embeddings into a Hilbert space $H$: $\mathbb{E}[z \mid x, y ]=\langle f_{\star}(x), g_{\star}(y)\rangle_{{H}}$, we aim to extrapolate to a test distribution that is \emph{not} covered in training, i.e., achieving bilinear combinatorial extrapolation.

Matrix Completion
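The matrix-completion perspective above can be illustrated in the simplest rank-1 case: when the label table over (x, y) combinations has rank 1, an unobserved combination is pinned down by observed entries sharing its row and column. This is a toy sketch under that assumption, not the paper's method; the helper and the table are hypothetical.

```python
import numpy as np

def rank1_extrapolate(M, i, j):
    """Fill in the unobserved (i, j) entry of a rank-1 table from observed
    entries: M[i, j] = M[i, k] * M[l, j] / M[l, k] for any observed k, l."""
    k = next(c for c in range(M.shape[1]) if c != j and not np.isnan(M[i, c]))
    l = next(r for r in range(M.shape[0]) if r != i and not np.isnan(M[r, j]))
    return M[i, k] * M[l, j] / M[l, k]

# Labels for three training (x, y) combinations; the fourth combination
# never appears in training, yet the rank-1 structure determines it.
M = np.array([[2.0, 4.0],
              [3.0, np.nan]])
print(rank1_extrapolate(M, 1, 1))  # 3 * 4 / 2 = 6.0
```

The paper's bilinear-embedding setting generalizes this: higher rank, noisy labels, and embeddings into a Hilbert space rather than scalar factors.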

Last-Iterate Convergent Policy Gradient Primal-Dual Methods for Constrained MDPs

no code implementations NeurIPS 2023 Dongsheng Ding, Chen-Yu Wei, Kaiqing Zhang, Alejandro Ribeiro

To fill this gap, we employ the Lagrangian method to cast a constrained MDP into a constrained saddle-point problem in which max/min players correspond to primal/dual variables, respectively, and develop two single-time-scale policy-based primal-dual algorithms with non-asymptotic convergence of their policy iterates to an optimal constrained policy.
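The Lagrangian saddle-point idea described above can be sketched on a one-state toy problem. This is an illustrative single-time-scale primal-dual loop, not the paper's algorithm; the rewards, utilities, step size, and horizon are all invented for the example.

```python
import numpy as np

def primal_dual(r, u, b, T=20000, eta=0.02):
    """Gradient ascent-descent on the Lagrangian L(x, lam) = r.x + lam*(u.x - b):
    multiplicative-weights (mirror) ascent for the primal policy x on the
    simplex, projected gradient descent for the dual variable lam >= 0."""
    x, lam = np.ones(len(r)) / len(r), 0.0
    x_sum = np.zeros(len(r))
    for _ in range(T):
        x = x * np.exp(eta * (r + lam * u)); x /= x.sum()   # ascent in x
        lam = max(0.0, lam - eta * (u @ x - b))             # descent in lam
        x_sum += x
    return x_sum / T, lam

# Toy one-state problem: the reward prefers action 0, but the constraint
# u.x >= 0.5 forces mass onto action 1, so the averaged policy mixes.
r, u = np.array([1.0, 0.0]), np.array([0.0, 1.0])
x_bar, lam = primal_dual(r, u, 0.5)
```

The averaged iterates approach the constrained optimum (here, an even mix); obtaining convergence of the last policy iterate rather than the average is the harder guarantee the paper targets.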

Learning to Extrapolate: A Transductive Approach

1 code implementation27 Apr 2023 Aviv Netanyahu, Abhishek Gupta, Max Simchowitz, Kaiqing Zhang, Pulkit Agrawal

Machine learning systems, especially with overparameterized deep neural networks, can generalize to novel test instances drawn from the same distribution as the training data.

Imitation Learning

Breaking the Curse of Multiagents in a Large State Space: RL in Markov Games with Independent Linear Function Approximation

no code implementations7 Feb 2023 Qiwen Cui, Kaiqing Zhang, Simon S. Du

In contrast, existing works for Markov games with function approximation have sample complexity bounds that scale with the size of the \emph{joint action space} when specialized to the canonical tabular Markov game setting, which is exponentially large in the number of agents.

Multi-agent Reinforcement Learning

Can Direct Latent Model Learning Solve Linear Quadratic Gaussian Control?

no code implementations30 Dec 2022 Yi Tian, Kaiqing Zhang, Russ Tedrake, Suvrit Sra

We study the task of learning state representations from potentially high-dimensional observations, with the goal of controlling an unknown partially observable system.

Representation Learning

Revisiting the Linear-Programming Framework for Offline RL with General Function Approximation

no code implementations28 Dec 2022 Asuman Ozdaglar, Sarath Pattathil, Jiawei Zhang, Kaiqing Zhang

Offline reinforcement learning (RL) aims to find an optimal policy for sequential decision-making using a pre-collected dataset, without further interaction with the environment.

Decision Making Offline RL +1

An Improved Analysis of (Variance-Reduced) Policy Gradient and Natural Policy Gradient Methods

no code implementations NeurIPS 2020 Yanli Liu, Kaiqing Zhang, Tamer Başar, Wotao Yin

In this paper, we revisit and improve the convergence of policy gradient (PG), natural PG (NPG) methods, and their variance-reduced variants, under general smooth policy parametrizations.

Policy Gradient Methods

Does Learning from Decentralized Non-IID Unlabeled Data Benefit from Self Supervision?

1 code implementation20 Oct 2022 Lirui Wang, Kaiqing Zhang, Yunzhu Li, Yonglong Tian, Russ Tedrake

Decentralized learning has been advocated and widely deployed to make efficient use of distributed datasets, with an extensive focus on supervised learning (SL) problems.

Contrastive Learning Representation Learning +1

The Power of Regularization in Solving Extensive-Form Games

no code implementations19 Jun 2022 Mingyang Liu, Asuman Ozdaglar, Tiancheng Yu, Kaiqing Zhang

Second, we show that regularized counterfactual regret minimization (\texttt{Reg-CFR}), with a variant of the optimistic mirror descent algorithm as regret-minimizer, can achieve $O(1/T^{1/4})$ best-iterate and $O(1/T^{3/4})$ average-iterate convergence rate for finding NE in EFGs.

counterfactual
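The optimistic mirror descent regret-minimizer mentioned above can be sketched in its simplest matrix-game instantiation, optimistic multiplicative weights: each update applies the current gradient plus a prediction that it will recur. This is not the paper's Reg-CFR; the game, initialization, and step size are illustrative.

```python
import numpy as np

def omwu(A, x0, y0, T=5000, eta=0.05):
    """Optimistic multiplicative weights for the zero-sum game max_x min_y x.A.y:
    each player's exponent uses 2*g_t - g_{t-1} (current gradient plus an
    optimistic prediction). Returns the players' average strategies."""
    x, y = x0.astype(float), y0.astype(float)
    gx_prev, gy_prev = np.zeros_like(x), np.zeros_like(y)
    x_sum, y_sum = np.zeros_like(x), np.zeros_like(y)
    for _ in range(T):
        gx, gy = A @ y, -A.T @ x                      # row maximizes, column minimizes
        x = x * np.exp(eta * (2 * gx - gx_prev)); x /= x.sum()
        y = y * np.exp(eta * (2 * gy - gy_prev)); y /= y.sum()
        gx_prev, gy_prev = gx, gy
        x_sum += x; y_sum += y
    return x_sum / T, y_sum / T

# Matching pennies: the unique Nash equilibrium is uniform for both players.
A = np.array([[1.0, -1.0], [-1.0, 1.0]])
x_bar, y_bar = omwu(A, np.array([0.9, 0.1]), np.array([0.2, 0.8]))
```

The average strategies approach the equilibrium; the optimistic correction term is what yields the improved iterate-convergence rates the abstract reports in the extensive-form setting.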

Convergence and sample complexity of natural policy gradient primal-dual methods for constrained MDPs

no code implementations6 Jun 2022 Dongsheng Ding, Kaiqing Zhang, Jiali Duan, Tamer Başar, Mihailo R. Jovanović

We study sequential decision making problems aimed at maximizing the expected total reward while satisfying a constraint on the expected total utility.

Decision Making

Byzantine-Robust Online and Offline Distributed Reinforcement Learning

no code implementations1 Jun 2022 Yiding Chen, Xuezhou Zhang, Kaiqing Zhang, Mengdi Wang, Xiaojin Zhu

We consider a distributed reinforcement learning setting where multiple agents separately explore the environment and communicate their experiences through a central server.

reinforcement-learning Reinforcement Learning (RL)

The Complexity of Markov Equilibrium in Stochastic Games

no code implementations8 Apr 2022 Constantinos Daskalakis, Noah Golowich, Kaiqing Zhang

Previous work for learning Markov CCE policies all required exponential time and sample complexity in the number of players.

Multi-agent Reinforcement Learning reinforcement-learning +1

Globally Convergent Policy Search over Dynamic Filters for Output Estimation

no code implementations23 Feb 2022 Jack Umenberger, Max Simchowitz, Juan C. Perdomo, Kaiqing Zhang, Russ Tedrake

In this paper, we provide a new perspective on this challenging problem based on the notion of $\textit{informativity}$, which intuitively requires that all components of a filter's internal state are representative of the true state of the underlying dynamical system.

Independent Policy Gradient for Large-Scale Markov Potential Games: Sharper Rates, Function Approximation, and Game-Agnostic Convergence

no code implementations8 Feb 2022 Dongsheng Ding, Chen-Yu Wei, Kaiqing Zhang, Mihailo R. Jovanović

When there is no uncertainty in the gradient evaluation, we show that our algorithm finds an $\epsilon$-Nash equilibrium with $O(1/\epsilon^2)$ iteration complexity which does not explicitly depend on the state space size.

Multi-agent Reinforcement Learning Policy Gradient Methods +1

Do Differentiable Simulators Give Better Policy Gradients?

no code implementations2 Feb 2022 H. J. Terry Suh, Max Simchowitz, Kaiqing Zhang, Russ Tedrake

Differentiable simulators promise faster computation time for reinforcement learning by replacing zeroth-order gradient estimates of a stochastic objective with an estimate based on first-order gradients.

Independent Learning in Stochastic Games

no code implementations23 Nov 2021 Asuman Ozdaglar, Muhammed O. Sayin, Kaiqing Zhang

We focus on the development of simple and independent learning dynamics for stochastic games: each agent is myopic and chooses best-response-type actions to the other agents' strategies, without any coordination with her opponents.

Autonomous Driving Reinforcement Learning (RL)

On Improving Model-Free Algorithms for Decentralized Multi-Agent Reinforcement Learning

no code implementations12 Oct 2021 Weichao Mao, Lin F. Yang, Kaiqing Zhang, Tamer Başar

Multi-agent reinforcement learning (MARL) algorithms often suffer from an exponential sample complexity dependence on the number of agents, a phenomenon known as \emph{the curse of multiagents}.

Multi-agent Reinforcement Learning Q-Learning +3

Decentralized Cooperative Multi-Agent Reinforcement Learning with Exploration

no code implementations29 Sep 2021 Weichao Mao, Tamer Basar, Lin Yang, Kaiqing Zhang

Many real-world applications of multi-agent reinforcement learning (RL), such as multi-robot navigation and decentralized control of cyber-physical systems, involve the cooperation of agents as a team with aligned objectives.

Multi-agent Reinforcement Learning Q-Learning +3

Decentralized Q-Learning in Zero-sum Markov Games

no code implementations NeurIPS 2021 Muhammed O. Sayin, Kaiqing Zhang, David S. Leslie, Tamer Basar, Asuman Ozdaglar

The key challenge in this decentralized setting is the non-stationarity of the environment from an agent's perspective, since both her own payoffs and the system evolution depend on the actions of other agents, and each agent adapts her policies simultaneously and independently.

Multi-agent Reinforcement Learning Q-Learning

Learning Safe Multi-Agent Control with Decentralized Neural Barrier Certificates

1 code implementation ICLR 2021 Zengyi Qin, Kaiqing Zhang, Yuxiao Chen, Jingkai Chen, Chuchu Fan

We propose a novel joint-learning framework that can be implemented in a decentralized fashion, with generalization guarantees for certain function classes.

Derivative-Free Policy Optimization for Linear Risk-Sensitive and Robust Control Design: Implicit Regularization and Sample Complexity

no code implementations NeurIPS 2021 Kaiqing Zhang, Xiangyuan Zhang, Bin Hu, Tamer Başar

Direct policy search serves as one of the workhorses in modern reinforcement learning (RL), and its applications in continuous control tasks have recently attracted increasing attention.

Continuous Control Multi-agent Reinforcement Learning +2

Towards Understanding Asynchronous Advantage Actor-critic: Convergence and Linear Speedup

no code implementations31 Dec 2020 Han Shen, Kaiqing Zhang, Mingyi Hong, Tianyi Chen

Asynchronous and parallel implementation of standard reinforcement learning (RL) algorithms is a key enabler of the tremendous success of modern RL.

Atari Games OpenAI Gym +1

Natural Policy Gradient Primal-Dual Method for Constrained Markov Decision Processes

no code implementations NeurIPS 2020 Dongsheng Ding, Kaiqing Zhang, Tamer Basar, Mihailo Jovanovic

To the best of our knowledge, our work is the first to establish non-asymptotic convergence guarantees of policy-based primal-dual methods for solving infinite-horizon discounted CMDPs.

Decision Making

Robust Multi-Agent Reinforcement Learning with Model Uncertainty

no code implementations NeurIPS 2020 Kaiqing Zhang, Tao Sun, Yunzhe Tao, Sahika Genc, Sunil Mallya, Tamer Basar

In contrast, we model the problem as a robust Markov game, where the goal of all agents is to find policies such that no agent has the incentive to deviate, i.e., reach some equilibrium point, which is also robust to the possible uncertainty of the MARL model.

Multi-agent Reinforcement Learning Q-Learning +2

On the Stability and Convergence of Robust Adversarial Reinforcement Learning: A Case Study on Linear Quadratic Systems

no code implementations NeurIPS 2020 Kaiqing Zhang, Bin Hu, Tamer Basar

We find: i) the conventional RARL framework (Pinto et al., 2017) can learn a destabilizing policy if the initial policy does not enjoy the robust stability property against the adversary; and ii) with robustly stabilizing initializations, our proposed double-loop RARL algorithm provably converges to the global optimal cost while maintaining robust stability on-the-fly.

Continuous Control Reinforcement Learning (RL)

Model-Based Multi-Agent RL in Zero-Sum Markov Games with Near-Optimal Sample Complexity

no code implementations NeurIPS 2020 Kaiqing Zhang, Sham M. Kakade, Tamer Başar, Lin F. Yang

This is in contrast to the usual reward-aware setting, with a $\tilde\Omega(|S|(|A|+|B|)(1-\gamma)^{-3}\epsilon^{-2})$ lower bound, where this model-based approach is near-optimal with only a gap on the $|A|,|B|$ dependence.

Model-based Reinforcement Learning Reinforcement Learning (RL)

POLY-HOOT: Monte-Carlo Planning in Continuous Space MDPs with Non-Asymptotic Analysis

no code implementations NeurIPS 2020 Weichao Mao, Kaiqing Zhang, Qiaomin Xie, Tamer Başar

Monte-Carlo planning, as exemplified by Monte-Carlo Tree Search (MCTS), has demonstrated remarkable performance in applications with finite spaces.

Information State Embedding in Partially Observable Cooperative Multi-Agent Reinforcement Learning

1 code implementation2 Apr 2020 Weichao Mao, Kaiqing Zhang, Erik Miehling, Tamer Başar

To enable the development of tractable algorithms, we introduce the concept of an information state embedding that serves to compress agents' histories.

Multi-agent Reinforcement Learning reinforcement-learning +1

Fully Asynchronous Policy Evaluation in Distributed Reinforcement Learning over Networks

no code implementations1 Mar 2020 Xingyu Sha, Jia-Qi Zhang, Keyou You, Kaiqing Zhang, Tamer Başar

This paper proposes a \emph{fully asynchronous} scheme for the policy evaluation problem of distributed reinforcement learning (DisRL) over directed peer-to-peer networks.

reinforcement-learning Reinforcement Learning (RL)

Decentralized Multi-Agent Reinforcement Learning with Networked Agents: Recent Advances

no code implementations9 Dec 2019 Kaiqing Zhang, Zhuoran Yang, Tamer Başar

Multi-agent reinforcement learning (MARL) has long been a significant and enduring research topic in both machine learning and control.

Decision Making Multi-agent Reinforcement Learning +2

Multi-Agent Reinforcement Learning: A Selective Overview of Theories and Algorithms

no code implementations24 Nov 2019 Kaiqing Zhang, Zhuoran Yang, Tamer Başar

Orthogonal to the existing reviews on MARL, we highlight several new angles and taxonomies of MARL theory, including learning in extensive-form games, decentralized MARL with networked agents, MARL in the mean-field regime, (non-)convergence of policy-based methods for learning in games, etc.

Autonomous Driving Decision Making +3

Non-Cooperative Inverse Reinforcement Learning

no code implementations NeurIPS 2019 Xiangyuan Zhang, Kaiqing Zhang, Erik Miehling, Tamer Başar

Through interacting with the more informed player, the less informed player attempts to both infer, and act according to, the true objective function.

reinforcement-learning Reinforcement Learning (RL)

Online Planning for Decentralized Stochastic Control with Partial History Sharing

no code implementations6 Aug 2019 Kaiqing Zhang, Erik Miehling, Tamer Başar

To demonstrate the applicability of the model, we propose a novel collaborative intrusion response model, where multiple agents (defenders) possessing asymmetric information aim to collaboratively defend a computer network.

Decision Making

A Convergence Result for Regularized Actor-Critic Methods

no code implementations13 Jul 2019 Wesley Suttle, Zhuoran Yang, Kaiqing Zhang, Ji Liu

In this paper, we present a probability-one convergence proof, under suitable conditions, of a certain class of actor-critic algorithms for finding approximate solutions to entropy-regularized MDPs using the machinery of stochastic approximation.

A Communication-Efficient Multi-Agent Actor-Critic Algorithm for Distributed Reinforcement Learning

no code implementations6 Jul 2019 Yixuan Lin, Kaiqing Zhang, Zhuoran Yang, Zhaoran Wang, Tamer Başar, Romeil Sandhu, Ji Liu

This paper considers a distributed reinforcement learning problem in which multiple agents in a network aim to cooperatively maximize the globally averaged return by communicating only with their local neighbors.

reinforcement-learning Reinforcement Learning (RL)

Global Convergence of Policy Gradient Methods to (Almost) Locally Optimal Policies

no code implementations19 Jun 2019 Kaiqing Zhang, Alec Koppel, Hao Zhu, Tamer Başar

Under a further strict saddle points assumption, this result establishes convergence to essentially locally-optimal policies of the underlying problem, and thus bridges the gap in existing literature on the convergence of PG methods.

Autonomous Driving Policy Gradient Methods

Policy Optimization Provably Converges to Nash Equilibria in Zero-Sum Linear Quadratic Games

no code implementations NeurIPS 2019 Kaiqing Zhang, Zhuoran Yang, Tamer Başar

To the best of our knowledge, this work appears to be the first one to investigate the optimization landscape of LQ games, and provably show the convergence of policy optimization methods to the Nash equilibria.

A Multi-Agent Off-Policy Actor-Critic Algorithm for Distributed Reinforcement Learning

1 code implementation15 Mar 2019 Wesley Suttle, Zhuoran Yang, Kaiqing Zhang, Zhaoran Wang, Tamer Basar, Ji Liu

This paper extends off-policy reinforcement learning to the multi-agent case in which a set of networked agents communicating with their neighbors according to a time-varying graph collaboratively evaluates and improves a target policy while following a distinct behavior policy.

reinforcement-learning Reinforcement Learning (RL)

Communication-Efficient Policy Gradient Methods for Distributed Reinforcement Learning

no code implementations7 Dec 2018 Tianyi Chen, Kaiqing Zhang, Georgios B. Giannakis, Tamer Başar

This paper deals with distributed policy optimization in reinforcement learning, which involves a central controller and a group of learners.

Distributed Computing Multi-agent Reinforcement Learning +3

Finite-Sample Analysis For Decentralized Batch Multi-Agent Reinforcement Learning With Networked Agents

no code implementations6 Dec 2018 Kaiqing Zhang, Zhuoran Yang, Han Liu, Tong Zhang, Tamer Başar

This work appears to be the first finite-sample analysis for batch MARL, a step towards rigorous theoretical understanding of general MARL algorithms in the finite-sample regime.

Multi-agent Reinforcement Learning reinforcement-learning +1

Distributed Learning of Average Belief Over Networks Using Sequential Observations

no code implementations19 Nov 2018 Kaiqing Zhang, Yang Liu, Ji Liu, Mingyan Liu, Tamer Başar

This paper addresses the problem of distributed learning of average belief with sequential observations, in which a network of $n>1$ agents aims to reach a consensus on the average value of their beliefs by exchanging information only with their neighbors.
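The classical static building block behind such schemes is consensus averaging: each agent repeatedly replaces its value with a weighted average of its neighbors' values. The paper's setting additionally handles sequential observations; the sketch below only shows the static step, with an invented ring network and weights.

```python
import numpy as np

def consensus_average(beliefs, W, T=100):
    """Iterate x <- W x. With a doubly stochastic, connected, aperiodic
    weight matrix W, every agent's value converges to the average of the
    initial beliefs."""
    x = np.array(beliefs, dtype=float)
    for _ in range(T):
        x = W @ x
    return x

# Ring of 4 agents, each averaging with its two neighbors (and itself).
W = np.array([[0.5 , 0.25, 0.  , 0.25],
              [0.25, 0.5 , 0.25, 0.  ],
              [0.  , 0.25, 0.5 , 0.25],
              [0.25, 0.  , 0.25, 0.5 ]])
x = consensus_average([1.0, 2.0, 3.0, 6.0], W)
print(x)  # every entry approaches the average, 3.0
```

Double stochasticity of W is what preserves the average at every iteration, so the consensus point is exactly the mean of the initial beliefs.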

Fully Decentralized Multi-Agent Reinforcement Learning with Networked Agents

5 code implementations ICML 2018 Kaiqing Zhang, Zhuoran Yang, Han Liu, Tong Zhang, Tamer Başar

To this end, we propose two decentralized actor-critic algorithms with function approximation, which are applicable to large-scale MARL problems where both the number of states and the number of agents are massively large.

Multi-agent Reinforcement Learning reinforcement-learning +1
