no code implementations • 19 Mar 2024 • Liang Zhang, Niao He, Michael Muehlebach
In this work, we propose a simple primal method, termed Constrained Gradient Method (CGM), for addressing functional constrained variational inequality problems, without requiring any information about the optimal Lagrange multipliers.
1 code implementation • 27 Feb 2024 • Philip Jordan, Anas Barakat, Niao He
We propose an independent policy gradient algorithm for learning approximate constrained Nash equilibria: Each agent observes their own actions and rewards, along with a shared state.
no code implementations • 27 Feb 2024 • Ilyas Fatkhullin, Niao He
This paper revisits the convergence of Stochastic Mirror Descent (SMD) in the contemporary nonconvex optimization setting.
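As a toy illustration (not code from the paper), one step of stochastic mirror descent with the negative-entropy mirror map, which keeps iterates on the probability simplex, can be sketched as follows; the stepsize and noise model are arbitrary choices for the example:

```python
import numpy as np

def smd_step(x, grad, eta):
    """One stochastic mirror descent step with the negative-entropy
    mirror map; the iterate stays on the probability simplex."""
    logits = np.log(x) - eta * grad    # update in the dual (mirror) space
    w = np.exp(logits - logits.max())  # numerically stabilized exponentiation
    return w / w.sum()                 # map back to the simplex

rng = np.random.default_rng(0)
x = np.ones(5) / 5  # uniform starting point on the simplex
for _ in range(200):
    # Noisy gradients whose means grow with the coordinate index, so
    # mass should concentrate on coordinate 0 (a made-up objective).
    g = np.arange(5) + rng.standard_normal(5)
    x = smd_step(x, g, eta=0.05)
```

With the entropy mirror map, this reduces to a multiplicative-weights update; other mirror maps change only the dual-space mapping.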
no code implementations • 24 Feb 2024 • Adrian Müller, Pragnya Alatur, Volkan Cevher, Giorgia Ramponi, Niao He
As Efroni et al. (2020) pointed out, it is an open question whether primal-dual algorithms can provably achieve sublinear regret if we do not allow error cancellations.
no code implementations • 8 Feb 2024 • Jiawei Huang, Niao He, Andreas Krause
We study the sample complexity of reinforcement learning (RL) in Mean-Field Games (MFGs) with model-based function approximation that requires strategic exploration to find a Nash Equilibrium policy.
no code implementations • 15 Nov 2023 • Sadegh Khorasani, Saber Salehkaleybar, Negar Kiyavash, Niao He, Matthias Grossglauser
Policy gradient (PG) is widely used in reinforcement learning due to its scalability and good performance.
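To make the PG idea concrete, here is a minimal REINFORCE-style sketch on a hypothetical three-armed bandit (the reward values and learning rate are invented for illustration, not taken from the paper):

```python
import numpy as np

def softmax(theta):
    z = np.exp(theta - theta.max())
    return z / z.sum()

def reinforce_step(theta, rng, lr=0.1):
    """One REINFORCE update on a toy 3-armed bandit with fixed rewards:
    theta <- theta + lr * reward(a) * grad log pi(a)."""
    rewards = np.array([0.0, 0.5, 1.0])  # hypothetical per-arm rewards
    pi = softmax(theta)
    a = rng.choice(3, p=pi)              # sample an action from the policy
    grad_log = -pi                       # d/dtheta log pi(a), softmax policy
    grad_log[a] += 1.0
    return theta + lr * rewards[a] * grad_log

rng = np.random.default_rng(1)
theta = np.zeros(3)
for _ in range(2000):
    theta = reinforce_step(theta, rng)
pi = softmax(theta)  # should concentrate on the highest-reward arm
```

The score-function form `grad log pi(a) * reward` is what makes PG applicable whenever actions can be sampled and rewards observed, without differentiating through the environment.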
no code implementations • 6 Nov 2023 • Florian Hübler, Junchi Yang, Xiang Li, Niao He
However, as the assumption is relaxed to the more realistic $(L_0, L_1)$-smoothness, all existing convergence results still necessitate tuning of the stepsize.
no code implementations • NeurIPS 2023 • Liang Zhang, Junchi Yang, Amin Karbasi, Niao He
Particularly, given the inexact initialization oracle, our regularization-based algorithms achieve the best of both worlds - optimal reproducibility and near-optimal gradient complexity - for minimization and minimax optimization.
no code implementations • 14 Oct 2023 • Liang Zhang, Bingcong Li, Kiran Koshy Thekumparampil, Sewoong Oh, Niao He
The widespread practice of fine-tuning large language models (LLMs) on domain-specific data faces two major challenges in memory and privacy.
1 code implementation • 21 Sep 2023 • Kei Ishikawa, Niao He, Takafumi Kanamori
We study policy evaluation of offline contextual bandits subject to unobserved confounders.
1 code implementation • 8 Sep 2023 • Jiduan Wu, Anas Barakat, Ilyas Fatkhullin, Niao He
Our main results are two-fold: (i) in the deterministic setting, we establish the first global last-iterate linear convergence result for the nested algorithm that seeks NE of zero-sum LQ games; (ii) in the model-free setting, we establish a $\widetilde{\mathcal{O}}(\epsilon^{-2})$ sample complexity using a single-point ZO estimator.
no code implementations • 25 Jun 2023 • Jun Song, Niao He, Lijun Ding, Chaoyue Zhao
Trust-region methods based on Kullback-Leibler divergence are pervasively used to stabilize policy optimization in reinforcement learning.
no code implementations • 13 Jun 2023 • Pragnya Alatur, Giorgia Ramponi, Niao He, Andreas Krause
Multi-agent reinforcement learning (MARL) addresses sequential decision-making problems with multiple agents, where each agent optimizes its own objective.
no code implementations • 12 Jun 2023 • Adrian Müller, Pragnya Alatur, Giorgia Ramponi, Niao He
Unlike existing Lagrangian approaches, our algorithm achieves this regret without the need for the cancellation of errors.
no code implementations • 2 Jun 2023 • Anas Barakat, Ilyas Fatkhullin, Niao He
We consider the reinforcement learning (RL) problem with general utilities which consists in maximizing a function of the state-action occupancy measure.
no code implementations • 18 May 2023 • Jiawei Huang, Batuhan Yardim, Niao He
In this paper, we study the fundamental statistical efficiency of Reinforcement Learning in Mean-Field Control (MFC) and Mean-Field Game (MFG) with general model-based function approximation.
2 code implementations • 26 Feb 2023 • Kei Ishikawa, Niao He
It can be shown that our estimator contains the recently proposed sharp estimator by Dorn and Guo (2022) as a special case, and our method enables a novel extension of the classical marginal sensitivity model using f-divergence.
no code implementations • 3 Feb 2023 • Ilyas Fatkhullin, Anas Barakat, Anastasia Kireeva, Niao He
Recently, the impressive empirical success of policy gradient (PG) methods has catalyzed the development of their theoretical foundations.
no code implementations • 29 Dec 2022 • Batuhan Yardim, Semih Cayci, Matthieu Geist, Niao He
Instead, we show that $N$ agents running policy mirror ascent converge to the Nash equilibrium of the regularized game within $\widetilde{\mathcal{O}}(\varepsilon^{-2})$ samples from a single sample trajectory without a population generative model, up to a standard $\mathcal{O}(\frac{1}{\sqrt{N}})$ error due to the mean field.
no code implementations • 14 Nov 2022 • Hanjun Dai, Yuan Xue, Niao He, Bethany Wang, Na Li, Dale Schuurmans, Bo Dai
In real-world decision-making, uncertainty is important yet difficult to handle.
no code implementations • 31 Oct 2022 • Xiang Li, Junchi Yang, Niao He
Adaptive gradient methods have shown their ability to adjust the stepsizes on the fly in a parameter-agnostic manner, and empirically achieve faster convergence for solving minimization problems.
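As a hypothetical sketch of the mechanism (not the paper's algorithm), AdaGrad-style methods divide each coordinate's stepsize by the root of its accumulated squared gradients, so badly scaled coordinates are rescaled automatically without tuning:

```python
import numpy as np

def adagrad(grad_fn, x0, lr=1.0, steps=500, eps=1e-8):
    """Minimal AdaGrad: per-coordinate stepsizes adapt to the running
    sum of squared gradients, with no problem-specific tuning."""
    x = np.asarray(x0, dtype=float)
    accum = np.zeros_like(x)
    for _ in range(steps):
        g = grad_fn(x)
        accum += g * g                          # accumulate squared gradients
        x -= lr * g / (np.sqrt(accum) + eps)    # per-coordinate stepsize
    return x

# Toy quadratic with badly scaled coordinates: f(x) = 0.5 * sum(D * x**2),
# so the gradient is D * x (a made-up test problem).
D = np.array([100.0, 1.0])
x_star = adagrad(lambda x: D * x, x0=[3.0, -2.0])
```

Note the same `lr=1.0` handles curvatures differing by a factor of 100, which is the parameter-agnostic behavior the snippet is meant to illustrate.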
no code implementations • 2 Jun 2022 • Semih Cayci, Niao He, R. Srikant
Natural actor-critic (NAC) and its variants, equipped with the representation power of neural networks, have demonstrated impressive empirical success in solving Markov decision problems with large state spaces.
no code implementations • 1 Jun 2022 • Junchi Yang, Xiang Li, Niao He
Adaptive algorithms like AdaGrad and AMSGrad are successful in nonconvex optimization owing to their parameter-agnostic ability -- requiring neither a priori knowledge of problem-specific parameters nor tuning of learning rates.
no code implementations • 1 Jun 2022 • Liang Zhang, Kiran Koshy Thekumparampil, Sewoong Oh, Niao He
We provide a general framework for solving differentially private stochastic minimax optimization (DP-SMO) problems, which enables practitioners to bring their own base optimization algorithm and use it as a black box to obtain the near-optimal privacy-loss trade-off.
no code implementations • 28 May 2022 • Siqi Zhang, Yifan Hu, Liang Zhang, Niao He
We further study algorithm-dependent generalization bounds via algorithmic stability arguments.
no code implementations • 25 May 2022 • Saeed Masiha, Saber Salehkaleybar, Niao He, Negar Kiyavash, Patrick Thiran
We prove that the total sample complexity of SCRN in achieving $\epsilon$-global optimum is $\mathcal{O}(\epsilon^{-7/(2\alpha)+1})$ for $1\le\alpha< 3/2$ and $\tilde{\mathcal{O}}(\epsilon^{-2/\alpha})$ for $3/2\le\alpha\le 2$.
no code implementations • 17 May 2022 • Saber Salehkaleybar, Sadegh Khorasani, Negar Kiyavash, Niao He, Patrick Thiran
The SHARP algorithm is parameter-free, achieving an $\epsilon$-approximate first-order stationary point with $O(\epsilon^{-3})$ trajectories, while using a batch size of $O(1)$ at each iteration.
no code implementations • 20 Feb 2022 • Semih Cayci, Niao He, R. Srikant
We consider the reinforcement learning problem for partially observed Markov decision processes (POMDPs) with large or even countably infinite state spaces, where the controller has access to only noisy observations of the underlying controlled Markov chain.
no code implementations • 19 Jan 2022 • Kiran Koshy Thekumparampil, Niao He, Sewoong Oh
We also provide a direct single-loop algorithm, using the LPD method, that achieves the iteration complexity of $O(\sqrt{\frac{L_x}{\varepsilon}} + \frac{\|A\|}{\sqrt{\mu_y \varepsilon}} + \sqrt{\frac{L_y}{\varepsilon}})$.
1 code implementation • 10 Dec 2021 • Junchi Yang, Antonio Orvieto, Aurelien Lucchi, Niao He
Gradient descent ascent (GDA), the simplest single-loop algorithm for nonconvex minimax optimization, is widely used in practical applications such as generative adversarial networks (GANs) and adversarial training.
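A minimal sketch of simultaneous GDA on a strongly-convex-strongly-concave toy objective (the objective and stepsizes are invented for illustration) shows the single-loop structure: one gradient step down in `x` and one up in `y` per iteration:

```python
def gda(x0, y0, eta_x, eta_y, steps=1000):
    """Simultaneous gradient descent ascent on the toy saddle problem
    f(x, y) = 0.5*x**2 + x*y - 0.5*y**2, whose saddle point is (0, 0)."""
    x, y = x0, y0
    for _ in range(steps):
        gx = x + y                  # df/dx
        gy = x - y                  # df/dy
        x, y = x - eta_x * gx, y + eta_y * gy  # descend in x, ascend in y
    return x, y

x, y = gda(1.0, 1.0, eta_x=0.05, eta_y=0.05)  # spirals into the saddle point
```

On this strongly-convex-strongly-concave instance the iterates converge; for bilinear or nonconvex problems, plain simultaneous GDA can cycle or diverge, which is what motivates the stepsize-ratio analyses in this line of work.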
no code implementations • NeurIPS 2021 • Yifan Hu, Xin Chen, Niao He
We consider stochastic optimization when one only has access to biased stochastic oracles of the objective, and obtaining stochastic gradients with low bias comes at a high cost.
no code implementations • 29 Sep 2021 • Jun Song, Chaoyue Zhao, Niao He
Trust-region methods based on Kullback-Leibler divergence are pervasively used to stabilize policy optimization in reinforcement learning.
no code implementations • 29 Sep 2021 • Ahmet Alacaoglu, Luca Viano, Niao He, Volkan Cevher
Our sample complexities also match the best-known results for global convergence of policy gradient and two time-scale actor-critic algorithms in the single agent setting.
no code implementations • 8 Jun 2021 • Semih Cayci, Niao He, R. Srikant
Furthermore, under mild regularity conditions on the concentrability coefficient and basis vectors, we prove that entropy-regularized NPG exhibits \emph{linear convergence} up to a function approximation error.
no code implementations • 29 Mar 2021 • Siqi Zhang, Junchi Yang, Cristóbal Guzmán, Negar Kiyavash, Niao He
In the averaged smooth finite-sum setting, our proposed algorithm improves over previous algorithms by providing a nearly-tight dependence on the condition number.
no code implementations • 14 Mar 2021 • Donghwan Lee, Niao He, Seungjae Lee, Panagiota Karava, Jianghai Hu
The building sector is the largest consumer of energy in the world, and there has been considerable research interest in the energy consumption and comfort management of buildings.
no code implementations • 2 Mar 2021 • Semih Cayci, Siddhartha Satpathi, Niao He, R. Srikant
In this paper, we study the dynamics of temporal difference learning with neural network-based value function approximation over a general state space, namely, \emph{Neural TD learning}.
no code implementations • 17 Feb 2021 • Donghwan Lee, Jianghai Hu, Niao He
Based on these two systems, we derive a new finite-time error bound of asynchronous Q-learning when a constant stepsize is used.
no code implementations • NeurIPS 2020 • Yingxiang Yang, Negar Kiyavash, Le Song, Niao He
Macroscopic data aggregated from microscopic events are pervasive in machine learning, such as country-level COVID-19 infection statistics based on city-level data.
no code implementations • NeurIPS 2020 • Junchi Yang, Siqi Zhang, Negar Kiyavash, Niao He
We introduce a generic \emph{two-loop} scheme for smooth minimax optimization with strongly-convex-concave objectives.
no code implementations • NeurIPS 2020 • Yifan Hu, Siqi Zhang, Xin Chen, Niao He
Conditional stochastic optimization covers a variety of applications ranging from invariant learning and causal inference to meta-learning.
no code implementations • NeurIPS 2020 • Junchi Yang, Negar Kiyavash, Niao He
Nonconvex minimax problems appear frequently in emerging machine learning applications, such as generative adversarial networks and adversarial learning.
no code implementations • NeurIPS 2020 • Donghwan Lee, Niao He
This paper develops a novel and unified framework to analyze the convergence of a large family of Q-learning algorithms from the switching system perspective.
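For reference, the tabular Q-learning update the analysis concerns can be sketched on a made-up two-state MDP (the MDP, stepsize, and horizon below are illustrative assumptions, not from the paper):

```python
import numpy as np

def q_learning(P_next, R, gamma=0.9, alpha=0.1, steps=3000, seed=0):
    """Tabular Q-learning with a constant stepsize on a deterministic MDP.
    P_next[s, a] gives the successor state, R[s, a] the reward."""
    rng = np.random.default_rng(seed)
    n_states, n_actions = R.shape
    Q = np.zeros((n_states, n_actions))
    s = 0
    for _ in range(steps):
        a = rng.integers(n_actions)                    # uniform exploration
        s_next = P_next[s, a]
        target = R[s, a] + gamma * Q[s_next].max()     # bootstrapped target
        Q[s, a] += alpha * (target - Q[s, a])          # stochastic update
        s = s_next
    return Q

# Toy 2-state MDP: action 1 always moves to state 1, which pays reward 1.
P_next = np.array([[0, 1], [0, 1]])
R = np.array([[0.0, 0.0], [1.0, 1.0]])
Q = q_learning(P_next, R)  # optimal action in both states is action 1
```

The `max` inside the target is what makes the update nonlinear in `Q`, which is the feature the switching-system viewpoint is designed to handle.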
1 code implementation • NeurIPS 2020 • Wentao Weng, Harsh Gupta, Niao He, Lei Ying, R. Srikant
In this paper, we establish a theoretical comparison between the asymptotic mean-squared error of Double Q-learning and Q-learning.
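To fix ideas, Double Q-learning maintains two tables and decouples action selection from evaluation; a sketch on a hypothetical two-state chain (all problem constants below are invented for illustration):

```python
import numpy as np

def double_q_step(QA, QB, s, a, r, s_next, gamma, alpha, rng):
    """One Double Q-learning update: one table selects the argmax action
    at s_next, the other evaluates it, reducing overestimation bias."""
    if rng.random() < 0.5:
        a_star = QA[s_next].argmax()
        QA[s, a] += alpha * (r + gamma * QB[s_next, a_star] - QA[s, a])
    else:
        a_star = QB[s_next].argmax()
        QB[s, a] += alpha * (r + gamma * QA[s_next, a_star] - QB[s, a])

rng = np.random.default_rng(0)
P_next = np.array([[0, 1], [0, 1]])      # action 1 always moves to state 1
R = np.array([[0.0, 0.0], [1.0, 1.0]])   # state 1 pays reward 1
QA, QB = np.zeros((2, 2)), np.zeros((2, 2))
s = 0
for _ in range(6000):
    a = rng.integers(2)                  # uniform exploration
    s_next = P_next[s, a]
    double_q_step(QA, QB, s, a, R[s, a], s_next, 0.9, 0.1, rng)
    s = s_next
Q = 0.5 * (QA + QB)                      # averaged estimate
```

Because each update uses only half the data per table, Double Q-learning trades some statistical efficiency for reduced overestimation, which is exactly the mean-squared-error trade-off the paper quantifies.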
no code implementations • 25 Feb 2020 • Yifan Hu, Siqi Zhang, Xin Chen, Niao He
Conditional Stochastic Optimization (CSO) covers a variety of applications ranging from meta-learning and causal inference to invariant learning.
no code implementations • L4DC 2020 • Donghwan Lee, Niao He
The use of target networks is a common practice in deep reinforcement learning for stabilizing the training; however, theoretical understanding of this technique is still limited.
no code implementations • 22 Feb 2020 • Junchi Yang, Negar Kiyavash, Niao He
Nonconvex minimax problems appear frequently in emerging machine learning applications, such as generative adversarial networks and adversarial learning.
no code implementations • 4 Dec 2019 • Donghwan Lee, Niao He
In this paper, we introduce a unified framework for analyzing a large family of Q-learning algorithms, based on switching system perspectives and ODE-based stochastic approximation.
no code implementations • NeurIPS 2019 • Yingxiang Yang, Haoxiang Wang, Negar Kiyavash, Niao He
The nonparametric learning of positive-valued functions appears widely in machine learning, especially in the context of estimating intensity functions of point processes.
no code implementations • 1 Dec 2019 • Donghwan Lee, Niao He, Parameswaran Kamalaruban, Volkan Cevher
This article reviews recent advances in multi-agent reinforcement learning algorithms for large-scale control systems and communication networks, which learn to communicate and cooperate.
no code implementations • 28 May 2019 • Yifan Hu, Xin Chen, Niao He
In this paper, we study a class of stochastic optimization problems, referred to as the \emph{Conditional Stochastic Optimization} (CSO), in the form of $\min_{x \in \mathcal{X}} \mathbb{E}_{\xi}f_\xi\Big({\mathbb{E}_{\eta|\xi}[g_\eta(x,\xi)]}\Big)$, which finds a wide spectrum of applications including portfolio selection, reinforcement learning, robust learning, causal inference and so on.
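The expectation inside the outer function is what makes plain Monte Carlo estimates of the CSO objective biased. A toy sketch (the specific choices $f(u)=u^2$ and $g(x,\eta)=x+\eta$ are invented for illustration) shows the bias shrinking as the inner sample size grows:

```python
import numpy as np

def cso_estimate(x, n_outer, m_inner, rng):
    """Nested Monte Carlo estimate of F(x) = E_xi f(E_{eta|xi} g(x, eta))
    for the toy choices f(u) = u**2 and g(x, eta) = x + eta, eta ~ N(0, 1).
    Here F(x) = x**2 exactly, while the estimator's bias is 1/m_inner,
    since E[(x + mean of m_inner noises)**2] = x**2 + 1/m_inner."""
    vals = []
    for _ in range(n_outer):
        inner = x + rng.standard_normal(m_inner)  # samples of g(x, eta)
        vals.append(inner.mean() ** 2)            # f applied to the inner average
    return float(np.mean(vals))

rng = np.random.default_rng(0)
est_small_m = cso_estimate(2.0, n_outer=20000, m_inner=1, rng=rng)    # approx 5
est_large_m = cso_estimate(2.0, n_outer=20000, m_inner=100, rng=rng)  # approx 4.01
# The true value is F(2) = 4; increasing m_inner shrinks the bias.
```

Balancing this inner-sample bias against total sampling cost is the central trade-off in CSO sample-complexity analyses.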
1 code implementation • NeurIPS 2019 • Bo Dai, Zhen Liu, Hanjun Dai, Niao He, Arthur Gretton, Le Song, Dale Schuurmans
We present an efficient algorithm for maximum likelihood estimation (MLE) of exponential family models, with a general parametrization of the energy function that includes neural networks.
no code implementations • 24 Apr 2019 • Donghwan Lee, Niao He
The use of target networks has been a popular and key component of recent deep Q-learning algorithms for reinforcement learning, yet little is known about them from a theoretical standpoint.
no code implementations • 26 Feb 2019 • Pan Li, Niao He, Olgica Milenkovic
We introduce a new convex optimization problem, termed quadratic decomposable submodular function minimization (QDSFM), which can model a number of learning tasks on graphs and hypergraphs.
1 code implementation • NeurIPS 2018 • Bo Dai, Hanjun Dai, Niao He, Weiyang Liu, Zhen Liu, Jianshu Chen, Lin Xiao, Le Song
This flexible function class couples the variational distribution with the original parameters in the graphical models, allowing end-to-end learning of the graphical models by back-propagation through the variational distribution.
no code implementations • NeurIPS 2018 • Yingxiang Yang, Bo Dai, Negar Kiyavash, Niao He
Approximate Bayesian computation (ABC) is an important methodology for Bayesian inference when the likelihood function is intractable.
1 code implementation • 6 Nov 2018 • Bo Dai, Hanjun Dai, Arthur Gretton, Le Song, Dale Schuurmans, Niao He
We investigate penalized maximum log-likelihood estimation for exponential family distributions whose natural parameter resides in a reproducing kernel Hilbert space.
1 code implementation • NeurIPS 2018 • Pan Li, Niao He, Olgica Milenkovic
The problem is closely related to decomposable submodular function minimization and arises in many graph and hypergraph learning settings, such as graph-based semi-supervised learning and PageRank.
no code implementations • 25 Jan 2018 • Yingxiang Yang, Jalal Etesami, Niao He, Negar Kiyavash
In this paper, we design a nonparametric online algorithm for estimating the triggering functions of multivariate Hawkes processes.
no code implementations • ICML 2018 • Bo Dai, Albert Shaw, Lihong Li, Lin Xiao, Niao He, Zhen Liu, Jianshu Chen, Le Song
When function approximation is used, solving the Bellman optimality equation with stability guarantees has remained a major open problem in reinforcement learning for decades.
no code implementations • ICLR 2018 • Bo Dai, Albert Shaw, Niao He, Lihong Li, Le Song
This paper proposes a new actor-critic-style algorithm called Dual Actor-Critic or Dual-AC.
no code implementations • NeurIPS 2017 • Yingxiang Yang, Jalal Etesami, Niao He, Negar Kiyavash
We develop a nonparametric and online learning algorithm that estimates the triggering functions of a multivariate Hawkes process (MHP).
2 code implementations • ICML 2017 • Bo Dai, Ruiqi Guo, Sanjiv Kumar, Niao He, Le Song
Learning-based binary hashing has become a powerful paradigm for fast search and retrieval in massive databases.
no code implementations • 3 Aug 2016 • Niao He, Zaid Harchaoui, Yichen Wang, Le Song
Since almost all gradient-based optimization algorithms rely on Lipschitz-continuity, optimizing Poisson likelihood models with a guarantee of convergence can be challenging, especially for large-scale problems.
no code implementations • 15 Jul 2016 • Bo Dai, Niao He, Yunpeng Pan, Byron Boots, Le Song
In such problems, each sample $x$ itself is associated with a conditional distribution $p(z|x)$ represented by samples $\{z_i\}_{i=1}^M$, and the goal is to learn a function $f$ that links these conditional distributions to target values $y$.
no code implementations • NeurIPS 2015 • Nan Du, Yichen Wang, Niao He, Jimeng Sun, Le Song
By making personalized suggestions, a recommender system plays a crucial role in improving the engagement of users in modern web services.
no code implementations • NeurIPS 2015 • Niao He, Zaid Harchaoui
We propose a new first-order optimisation algorithm to solve high-dimensional non-smooth composite minimisation problems.
no code implementations • 9 Jun 2015 • Bo Dai, Niao He, Hanjun Dai, Le Song
Bayesian methods are appealing for their flexibility in modeling complex data and their ability to capture uncertainty in parameters.
1 code implementation • NeurIPS 2014 • Bo Dai, Bo Xie, Niao He, YIngyu Liang, Anant Raj, Maria-Florina Balcan, Le Song
The general perception is that kernel methods are not scalable, and neural nets are the methods of choice for nonlinear learning problems.