Search Results for author: Yingbin Liang

Found 99 papers, 12 papers with code

Transformers Provably Learn Feature-Position Correlations in Masked Image Modeling

no code implementations 4 Mar 2024 Yu Huang, Zixin Wen, Yuejie Chi, Yingbin Liang

Masked image modeling (MIM), which predicts randomly masked patches from unmasked ones, has emerged as a promising approach in self-supervised vision pretraining.

Position
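
The masked-prediction objective described above is easy to state in code. A minimal NumPy sketch, assuming a toy patch grid, a 75% mask ratio, and a stand-in linear predictor (none of which come from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def random_mask(num_patches, mask_ratio=0.75):
    """Split patch indices into a masked set (to be predicted) and a visible set."""
    num_masked = int(mask_ratio * num_patches)
    perm = rng.permutation(num_patches)
    return perm[:num_masked], perm[num_masked:]

patches = rng.normal(size=(16, 48))                   # toy image: 16 patches, 48 dims each
masked_idx, visible_idx = random_mask(len(patches))

# Stand-in predictor: summarize the visible patches and map them through a random linear layer;
# a real MIM model would use a learned encoder/decoder here.
W = 0.1 * rng.normal(size=(48, 48))
pred = patches[visible_idx].mean(axis=0) @ W
loss = np.mean((patches[masked_idx] - pred) ** 2)     # reconstruction loss on masked patches only
print(f"masked patches: {len(masked_idx)}, reconstruction MSE: {loss:.3f}")
```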

Take the Bull by the Horns: Hard Sample-Reweighted Continual Training Improves LLM Generalization

1 code implementation 22 Feb 2024 Xuxi Chen, Zhendong Wang, Daouda Sow, Junjie Yang, Tianlong Chen, Yingbin Liang, Mingyuan Zhou, Zhangyang Wang

Our study starts from an empirical strategy for the light continual training of LLMs using their original pre-training data sets, with a specific focus on selective retention of samples that incur moderately high losses.
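
The retention idea can be illustrated with a simple quantile filter over per-sample losses; the 50th-90th percentile band below is a hypothetical choice for illustration, not the paper's actual selection or reweighting rule:

```python
import numpy as np

def select_moderate_loss(per_sample_loss, low_q=0.5, high_q=0.9):
    """Keep samples whose loss falls in a 'moderately high' quantile band.

    The 50th-90th percentile band is a hypothetical choice: the intent is to drop
    easy samples and extreme outliers while retaining the informative middle-high band.
    """
    lo, hi = np.quantile(per_sample_loss, [low_q, high_q])
    keep = (per_sample_loss >= lo) & (per_sample_loss <= hi)
    return np.where(keep)[0]

losses = np.random.default_rng(1).lognormal(size=1000)   # stand-in per-sample losses
kept = select_moderate_loss(losses)
print(f"retained {kept.size} of {losses.size} samples for continual training")
```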

Non-asymptotic Convergence of Discrete-time Diffusion Models: New Approach and Improved Rate

no code implementations 21 Feb 2024 Yuchen Liang, Peizhong Ju, Yingbin Liang, Ness Shroff

In this paper, we establish the convergence guarantee for substantially larger classes of distributions under discrete-time diffusion models and further improve the convergence rate for distributions with bounded support.

Denoising

Sample Complexity Characterization for Linear Contextual MDPs

no code implementations 5 Feb 2024 Junze Deng, Yuan Cheng, Shaofeng Zou, Yingbin Liang

Our result for the second model is the first-known result for such a type of function approximation models.

Rethinking PGD Attack: Is Sign Function Necessary?

1 code implementation 3 Dec 2023 Junjie Yang, Tianlong Chen, Xuxi Chen, Zhangyang Wang, Yingbin Liang

Based on that, we further propose a new raw gradient descent (RGD) algorithm that eliminates the use of sign.
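
The contrast with standard PGD is visible in a single update: PGD steps along the sign of the gradient, while the raw gradient descent idea drops the sign function. A hedged sketch on a toy quadratic attack objective (the loss, step size, budget, and per-step projection are illustrative, not the paper's setup):

```python
import numpy as np

rng = np.random.default_rng(0)
W, target = rng.normal(size=(5, 10)), rng.normal(size=5)
x0 = rng.normal(size=10)
eps, alpha, steps = 0.3, 0.05, 20

def attack_grad(x):
    """Gradient of a toy attack objective 0.5*||Wx - target||^2 that the attacker ascends."""
    return W.T @ (W @ x - target)

def project(x):
    """Keep the perturbation inside an L_inf ball of radius eps around x0."""
    return x0 + np.clip(x - x0, -eps, eps)

x_pgd, x_rgd = x0.copy(), x0.copy()
for _ in range(steps):
    x_pgd = project(x_pgd + alpha * np.sign(attack_grad(x_pgd)))  # classic PGD: step along the sign
    x_rgd = project(x_rgd + alpha * attack_grad(x_rgd))           # raw-gradient step: no sign function

print("objective after PGD steps:", 0.5 * np.sum((W @ x_pgd - target) ** 2))
print("objective after raw-gradient steps:", 0.5 * np.sum((W @ x_rgd - target) ** 2))
```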

Meta ControlNet: Enhancing Task Adaptation via Meta Learning

1 code implementation 3 Dec 2023 Junjie Yang, Jinze Zhao, Peihao Wang, Zhangyang Wang, Yingbin Liang

However, vanilla ControlNet generally requires extensive training of around 5000 steps to achieve a desirable control for a single task.

Edge Detection, Image Generation +1

Provable Benefits of Multi-task RL under Non-Markovian Decision Making Processes

no code implementations 20 Oct 2023 Ruiquan Huang, Yuan Cheng, Jing Yang, Vincent Tan, Yingbin Liang

To this end, we posit a joint model class for tasks and use the notion of $\eta$-bracketing number to quantify its complexity; this number also serves as a general metric to capture the similarity of tasks and thus determines the benefit of multi-task over single-task RL.

Decision Making, Multi-Task Learning +1

In-Context Convergence of Transformers

no code implementations 8 Oct 2023 Yu Huang, Yuan Cheng, Yingbin Liang

For data with balanced features, we establish the finite-time convergence guarantee with near-zero prediction error by navigating our analysis over two phases of the training dynamics of the attention map.

In-Context Learning

Model-Free Algorithm with Improved Sample Efficiency for Zero-Sum Markov Games

no code implementations 17 Aug 2023 Songtao Feng, Ming Yin, Yu-Xiang Wang, Jing Yang, Yingbin Liang

In this work, we propose a model-free stage-based Q-learning algorithm and show that it achieves the same sample complexity as the best model-based algorithm, and hence for the first time demonstrate that model-free algorithms can enjoy the same optimality in the $H$ dependence as model-based algorithms.

Multi-agent Reinforcement Learning, Q-Learning +1

Doubly Robust Instance-Reweighted Adversarial Training

no code implementations 1 Aug 2023 Daouda Sow, Sen Lin, Zhangyang Wang, Yingbin Liang

Experiments on standard classification datasets demonstrate that our proposed approach outperforms related state-of-the-art baseline methods in terms of average robust performance, and at the same time improves the robustness against attacks on the weakest data points.

Provably Efficient UCB-type Algorithms For Learning Predictive State Representations

no code implementations 1 Jul 2023 Ruiquan Huang, Yingbin Liang, Jing Yang

The general sequential decision-making problem, which includes Markov decision processes (MDPs) and partially observable MDPs (POMDPs) as special cases, aims at maximizing a cumulative reward by making a sequence of decisions based on a history of observations and actions over time.

Computational Efficiency, Decision Making

Theoretical Hardness and Tractability of POMDPs in RL with Partial Online State Information

no code implementations 14 Jun 2023 Ming Shi, Yingbin Liang, Ness Shroff

However, existing theoretical results have shown that learning in POMDPs is intractable in the worst case, where the main challenge lies in the lack of latent state information.

Generalization Performance of Transfer Learning: Overparameterized and Underparameterized Regimes

no code implementations 8 Jun 2023 Peizhong Ju, Sen Lin, Mark S. Squillante, Yingbin Liang, Ness B. Shroff

For example, when the total number of features in the source task's learning model is fixed, we show that it is more advantageous to allocate a greater number of redundant features to the task-specific part rather than the common part.

Transfer Learning

Non-stationary Reinforcement Learning under General Function Approximation

no code implementations 1 Jun 2023 Songtao Feng, Ming Yin, Ruiquan Huang, Yu-Xiang Wang, Jing Yang, Yingbin Liang

To the best of our knowledge, this is the first dynamic regret analysis in non-stationary MDPs with general function approximation.

reinforcement-learning, Reinforcement Learning (RL)

Theoretical Characterization of the Generalization Performance of Overfitted Meta-Learning

no code implementations 9 Apr 2023 Peizhong Ju, Yingbin Liang, Ness B. Shroff

However, due to the uniqueness of meta-learning such as task-specific gradient descent inner training and the diversity/fluctuation of the ground-truth signals among training tasks, we find new and interesting properties that do not exist in single-task linear regression.

Meta-Learning, regression

Improved Sample Complexity for Reward-free Reinforcement Learning under Low-rank MDPs

no code implementations 20 Mar 2023 Yuan Cheng, Ruiquan Huang, Jing Yang, Yingbin Liang

In this work, we first provide the first known sample complexity lower bound that holds for any algorithm under low-rank MDPs.

reinforcement-learning, Reinforcement Learning (RL) +1

M-L2O: Towards Generalizable Learning-to-Optimize by Test-Time Fast Self-Adaptation

1 code implementation 28 Feb 2023 Junjie Yang, Xuxi Chen, Tianlong Chen, Zhangyang Wang, Yingbin Liang

This data-driven procedure yields L2O that can efficiently solve problems similar to those seen in training, that is, drawn from the same "task distribution".

Learning to Generalize Provably in Learning to Optimize

1 code implementation 22 Feb 2023 Junjie Yang, Tianlong Chen, Mingkang Zhu, Fengxiang He, DaCheng Tao, Yingbin Liang, Zhangyang Wang

While the optimizer generalization has been recently studied, the optimizee generalization (or learning to generalize) has not been rigorously studied in the L2O context, which is the aim of this paper.

Theory on Forgetting and Generalization of Continual Learning

no code implementations 12 Feb 2023 Sen Lin, Peizhong Ju, Yingbin Liang, Ness Shroff

In particular, there is a lack of understanding on what factors are important and how they affect "catastrophic forgetting" and generalization performance.

Continual Learning

A Near-Optimal Algorithm for Safe Reinforcement Learning Under Instantaneous Hard Constraints

no code implementations 8 Feb 2023 Ming Shi, Yingbin Liang, Ness Shroff

In many applications of Reinforcement Learning (RL), it is critically important that the algorithm performs safely, such that instantaneous hard constraints are satisfied at each step, and unsafe states and actions are avoided.

reinforcement-learning, Reinforcement Learning (RL) +1

Near-Optimal Adversarial Reinforcement Learning with Switching Costs

no code implementations 8 Feb 2023 Ming Shi, Yingbin Liang, Ness Shroff

Our lower bound indicates that, due to the fundamental challenge of switching costs in adversarial RL, the best achieved regret (whose dependency on $T$ is $\tilde{O}(\sqrt{T})$) in static RL with switching costs (as well as adversarial RL without switching costs) is no longer achievable.

reinforcement-learning, Reinforcement Learning (RL)

Algorithm Design for Online Meta-Learning with Task Boundary Detection

no code implementations 2 Feb 2023 Daouda Sow, Sen Lin, Yingbin Liang, Junshan Zhang

More specifically, we first propose two simple but effective detection mechanisms for task switches and distribution shift, based on empirical observations, which serve as a key building block for more elegant online model updates in our algorithm: the task switch detection mechanism allows reuse of the best model available for the current task at hand, and the distribution shift detection mechanism differentiates the meta model update in order to preserve the knowledge for in-distribution tasks and quickly learn the new knowledge for out-of-distribution tasks.

Boundary Detection, Meta-Learning
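
As a rough illustration of a loss-based switch detector, the z-score rule below is a hypothetical stand-in, not the paper's detection mechanism:

```python
import numpy as np

def detect_task_switch(loss_history, window=20, z_thresh=3.0):
    """Flag a possible task switch when the newest loss is a large outlier
    relative to the recent window (a hypothetical z-score rule)."""
    if len(loss_history) <= window:
        return False
    recent = np.asarray(loss_history[-window - 1:-1])
    mu, sigma = recent.mean(), recent.std() + 1e-8
    return (loss_history[-1] - mu) / sigma > z_thresh

rng = np.random.default_rng(0)
losses = list(1.0 + 0.05 * rng.normal(size=50)) + [3.0]   # a sudden jump in the online loss
print(detect_task_switch(losses))   # True: treat it as a new task and switch/adapt the model
```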

Pruning Before Training May Improve Generalization, Provably

no code implementations 1 Jan 2023 Hongru Yang, Yingbin Liang, Xiaojie Guo, Lingfei Wu, Zhangyang Wang

It is shown that as long as the pruning fraction is below a certain threshold, gradient descent can drive the training loss toward zero and the network exhibits good generalization performance.

Network Pruning

Convergence and Generalization of Wide Neural Networks with Large Bias

no code implementations 1 Jan 2023 Hongru Yang, Ziyu Jiang, Ruizhe Zhang, Zhangyang Wang, Yingbin Liang

This work studies training one-hidden-layer overparameterized ReLU networks via gradient descent in the neural tangent kernel (NTK) regime, where the networks' biases are initialized to some constant rather than zero.

Global Convergence of Two-timescale Actor-Critic for Solving Linear Quadratic Regulator

no code implementations 18 Aug 2022 Xuyang Chen, Jingliang Duan, Yingbin Liang, Lin Zhao

To our knowledge, this is the first finite-time convergence analysis for the single sample two-timescale AC for solving LQR with global optimality.

Safe Exploration Incurs Nearly No Additional Sample Complexity for Reward-free RL

no code implementations 28 Jun 2022 Ruiquan Huang, Jing Yang, Yingbin Liang

In particular, we consider the scenario where a safe baseline policy is known beforehand, and propose a unified Safe reWard-frEe ExploraTion (SWEET) framework.

Safe Exploration

Provable Generalization of Overparameterized Meta-learning Trained with SGD

no code implementations 18 Jun 2022 Yu Huang, Yingbin Liang, Longbo Huang

Despite the superior empirical success of deep meta-learning, theoretical understanding of overparameterized meta-learning is still limited.

Generalization Bounds, Meta-Learning

Provable Benefit of Multitask Representation Learning in Reinforcement Learning

no code implementations 13 Jun 2022 Yuan Cheng, Songtao Feng, Jing Yang, Hong Zhang, Yingbin Liang

To the best of our knowledge, this is the first theoretical study that characterizes the benefit of representation learning in exploration-based reward-free multitask RL for both upstream and downstream tasks.

Offline RL, reinforcement-learning +2

Provably Efficient Offline Reinforcement Learning with Trajectory-Wise Reward

no code implementations 13 Jun 2022 Tengyu Xu, Yue Wang, Shaofeng Zou, Yingbin Liang

The remarkable success of reinforcement learning (RL) heavily relies on observing the reward of every visited state-action pair.

Offline RL, reinforcement-learning +1

Will Bilevel Optimizers Benefit from Loops

no code implementations 27 May 2022 Kaiyi Ji, Mingrui Liu, Yingbin Liang, Lei Ying

Existing studies in the literature cover only some of those implementation choices, and the complexity bounds available are not refined enough to enable rigorous comparison among different implementations.

Bilevel Optimization, Computational Efficiency

Data Sampling Affects the Complexity of Online SGD over Dependent Data

no code implementations 31 Mar 2022 Shaocong Ma, Ziyi Chen, Yi Zhou, Kaiyi Ji, Yingbin Liang

Moreover, we show that online SGD with mini-batch sampling can further substantially improve the sample complexity over online SGD with periodic data-subsampling over highly dependent data.

Stochastic Optimization

A Primal-Dual Approach to Bilevel Optimization with Multiple Inner Minima

no code implementations 1 Mar 2022 Daouda Sow, Kaiyi Ji, Ziwei Guan, Yingbin Liang

Existing algorithms designed for such a problem are applicable only to restricted situations and do not come with a full guarantee of convergence.

Bilevel Optimization, Hyperparameter Optimization +2

Model-Based Offline Meta-Reinforcement Learning with Regularization

no code implementations ICLR 2022 Sen Lin, Jialin Wan, Tengyu Xu, Yingbin Liang, Junshan Zhang

In particular, we devise a new meta-Regularized model-based Actor-Critic (RAC) method for within-task policy optimization, as a key building block of MerPO, using conservative policy evaluation and regularized policy improvement; and the intrinsic tradeoff therein is achieved via striking the right balance between two regularizers, one based on the behavior policy and the other on the meta-policy.

Meta Reinforcement Learning, reinforcement-learning +2

Faster Non-asymptotic Convergence for Double Q-learning

no code implementations NeurIPS 2021 Lin Zhao, Huaqing Xiong, Yingbin Liang

This paper tackles the more challenging case of a constant learning rate, and develops new analytical tools that improve the existing convergence rate by orders of magnitude.

Q-Learning

Faster Algorithm and Sharper Analysis for Constrained Markov Decision Process

no code implementations 20 Oct 2021 Tianjiao Li, Ziwei Guan, Shaofeng Zou, Tengyu Xu, Yingbin Liang, Guanghui Lan

Despite the challenge of the nonconcave objective subject to nonconcave constraints, the proposed approach is shown to converge to the global optimum with a complexity of $\tilde{\mathcal O}(1/\epsilon)$ in terms of the optimality gap and the constraint violation, which improves the complexity of the existing primal-dual approach by a factor of $\mathcal O(1/\epsilon)$ (Ding et al., 2020; Paternain et al., 2019).

PER-ETD: A Polynomially Efficient Emphatic Temporal Difference Learning Method

no code implementations ICLR 2022 Ziwei Guan, Tengyu Xu, Yingbin Liang

Although ETD has been shown to converge asymptotically to a desirable value function, it is well-known that ETD often encounters a large variance so that its sample complexity can increase exponentially fast with the number of iterations.

How to Improve Sample Complexity of SGD over Highly Dependent Data?

no code implementations 29 Sep 2021 Shaocong Ma, Ziyi Chen, Yi Zhou, Kaiyi Ji, Yingbin Liang

Specifically, with a $\phi$-mixing model that captures both exponential and polynomial decay of the data dependence over time, we show that SGD with periodic data-subsampling achieves an improved sample complexity over the standard SGD in the full spectrum of the $\phi$-mixing data dependence.

Stochastic Optimization
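
The periodic data-subsampling idea is simple to state in code: update only on every k-th sample of the dependent stream so that consecutive updates are less correlated. A sketch on a toy AR(1) stream with a scalar mean-estimation objective (both illustrative; the paper's contribution is the sample-complexity analysis, not this particular example):

```python
import numpy as np

rng = np.random.default_rng(0)

def dependent_stream(n, rho=0.9):
    """AR(1) stream: consecutive samples are highly correlated when rho is close to 1."""
    z = np.zeros(n)
    for t in range(1, n):
        z[t] = rho * z[t - 1] + np.sqrt(1 - rho ** 2) * rng.normal()
    return z

def sgd_mean(stream, gap=1, lr=0.05):
    """SGD on 0.5*(x - z_t)^2 (true minimizer is the stream mean, 0), using every gap-th sample."""
    x = 5.0
    for t in range(0, len(stream), gap):
        x -= lr * (x - stream[t])
    return x

z = dependent_stream(20000)
print("SGD on every sample   :", sgd_mean(z, gap=1))
print("SGD on subsampled data:", sgd_mean(z, gap=20))  # fewer, but much less correlated, updates
```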

Generalizable Learning to Optimize into Wide Valleys

no code implementations 29 Sep 2021 Junjie Yang, Tianlong Chen, Mingkang Zhu, Fengxiang He, DaCheng Tao, Yingbin Liang, Zhangyang Wang

Learning to optimize (L2O) has gained increasing popularity in various optimization tasks, since classical optimizers usually require laborious, problem-specific design and hyperparameter tuning.

ES-Based Jacobian Enables Faster Bilevel Optimization

no code implementations 29 Sep 2021 Daouda Sow, Kaiyi Ji, Yingbin Liang

Bilevel optimization (BO) has arisen as a powerful tool for solving many modern machine learning problems.

Bilevel Optimization, Meta-Learning

A Unified Off-Policy Evaluation Approach for General Value Function

no code implementations 6 Jul 2021 Tengyu Xu, Zhuoran Yang, Zhaoran Wang, Yingbin Liang

We further show that unlike GTD, the learned GVFs by GenTD are guaranteed to converge to the ground truth GVFs as long as the function approximation power is sufficiently large.

Anomaly Detection, Off-policy evaluation

Provably Faster Algorithms for Bilevel Optimization

1 code implementation NeurIPS 2021 Junjie Yang, Kaiyi Ji, Yingbin Liang

Bilevel optimization has been widely applied in many important machine learning applications such as hyperparameter optimization and meta-learning.

Bilevel Optimization, Hyperparameter Optimization +1

Doubly Robust Off-Policy Actor-Critic: Convergence and Optimality

no code implementations 23 Feb 2021 Tengyu Xu, Zhuoran Yang, Zhaoran Wang, Yingbin Liang

We also show that the overall convergence of DR-Off-PAC is doubly robust to the approximation errors that depend only on the expressive power of approximation functions.

Proximal Gradient Descent-Ascent: Variable Convergence under KŁ Geometry

no code implementations ICLR 2021 Ziyi Chen, Yi Zhou, Tengyu Xu, Yingbin Liang

By leveraging this Lyapunov function and the KŁ geometry that parameterizes the local geometries of general nonconvex functions, we formally establish the variable convergence of proximal-GDA to a critical point $x^*$, i.e., $x_t\to x^*, y_t\to y^*(x^*)$.
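
For a regularized minimax problem $\min_x \max_y f(x, y) + g(x)$, one proximal-GDA iteration couples a proximal gradient step on $x$ with a gradient ascent step on $y$. A sketch on a toy bilinear-strongly-concave objective with an $\ell_1$ regularizer (the objective, step size, and iteration count are illustrative, not the paper's setting):

```python
import numpy as np

rng = np.random.default_rng(0)
A = 0.3 * rng.normal(size=(5, 5))   # f(x, y) = y^T A x - 0.5*||y||^2, g(x) = lam*||x||_1

def prox_l1(v, t):
    """Proximal operator of t*||.||_1 (soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

x, y = rng.normal(size=5), rng.normal(size=5)
beta, lam = 0.1, 0.05
for _ in range(500):
    x = prox_l1(x - beta * (A.T @ y), beta * lam)  # proximal gradient descent step on x
    y = y + beta * (A @ x - y)                     # gradient ascent step on y

print("x:", np.round(x, 3))   # for this toy problem the solution is x* = 0, y* = 0
print("y:", np.round(y, 3))
```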

Lower Bounds and Accelerated Algorithms for Bilevel Optimization

no code implementations 7 Feb 2021 Kaiyi Ji, Yingbin Liang

Bilevel optimization has recently attracted growing interest due to its wide applications in modern machine learning problems.

Bilevel Optimization

Double Q-learning: New Analysis and Sharper Finite-time Bound

no code implementations 1 Jan 2021 Lin Zhao, Huaqing Xiong, Yingbin Liang, Wei Zhang

Double Q-learning (Hasselt 2010) has gained significant success in practice due to its effectiveness in overcoming the overestimation issue of Q-learning.

Q-Learning

CRPO: A New Approach for Safe Reinforcement Learning with Convergence Guarantee

no code implementations 11 Nov 2020 Tengyu Xu, Yingbin Liang, Guanghui Lan

To demonstrate the theoretical performance of CRPO, we adopt natural policy gradient (NPG) for each policy update step and show that CRPO achieves an $\mathcal{O}(1/\sqrt{T})$ convergence rate to the global optimal policy in the constrained policy set and an $\mathcal{O}(1/\sqrt{T})$ error bound on constraint satisfaction.

reinforcement-learning, Reinforcement Learning (RL) +1

Sample Complexity Bounds for Two Timescale Value-based Reinforcement Learning Algorithms

no code implementations 10 Nov 2020 Tengyu Xu, Yingbin Liang

For linear TDC, we provide a novel non-asymptotic analysis and show that it attains an $\epsilon$-accurate solution with the optimal sample complexity of $\mathcal{O}(\epsilon^{-1}\log(1/\epsilon))$ under a constant stepsize.

reinforcement-learning, Reinforcement Learning (RL) +1

Bilevel Optimization: Convergence Analysis and Enhanced Design

2 code implementations 15 Oct 2020 Kaiyi Ji, Junjie Yang, Yingbin Liang

For the AID-based method, we orderwisely improve the previous convergence rate analysis due to a more practical parameter selection as well as a warm start strategy, and for the ITD-based method we establish the first theoretical convergence rate.

Bilevel Optimization, Hyperparameter Optimization +1

Finite-Time Analysis for Double Q-learning

no code implementations NeurIPS 2020 Huaqing Xiong, Lin Zhao, Yingbin Liang, Wei Zhang

Although Q-learning is one of the most successful algorithms for finding the best action-value function (and thus the optimal policy) in reinforcement learning, its implementation often suffers from large overestimation of Q-function values incurred by random sampling.

Q-Learning
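
The double Q-learning update (Hasselt, 2010) analyzed in the two entries above keeps two estimators and decouples action selection from action evaluation. A tabular sketch of one update (the toy table sizes are arbitrary):

```python
import numpy as np

def double_q_update(QA, QB, s, a, r, s_next, alpha=0.1, gamma=0.99, rng=np.random):
    """One tabular double Q-learning step (Hasselt, 2010).

    With probability 1/2, update QA using QB to evaluate QA's greedy next action,
    otherwise update QB symmetrically; decoupling selection from evaluation
    mitigates the overestimation of standard Q-learning.
    """
    if rng.random() < 0.5:
        a_star = np.argmax(QA[s_next])
        QA[s, a] += alpha * (r + gamma * QB[s_next, a_star] - QA[s, a])
    else:
        b_star = np.argmax(QB[s_next])
        QB[s, a] += alpha * (r + gamma * QA[s_next, b_star] - QB[s, a])

QA, QB = np.zeros((4, 2)), np.zeros((4, 2))      # toy table: 4 states, 2 actions
double_q_update(QA, QB, s=0, a=1, r=1.0, s_next=2)
print(QA[0, 1], QB[0, 1])                        # exactly one of the two tables was updated
```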

Provably Faster Algorithms for Bilevel Optimization and Applications to Meta-Learning

no code implementations 28 Sep 2020 Kaiyi Ji, Junjie Yang, Yingbin Liang

For the AID-based method, we orderwisely improve the previous finite-time convergence analysis due to a more practical parameter selection as well as a warm start strategy, and for the ITD-based method we establish the first theoretical convergence rate.

Bilevel Optimization, Hyperparameter Optimization +1

Enhanced First and Zeroth Order Variance Reduced Algorithms for Min-Max Optimization

no code implementations 28 Sep 2020 Tengyu Xu, Zhe Wang, Yingbin Liang, H. Vincent Poor

Specifically, a novel variance reduction algorithm SREDA was proposed recently by (Luo et al. 2020) to solve such a problem, and was shown to achieve the optimal complexity dependence on the required accuracy level $\epsilon$.

A Primal Approach to Constrained Policy Optimization: Global Optimality and Finite-Time Analysis

no code implementations 28 Sep 2020 Tengyu Xu, Yingbin Liang, Guanghui Lan

To demonstrate the theoretical performance of CRPO, we adopt natural policy gradient (NPG) for each policy update step and show that CRPO achieves an $\mathcal{O}(1/\sqrt{T})$ convergence rate to the global optimal policy in the constrained policy set and an $\mathcal{O}(1/\sqrt{T})$ error bound on constraint satisfaction.

Safe Reinforcement Learning

Spectral Algorithms for Community Detection in Directed Networks

no code implementations 9 Aug 2020 Zhe Wang, Yingbin Liang, Pengsheng Ji

Community detection in large social networks is affected by degree heterogeneity of nodes.

Clustering, Community Detection

Momentum Q-learning with Finite-Sample Convergence Guarantee

no code implementations 30 Jul 2020 Bowen Weng, Huaqing Xiong, Lin Zhao, Yingbin Liang, Wei Zhang

For the infinite state-action space case, we establish the convergence guarantee for MomentumQ with linear function approximations and Markovian sampling.

Q-Learning

Analysis of Q-learning with Adaptation and Momentum Restart for Gradient Descent

no code implementations 15 Jul 2020 Bowen Weng, Huaqing Xiong, Yingbin Liang, Wei Zhang

In this paper, we first characterize the convergence rate for Q-AMSGrad, which is the Q-learning algorithm with AMSGrad update (a commonly adopted alternative of Adam for theoretical analysis).

Atari Games, Q-Learning

When Will Generative Adversarial Imitation Learning Algorithms Attain Global Convergence

no code implementations 24 Jun 2020 Ziwei Guan, Tengyu Xu, Yingbin Liang

Generative adversarial imitation learning (GAIL) is a popular inverse reinforcement learning approach for jointly optimizing policy and reward from expert trajectories.

Imitation Learning

Convergence of Meta-Learning with Task-Specific Adaptation over Partial Parameters

no code implementations NeurIPS 2020 Kaiyi Ji, Jason D. Lee, Yingbin Liang, H. Vincent Poor

Although model-agnostic meta-learning (MAML) is a very successful algorithm in meta-learning practice, it can have high computational cost because it updates all model parameters over both the inner loop of task-specific adaptation and the outer-loop of meta initialization training.

Meta-Learning
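
A hedged sketch of the partial-adaptation idea: only the task-specific head is updated in the inner loop, while the shared representation is updated in the outer loop. The linear model, toy regression tasks, single inner step, and first-order outer gradient below are all simplifying assumptions, not the paper's algorithm or analysis:

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, n = 10, 4, 32                      # input dim, representation dim, samples per task
B = 0.1 * rng.normal(size=(d, k))        # shared representation, updated only in the outer loop

def sample_task():
    """Toy linear-regression task (illustrative generative model)."""
    w = rng.normal(size=d)
    X = rng.normal(size=(n, d))
    return X, X @ w + 0.1 * rng.normal(size=n)

def grads(B, h, X, y):
    """Analytic gradients of 0.5*||X B h - y||^2 / n w.r.t. B and h."""
    r = X @ B @ h - y
    return X.T @ np.outer(r, h) / n, B.T @ X.T @ r / n

inner_lr, outer_lr = 0.1, 0.01
for _ in range(100):
    X, y = sample_task()
    h = np.zeros(k)                              # task-specific head, re-initialized per task
    h = h - inner_lr * grads(B, h, X, y)[1]      # inner loop: adapt only the head
    B = B - outer_lr * grads(B, h, X, y)[0]      # outer loop: first-order meta-update of B

X, y = sample_task()
h = -inner_lr * grads(B, np.zeros(k), X, y)[1]   # one-step head adaptation on a fresh task
print("post-adaptation loss:", 0.5 * np.mean((X @ B @ h - y) ** 2))
```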

Gradient Free Minimax Optimization: Variance Reduction and Faster Convergence

no code implementations 16 Jun 2020 Tengyu Xu, Zhe Wang, Yingbin Liang, H. Vincent Poor

In this paper, we focus on such a gradient-free setting, and consider the nonconvex-strongly-concave minimax stochastic optimization problem.

Stochastic Optimization

Non-asymptotic Convergence Analysis of Two Time-scale (Natural) Actor-Critic Algorithms

no code implementations 7 May 2020 Tengyu Xu, Zhe Wang, Yingbin Liang

In the first nested-loop design, one update of the actor's policy is followed by an entire loop of the critic's updates of the value function, and the finite-sample analysis of such AC and NAC algorithms has recently been well established.

Improving Sample Complexity Bounds for (Natural) Actor-Critic Algorithms

no code implementations NeurIPS 2020 Tengyu Xu, Zhe Wang, Yingbin Liang

We show that the overall sample complexity for a mini-batch AC to attain an $\epsilon$-accurate stationary point improves the best known sample complexity of AC by an order of $\mathcal{O}(\epsilon^{-1}\log(1/\epsilon))$, and the overall sample complexity for a mini-batch NAC to attain an $\epsilon$-accurate globally optimal point improves the existing sample complexity of NAC by an order of $\mathcal{O}(\epsilon^{-1}/\log(1/\epsilon))$.

Proximal Gradient Algorithm with Momentum and Flexible Parameter Restart for Nonconvex Optimization

no code implementations 26 Feb 2020 Yi Zhou, Zhe Wang, Kaiyi Ji, Yingbin Liang, Vahid Tarokh

Our APG-restart is designed to 1) allow for adopting flexible parameter restart schemes that cover many existing ones; 2) have a global sub-linear convergence rate in nonconvex and nonsmooth optimization; and 3) have guaranteed convergence to a critical point and have various types of asymptotic convergence rates depending on the parameterization of local geometry in nonconvex and nonsmooth optimization.

Theoretical Convergence of Multi-Step Model-Agnostic Meta-Learning

2 code implementations 18 Feb 2020 Kaiyi Ji, Junjie Yang, Yingbin Liang

As a popular meta-learning approach, the model-agnostic meta-learning (MAML) algorithm has been widely used due to its simplicity and effectiveness.

Meta-Learning

Robust Stochastic Bandit Algorithms under Probabilistic Unbounded Adversarial Attack

no code implementations 17 Feb 2020 Ziwei Guan, Kaiyi Ji, Donald J Bucci Jr, Timothy Y Hu, Joseph Palombo, Michael Liston, Yingbin Liang

This paper investigates the attack model where an adversary attacks with a certain probability at each round, and its attack value can be arbitrary and unbounded if it attacks.

Adversarial Attack

Non-asymptotic Convergence of Adam-type Reinforcement Learning Algorithms under Markovian Sampling

no code implementations 15 Feb 2020 Huaqing Xiong, Tengyu Xu, Yingbin Liang, Wei Zhang

Despite the wide applications of Adam in reinforcement learning (RL), the theoretical convergence of Adam-type RL algorithms has not been established.

reinforcement-learning, Reinforcement Learning (RL)

Reanalysis of Variance Reduced Temporal Difference Learning

no code implementations ICLR 2020 Tengyu Xu, Zhe Wang, Yi Zhou, Yingbin Liang

Furthermore, the variance error (for both i.i.d. and Markovian sampling) and the bias error (for Markovian sampling) of VRTD are significantly reduced by the batch size of variance reduction in comparison to those of vanilla TD.

SpiderBoost and Momentum: Faster Variance Reduction Algorithms

no code implementations NeurIPS 2019 Zhe Wang, Kaiyi Ji, Yi Zhou, Yingbin Liang, Vahid Tarokh

SARAH and SPIDER are two recently developed stochastic variance-reduced algorithms, and SPIDER has been shown to achieve a near-optimal first-order oracle complexity in smooth nonconvex optimization.

Improved Zeroth-Order Variance Reduced Algorithms and Analysis for Nonconvex Optimization

no code implementations 27 Oct 2019 Kaiyi Ji, Zhe Wang, Yi Zhou, Yingbin Liang

Two types of zeroth-order stochastic algorithms have recently been designed for nonconvex optimization respectively based on the first-order techniques SVRG and SARAH/SPIDER.

History-Gradient Aided Batch Size Adaptation for Variance Reduced Algorithms

no code implementations ICML 2020 Kaiyi Ji, Zhe Wang, Bowen Weng, Yi Zhou, Wei Zhang, Yingbin Liang

In this paper, we propose a novel scheme, which eliminates backtracking line search but still exploits the information along optimization path by adapting the batch size via history stochastic gradients.

Distributed SGD Generalizes Well Under Asynchrony

no code implementations 29 Sep 2019 Jayanth Regatti, Gaurav Tendolkar, Yi Zhou, Abhishek Gupta, Yingbin Liang

The performance of fully synchronized distributed systems has faced a bottleneck due to the big data trend, under which asynchronous distributed systems are gaining popularity due to their powerful scalability.

Two Time-scale Off-Policy TD Learning: Non-asymptotic Analysis over Markovian Samples

no code implementations NeurIPS 2019 Tengyu Xu, Shaofeng Zou, Yingbin Liang

Gradient-based temporal difference (GTD) algorithms are widely used in off-policy learning scenarios.

Can AltQ Learn Faster: Experiments and Theory

no code implementations 25 Sep 2019 Bowen Weng, Huaqing Xiong, Yingbin Liang, Wei Zhang

Differently from the popular Deep Q-Network (DQN) learning, Alternating Q-learning (AltQ) does not fully fit a target Q-function at each iteration, and is generally known to be unstable and inefficient.

Atari Games, Q-Learning

Momentum Schemes with Stochastic Variance Reduction for Nonconvex Composite Optimization

no code implementations 7 Feb 2019 Yi Zhou, Zhe Wang, Kaiyi Ji, Yingbin Liang, Vahid Tarokh

In this paper, we develop novel momentum schemes with flexible coefficient settings to accelerate SPIDER for nonconvex and nonsmooth composite optimization, and show that the resulting algorithms achieve the near-optimal gradient oracle complexity for achieving a generalized first-order stationary condition.

SGD Converges to Global Minimum in Deep Learning via Star-convex Path

no code implementations ICLR 2019 Yi Zhou, Junjie Yang, Huishuai Zhang, Yingbin Liang, Vahid Tarokh

Stochastic gradient descent (SGD) has been found to be surprisingly effective in training a variety of deep neural networks.

MR-GAN: Manifold Regularized Generative Adversarial Networks

no code implementations 22 Nov 2018 Qunwei Li, Bhavya Kailkhura, Rushil Anirudh, Yi Zhou, Yingbin Liang, Pramod Varshney

Despite the growing interest in generative adversarial networks (GANs), training GANs remains a challenging problem, both from a theoretical and a practical standpoint.

Minimax Estimation of Neural Net Distance

no code implementations NeurIPS 2018 Kaiyi Ji, Yingbin Liang

An important class of distance metrics proposed for training generative adversarial networks (GANs) is the integral probability metric (IPM), in which the neural net distance captures the practical GAN training via two neural networks.

SpiderBoost and Momentum: Faster Stochastic Variance Reduction Algorithms

1 code implementation 25 Oct 2018 Zhe Wang, Kaiyi Ji, Yi Zhou, Yingbin Liang, Vahid Tarokh

SARAH and SPIDER are two recently developed stochastic variance-reduced algorithms, and SPIDER has been shown to achieve a near-optimal first-order oracle complexity in smooth nonconvex optimization.
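
The recursive SPIDER-style gradient estimator at the core of SpiderBoost can be sketched in a few lines: take constant-stepsize steps along an estimator that is refreshed with a full gradient every q iterations and otherwise updated with gradient differences. The finite-sum least-squares objective and all hyperparameters below are illustrative, and the momentum coupling from the paper is omitted:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 20
A, b = rng.normal(size=(n, d)), rng.normal(size=n)   # finite-sum least-squares problem

def grad(x, idx):
    """Average gradient of 0.5*(a_i^T x - b_i)^2 over the index set idx."""
    r = A[idx] @ x - b[idx]
    return A[idx].T @ r / len(idx)

x = np.zeros(d)
eta, q, batch = 0.05, 20, 10
v = grad(x, np.arange(n))                 # initialize with a full gradient
for k in range(400):
    x_next = x - eta * v                  # SpiderBoost: constant-stepsize move along the estimator
    if (k + 1) % q == 0:
        v = grad(x_next, np.arange(n))    # periodic full-gradient refresh
    else:
        S = rng.choice(n, size=batch, replace=False)
        v = grad(x_next, S) - grad(x, S) + v   # SPIDER-style recursive gradient estimator
    x = x_next

print("final objective:", 0.5 * np.mean((A @ x - b) ** 2))
```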

Cubic Regularization with Momentum for Nonconvex Optimization

no code implementations 9 Oct 2018 Zhe Wang, Yi Zhou, Yingbin Liang, Guanghui Lan

However, such a successful acceleration technique has not yet been proposed for second-order algorithms in nonconvex optimization. In this paper, we apply the momentum scheme to cubic regularized (CR) Newton's method and explore the potential for acceleration.

A Note on Inexact Condition for Cubic Regularized Newton's Method

no code implementations 22 Aug 2018 Zhe Wang, Yi Zhou, Yingbin Liang, Guanghui Lan

This note considers the inexact cubic-regularized Newton's method (CR), which has been shown in Cartis et al. (2011) to achieve the same order-level convergence rate to a second-order stationary point as the exact CR (Nesterov and Polyak, 2006).

Convergence of Cubic Regularization for Nonconvex Optimization under KL Property

no code implementations NeurIPS 2018 Yi Zhou, Zhe Wang, Yingbin Liang

Cubic-regularized Newton's method (CR) is a popular algorithm that guarantees to produce a second-order stationary solution for solving nonconvex optimization problems.

K-medoids Clustering of Data Sequences with Composite Distributions

no code implementations 31 Jul 2018 Tiexing Wang, Qunwei Li, Donald J. Bucci, Yingbin Liang, Biao Chen, Pramod K. Varshney

In particular, the error exponent is characterized when either the Kolmogorov-Smirnov distance or the maximum mean discrepancy is used as the distance metric.

Clustering
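
A hedged sketch of clustering data sequences with a distribution-based distance, here the two-sample Kolmogorov-Smirnov statistic from SciPy inside a plain k-medoids loop; the sequence model, number of clusters, and initialization are illustrative choices, not the paper's setting:

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
# Toy data: 20 sequences drawn from two different distributions.
seqs = [rng.normal(0.0, 1.0, 300) for _ in range(10)] + [rng.normal(1.5, 1.0, 300) for _ in range(10)]

n = len(seqs)
D = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        D[i, j] = D[j, i] = ks_2samp(seqs[i], seqs[j]).statistic   # KS distance between sequences

def two_medoids(D, iters=20):
    """Plain k-medoids with k = 2 and a farthest-pair initialization."""
    i0, j0 = np.unravel_index(np.argmax(D), D.shape)
    medoids = [i0, j0]
    for _ in range(iters):
        labels = np.argmin(D[:, medoids], axis=1)      # assign each sequence to the nearer medoid
        for c in range(2):
            members = np.where(labels == c)[0]
            medoids[c] = members[np.argmin(D[np.ix_(members, members)].sum(axis=1))]
    return labels

print(two_medoids(D))   # the two groups of sequences separate cleanly
```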

When Will Gradient Methods Converge to Max-margin Classifier under ReLU Models?

1 code implementation ICLR 2019 Tengyu Xu, Yi Zhou, Kaiyi Ji, Yingbin Liang

We study the implicit bias of gradient descent methods in solving a binary classification problem over a linearly separable dataset.

Binary Classification

Stochastic Variance-Reduced Cubic Regularization for Nonconvex Optimization

no code implementations 20 Feb 2018 Zhe Wang, Yi Zhou, Yingbin Liang, Guanghui Lan

Cubic regularization (CR) is an optimization method with emerging popularity due to its capability to escape saddle points and converge to second-order stationary solutions for nonconvex optimization.

Generalization Error Bounds with Probabilistic Guarantee for SGD in Nonconvex Optimization

no code implementations 19 Feb 2018 Yi Zhou, Yingbin Liang, Huishuai Zhang

With strongly convex regularizers, we further establish the generalization error bounds for nonconvex loss functions under proximal SGD with high-probability guarantee, i.e., exponential concentration in probability.

Guaranteed Recovery of One-Hidden-Layer Neural Networks via Cross Entropy

no code implementations ICLR 2019 Haoyu Fu, Yuejie Chi, Yingbin Liang

We prove that with Gaussian inputs, the empirical risk based on cross entropy exhibits strong convexity and smoothness uniformly in a local neighborhood of the ground truth, as soon as the sample complexity is sufficiently large.

Critical Points of Linear Neural Networks: Analytical Forms and Landscape Properties

no code implementations ICLR 2018 Yi Zhou, Yingbin Liang

In this paper, we provide a necessary and sufficient characterization of the analytical forms for the critical points (as well as global minimizers) of the square loss functions for linear neural networks.

Critical Points of Neural Networks: Analytical Forms and Landscape Properties

no code implementations 30 Oct 2017 Yi Zhou, Yingbin Liang

We show that the analytical forms of the critical points characterize the values of the corresponding loss functions as well as the necessary and sufficient conditions to achieve global minimum.

Characterization of Gradient Dominance and Regularity Conditions for Neural Networks

no code implementations 18 Oct 2017 Yi Zhou, Yingbin Liang

The past decade has witnessed a successful application of deep learning to solving many challenging problems in machine learning and artificial intelligence.

Nonconvex Low-Rank Matrix Recovery with Arbitrary Outliers via Median-Truncated Gradient Descent

no code implementations 23 Sep 2017 Yuanxin Li, Yuejie Chi, Huishuai Zhang, Yingbin Liang

Recent work has demonstrated the effectiveness of gradient descent for directly recovering the factors of low-rank matrices from random linear measurements in a globally convergent manner when initialized properly.

Convergence Analysis of Proximal Gradient with Momentum for Nonconvex Optimization

no code implementations ICML 2017 Qunwei Li, Yi Zhou, Yingbin Liang, Pramod K. Varshney

Then, by exploiting the Kurdyka-Łojasiewicz (KŁ) property for a broad class of functions, we establish the linear and sub-linear convergence rates of the function value sequence generated by APGnc.

Reshaped Wirtinger Flow for Solving Quadratic System of Equations

no code implementations NeurIPS 2016 Huishuai Zhang, Yingbin Liang

In contrast to the smooth loss function used in WF, we adopt a nonsmooth but lower-order loss function, and design a gradient-like algorithm (referred to as reshaped-WF).
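
A hedged sketch of the lower-order loss and gradient-like update for the real-valued case $y_i = |a_i^\top x|$; the spectral-type initialization and step size below are simplified stand-ins for the paper's choices:

```python
import numpy as np

rng = np.random.default_rng(0)
d, m = 20, 200
x_true = rng.normal(size=d)
A = rng.normal(size=(m, d))
y = np.abs(A @ x_true)                    # magnitude-only measurements

# Simple spectral-type initialization (a stand-in for the paper's initialization step).
Y = (A.T * y ** 2) @ A / m
z = np.linalg.eigh(Y)[1][:, -1] * np.sqrt(np.mean(y ** 2))

def rwf_step(z, mu=0.8):
    """Gradient-like update on the lower-order loss (1/2m) * sum_i (|a_i^T z| - y_i)^2."""
    az = A @ z
    return z - mu * (A.T @ (az - y * np.sign(az))) / m

for _ in range(300):
    z = rwf_step(z)

# Recovery is only possible up to a global sign.
err = min(np.linalg.norm(z - x_true), np.linalg.norm(z + x_true)) / np.linalg.norm(x_true)
print(f"relative recovery error: {err:.2e}")
```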

Reshaped Wirtinger Flow and Incremental Algorithm for Solving Quadratic System of Equations

1 code implementation 25 May 2016 Huishuai Zhang, Yi Zhou, Yingbin Liang, Yuejie Chi

We further develop the incremental (stochastic) reshaped Wirtinger flow (IRWF) and show that IRWF converges linearly to the true signal.

Retrieval

Nonparametric Detection of Geometric Structures over Networks

no code implementations 5 Apr 2016 Shaofeng Zou, Yingbin Liang, H. Vincent Poor

Sufficient conditions on minimum and maximum sizes of candidate anomalous intervals are characterized in order to guarantee the proposed test to be consistent.

Median-Truncated Nonconvex Approach for Phase Retrieval with Outliers

no code implementations 11 Mar 2016 Huishuai Zhang, Yuejie Chi, Yingbin Liang

This paper investigates the phase retrieval problem, which aims to recover a signal from the magnitudes of its linear measurements.

Retrieval

Analysis of Robust PCA via Local Incoherence

no code implementations NeurIPS 2015 Huishuai Zhang, Yi Zhou, Yingbin Liang

We investigate the robust PCA problem of decomposing an observed matrix into the sum of a low-rank matrix and a sparse error matrix via the convex program Principal Component Pursuit (PCP).

Nonparametric Detection of Anomalous Data Streams

no code implementations 25 Apr 2014 Shaofeng Zou, Yingbin Liang, H. Vincent Poor, Xinghua Shi

Each typical sequence contains i.i.d. samples drawn from a distribution p, whereas each anomalous sequence contains m i.i.d. samples drawn from a different distribution q.

Two-sample testing

A Kernel-Based Nonparametric Test for Anomaly Detection over Line Networks

no code implementations 1 Apr 2014 Shaofeng Zou, Yingbin Liang, H. Vincent Poor

If an anomalous interval does not exist, then all nodes receive samples generated by p. It is assumed that the distributions p and q are arbitrary and unknown.

Anomaly Detection

Sharp Threshold for Multivariate Multi-Response Linear Regression via Block Regularized Lasso

no code implementations 30 Jul 2013 Weiguang Wang, Yingbin Liang, Eric P. Xing

The goal is to recover the support union of all regression vectors using $l_1/l_2$-regularized Lasso.

regression
