Search Results for author: Yangchen Pan

Found 24 papers, 10 papers with code

An MRP Formulation for Supervised Learning: Generalized Temporal Difference Learning Models

no code implementations · 23 Apr 2024 · Yangchen Pan, Junfeng Wen, Chenjun Xiao, Philip Torr

In traditional statistical learning, data points are usually assumed to be independently and identically distributed (i.i.d.).

Image Classification, Reinforcement Learning (RL)

A Simple Mixture Policy Parameterization for Improving Sample Efficiency of CVaR Optimization

no code implementations · 17 Mar 2024 · Yudong Luo, Yangchen Pan, Han Wang, Philip Torr, Pascal Poupart

Reinforcement learning algorithms that use policy gradients (PG) to optimize Conditional Value at Risk (CVaR) suffer from significant sample inefficiency, hindering their practical application.
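
For context, CVaR at level α is the expected return over the worst α-fraction of outcomes, so only tail samples carry gradient signal, which is one source of the inefficiency. A small empirical sketch:

```python
import numpy as np

def cvar(returns, alpha=0.1):
    """Empirical CVaR_alpha: mean of the worst alpha-fraction of returns."""
    returns = np.sort(np.asarray(returns))          # ascending: worst first
    k = max(1, int(np.ceil(alpha * len(returns))))  # size of the alpha-tail
    return returns[:k].mean()

rng = np.random.default_rng(0)
samples = rng.normal(loc=1.0, scale=2.0, size=10_000)  # hypothetical returns
print(f"mean return: {samples.mean():.3f}")
print(f"CVaR_0.1:    {cvar(samples, alpha=0.1):.3f}")  # far below the mean
```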

Improving Adversarial Transferability via Model Alignment

no code implementations · 30 Nov 2023 · Avery Ma, Amir-Massoud Farahmand, Yangchen Pan, Philip Torr, Jindong Gu

During the alignment process, the parameters of the source model are fine-tuned to minimize an alignment loss.
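
One plausible, distillation-style form of such a loss is sketched below; the KL objective and the `witness_logits` argument are assumptions for illustration, not necessarily the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def alignment_loss(source_logits, witness_logits, temperature=1.0):
    """Hypothetical alignment loss: KL divergence between the softmax
    outputs of the source model and a second (witness) model."""
    p_witness = F.softmax(witness_logits / temperature, dim=-1)
    log_p_src = F.log_softmax(source_logits / temperature, dim=-1)
    return F.kl_div(log_p_src, p_witness, reduction="batchmean")
```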

Understanding the robustness difference between stochastic gradient descent and adaptive gradient methods

1 code implementation · 13 Aug 2023 · Avery Ma, Yangchen Pan, Amir-Massoud Farahmand

In the context of deep learning, our experiments show that SGD-trained neural networks have smaller Lipschitz constants, which explains their better robustness to input perturbations compared with networks trained by adaptive gradient methods.
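
One illustrative way to probe this locally is the per-example input-gradient norm, a lower bound on the Lipschitz constant; the paper's exact measurement protocol may differ. A PyTorch sketch:

```python
import torch
import torch.nn.functional as F

def input_grad_norms(model, x, y):
    """Per-example norm of the loss gradient w.r.t. the input: a local
    lower bound on the network's Lipschitz constant."""
    x = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x), y, reduction="sum")  # sum keeps per-example grads
    (grad,) = torch.autograd.grad(loss, x)
    return grad.flatten(1).norm(dim=1)   # one value per example
```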

The In-Sample Softmax for Offline Reinforcement Learning

4 code implementations · 28 Feb 2023 · Chenjun Xiao, Han Wang, Yangchen Pan, Adam White, Martha White

We highlight a simple fact: it is more straightforward to approximate an in-sample \emph{softmax} using only actions in the dataset.
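
A minimal sketch of the idea, ignoring the behavior-policy weighting and the actor-critic machinery of the full method:

```python
import numpy as np

def in_sample_softmax_value(q, actions_in_data, tau=1.0):
    """Log-sum-exp (softmax) value computed only over actions that appear
    in the dataset, rather than over the full action space."""
    q_in = q[list(actions_in_data)]
    return tau * np.log(np.mean(np.exp(q_in / tau)))

q = np.array([1.0, 5.0, 2.0, -1.0])        # Q-values for 4 actions
print(in_sample_softmax_value(q, {0, 2}))  # ignores unseen action 1
print(np.log(np.mean(np.exp(q))))          # full softmax, for contrast
```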

Offline RL, reinforcement-learning, +1

Label Alignment Regularization for Distribution Shift

no code implementations · 27 Nov 2022 · Ehsan Imani, Guojun Zhang, Runjia Li, Jun Luo, Pascal Poupart, Philip H. S. Torr, Yangchen Pan

Recent work has highlighted the label alignment property (LAP) in supervised learning, where the vector of all labels in the dataset is mostly in the span of the top few singular vectors of the data matrix.
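
A quick way to measure the LAP on a dataset, as a sketch: project the label vector onto the top singular subspace of the data matrix and report the captured fraction of its norm.

```python
import numpy as np

def label_alignment(X, y, k):
    """Fraction of the label vector's norm lying in the span of the
    top-k left singular vectors of the data matrix X."""
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    proj = U[:, :k] @ (U[:, :k].T @ y)
    return np.linalg.norm(proj) / np.linalg.norm(y)

rng = np.random.default_rng(0)
Z = rng.normal(size=(500, 3)) @ rng.normal(size=(3, 50))  # rank-3 signal
X = Z + 0.1 * rng.normal(size=(500, 50))                  # plus noise
y = Z @ rng.normal(size=50)         # labels live in the signal subspace
print(label_alignment(X, y, k=3))   # close to 1: the LAP holds here
```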

Representation Learning, Sentiment Analysis, +1

Memory-efficient Reinforcement Learning with Value-based Knowledge Consolidation

1 code implementation · 22 May 2022 · Qingfeng Lan, Yangchen Pan, Jun Luo, A. Rupam Mahmood

The experience replay buffer, a standard component in deep reinforcement learning, is often used to reduce forgetting and improve sample efficiency by storing experiences in a large buffer and using them for training later.
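
The component being replaced is easy to state; a minimal uniform-sampling buffer looks like the sketch below (the paper's method consolidates knowledge into the value network so that a much smaller buffer suffices):

```python
import random
from collections import deque

class ReplayBuffer:
    """Minimal experience replay: store transitions, sample uniformly."""
    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)  # old experiences fall out

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        return random.sample(self.buffer, batch_size)
```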

reinforcement-learning, Reinforcement Learning (RL)

STOPS: Short-Term-based Volatility-controlled Policy Search and its Global Convergence

no code implementations · 24 Jan 2022 · Liangliang Xu, Daoming Lyu, Yangchen Pan, Aiwen Jiang, Bo Liu

This paper proposes Short-Term VOlatility-controlled Policy Search (STOPS), a novel algorithm that solves risk-averse problems by learning from short-term trajectories instead of long-term trajectories.

An Alternate Policy Gradient Estimator for Softmax Policies

1 code implementation · 22 Dec 2021 · Shivam Garg, Samuele Tosatto, Yangchen Pan, Martha White, A. Rupam Mahmood

Policy gradient (PG) estimators are ineffective in dealing with softmax policies that are sub-optimally saturated, which refers to the situation when the policy concentrates its probability mass on sub-optimal actions.
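
A small numerical demonstration of the saturation problem with the vanilla softmax PG estimator (the paper's alternate estimator is not reproduced here):

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

# A policy saturated on a sub-optimal action (action 0).
theta = np.array([10.0, 0.0, 0.0])
pi = softmax(theta)

# Vanilla PG direction for the logits when action a is sampled with
# return G: grad = G * (one_hot(a) - pi). With pi(0) ~ 1, almost every
# sample picks action 0 and (one_hot(0) - pi) ~ 0, so updates vanish.
a, G = 0, 1.0
one_hot = np.eye(3)[a]
print(G * (one_hot - pi))   # nearly zero gradient: saturation
```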

Beyond Prioritized Replay: Sampling States in Model-Based RL via Simulated Priorities

1 code implementation · 28 Sep 2020 · Jincheng Mei, Yangchen Pan, Martha White, Amir-Massoud Farahmand, Hengshuai Yao

The prioritized Experience Replay (ER) method has attracted great attention; however, there is little theoretical understanding of such prioritization strategies and why they help.

Understanding and Mitigating the Limitations of Prioritized Experience Replay

2 code implementations · 19 Jul 2020 · Yangchen Pan, Jincheng Mei, Amir-Massoud Farahmand, Martha White, Hengshuai Yao, Mohsen Rohani, Jun Luo

Prioritized Experience Replay (ER) has been empirically shown to improve sample efficiency across many domains and has attracted great attention; however, there is little theoretical understanding of why such prioritized sampling helps and of its limitations.
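
For reference, the standard PER sampling rule that the analysis concerns: transitions are drawn with probability proportional to a power of their absolute TD error.

```python
import numpy as np

def prioritized_sample(td_errors, batch_size, alpha=0.6, rng=None):
    """Sample transition indices with probability proportional to
    |TD error|^alpha, as in standard prioritized experience replay."""
    rng = rng or np.random.default_rng()
    p = np.abs(td_errors) ** alpha + 1e-6   # small constant: nonzero mass
    p = p / p.sum()
    return rng.choice(len(td_errors), size=batch_size, p=p)
```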

Autonomous Driving, Continuous Control, +1

Maxmin Q-learning: Controlling the Estimation Bias of Q-learning

1 code implementation · ICLR 2020 · Qingfeng Lan, Yangchen Pan, Alona Fyshe, Martha White

Q-learning suffers from overestimation bias, because it approximates the maximum action value using the maximum estimated action value.
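
A stateless numerical illustration of both the bias and the Maxmin fix (take a min over an ensemble of estimates before the max); simplified from the full algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)
true_q = np.zeros(10)     # all 10 actions are equally good: max is 0
n_ensemble = 4

# Noisy estimates: the max over them systematically overestimates 0.
estimates = true_q + rng.normal(scale=1.0, size=(n_ensemble, 10))

single_max = estimates[0].max()       # standard Q-learning target
maxmin = estimates.min(axis=0).max()  # Maxmin: min over ensemble, then max
print(f"true max: 0.0, single max: {single_max:.2f}, maxmin: {maxmin:.2f}")
```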

Q-Learning

An implicit function learning approach for parametric modal regression

no code implementations · NeurIPS 2020 · Yangchen Pan, Ehsan Imani, Martha White, Amir-Massoud Farahmand

We empirically demonstrate on several synthetic problems that our method (i) can learn multi-valued functions and produce the conditional modes, (ii) scales well to high-dimensional inputs, and (iii) can even be more effective for certain uni-modal problems, particularly for high-frequency functions.

regression

Frequency-based Search-control in Dyna

no code implementations · ICLR 2020 · Yangchen Pan, Jincheng Mei, Amir-Massoud Farahmand

This suggests a search-control strategy: we should use states from high frequency regions of the value function to query the model to acquire more samples.

Model-based Reinforcement Learning

Fuzzy Tiling Activations: A Simple Approach to Learning Sparse Representations Online

1 code implementation · ICLR 2021 · Yangchen Pan, Kirby Banman, Martha White

Recent work has shown that sparse representations -- where only a small percentage of units are active -- can significantly reduce interference.
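
A rough sketch of a fuzzy-binning activation in the spirit of FTA; the parameter names and exact form here are simplifications of the paper's definition.

```python
import numpy as np

def fta(z, low=-2.0, high=2.0, n_bins=8, eta=0.1):
    """Map a scalar to a sparse vector of (fuzzy) bin memberships."""
    delta = (high - low) / n_bins
    c = low + delta * np.arange(n_bins)          # bin left edges
    d = np.maximum(c - z, 0.0) + np.maximum(z - delta - c, 0.0)
    fuzzy = np.where(d <= eta, d / eta, 1.0)     # fuzzy indicator I_eta
    return 1.0 - fuzzy                           # ~1 inside the bin, 0 far away

print(fta(0.3))   # mostly zeros: a sparse representation
```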

Continual Learning, Continuous Control, +2

Hill Climbing on Value Estimates for Search-control in Dyna

no code implementations · 18 Jun 2019 · Yangchen Pan, Hengshuai Yao, Amir-Massoud Farahmand, Martha White

In this work, we propose to generate such states by using the trajectory obtained from Hill Climbing (HC) on the current estimate of the value function.
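
A minimal sketch of the search-control idea, using finite-difference gradient ascent on a hypothetical value estimate (the paper's HC procedure is more elaborate):

```python
import numpy as np

def hill_climb_states(value_fn, s0, n_steps=10, step_size=0.05, eps=1e-4):
    """Generate search-control states by numerical gradient ascent on a
    learned value estimate, starting from state s0."""
    states, s = [], np.array(s0, dtype=float)
    for _ in range(n_steps):
        grad = np.array([
            (value_fn(s + eps * e) - value_fn(s - eps * e)) / (2 * eps)
            for e in np.eye(len(s))
        ])
        s = s + step_size * grad    # move toward higher estimated value
        states.append(s.copy())
    return states

# Hypothetical smooth value estimate over a 2-D state space.
v = lambda s: -np.sum((s - np.array([1.0, 1.0])) ** 2)
print(hill_climb_states(v, s0=[0.0, 0.0])[-1])   # climbs toward (1, 1)
```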

Model-based Reinforcement Learning, Reinforcement Learning (RL)

Greedy Actor-Critic: A New Conditional Cross-Entropy Method for Policy Improvement

1 code implementation · 22 Oct 2018 · Samuel Neumann, Sungsu Lim, Ajin Joseph, Yangchen Pan, Adam White, Martha White

We first provide a policy improvement result in an idealized setting, and then prove that our conditional CEM (CCEM) strategy tracks a CEM update per state, even with changing action-values.
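
For context, one standard CEM update over a Gaussian action distribution, which CCEM applies per state; this is a generic sketch, not the paper's exact estimator:

```python
import numpy as np

def cem_step(mu, sigma, score_fn, n_samples=64, elite_frac=0.1, rng=None):
    """One cross-entropy method update: sample actions, keep the elite
    fraction by score, refit the Gaussian's mean and std."""
    rng = rng or np.random.default_rng()
    actions = rng.normal(mu, sigma, size=(n_samples, len(mu)))
    scores = np.array([score_fn(a) for a in actions])
    elite = actions[np.argsort(scores)[-int(elite_frac * n_samples):]]
    return elite.mean(axis=0), elite.std(axis=0) + 1e-3  # keep std positive
```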

Policy Gradient Methods, Q-Learning

Organizing Experience: A Deeper Look at Replay Mechanisms for Sample-based Planning in Continuous State Domains

no code implementations · 12 Jun 2018 · Yangchen Pan, Muhammad Zaheer, Adam White, Andrew Patterson, Martha White

We show that a model, as opposed to a replay buffer, is particularly useful for specifying which states to sample from during planning, such as predecessor states that propagate information in reverse from a state more quickly.

Effective sketching methods for value function approximation

no code implementations · 3 Aug 2017 · Yangchen Pan, Erfan Sadeqi Azer, Martha White

As a remedy, we demonstrate how to use sketching more sparingly, with only a left-sided sketch, which can still enable significant computational gains and the use of matrix-based learning algorithms that are less sensitive to parameters.
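
The flavor of a left-sided sketch, shown on a generic least-squares problem (a sketch-and-solve illustration, not the paper's TD-specific algorithm):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 10_000, 50, 400            # samples, features, sketch size
A = rng.normal(size=(n, d))
b = A @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

# Left-sided sketch: a random projection S applied to the rows of A and b,
# so the least-squares solve works on an m x d system instead of n x d.
S = rng.normal(size=(m, n)) / np.sqrt(m)
x_sketch, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)
x_full, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.linalg.norm(x_sketch - x_full))   # small: close to the full solve
```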

Reinforcement Learning (RL)

Adapting Kernel Representations Online Using Submodular Maximization

no code implementations · ICML 2017 · Matthew Schlegel, Yangchen Pan, Jiecao Chen, Martha White

In this work, we develop an approximately submodular criterion for this setting, and an efficient online greedy submodular maximization algorithm for optimizing the criterion.
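
The greedy principle being referenced, in generic form; the paper's contribution is the approximately submodular criterion and its online variant, and `gain_fn` here is a hypothetical marginal-gain oracle.

```python
import numpy as np

def greedy_select(candidates, gain_fn, budget):
    """Greedy maximization of a (sub)modular set function: repeatedly add
    the element with the largest marginal gain over the chosen set."""
    chosen = []
    remaining = list(candidates)
    for _ in range(budget):
        gains = [gain_fn(chosen, c) for c in remaining]
        chosen.append(remaining.pop(int(np.argmax(gains))))
    return chosen
```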

Continual Learning

Accelerated Gradient Temporal Difference Learning

no code implementations · 28 Nov 2016 · Yangchen Pan, Adam White, Martha White

The family of temporal difference (TD) methods spans a spectrum from computationally frugal linear methods like TD(λ) to data-efficient least-squares methods.
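
The two ends of that spectrum, in linear function approximation:

```python
import numpy as np

def td0_update(w, phi, r, phi_next, alpha=0.01, gamma=0.99):
    """Linear TD(0): one cheap O(d) stochastic update."""
    delta = r + gamma * phi_next @ w - phi @ w   # TD error
    return w + alpha * delta * phi

def lstd(transitions, gamma=0.99):
    """LSTD: data-efficient batch solve of A w = b, at O(d^2) per sample.
    Each transition is a (phi, r, phi_next) feature tuple."""
    d = len(transitions[0][0])
    A, b = np.zeros((d, d)), np.zeros(d)
    for phi, r, phi_next in transitions:
        A += np.outer(phi, phi - gamma * phi_next)
        b += r * phi
    return np.linalg.solve(A + 1e-6 * np.eye(d), b)  # tiny ridge for stability
```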

Incremental Truncated LSTD

no code implementations · 26 Nov 2015 · Clement Gehring, Yangchen Pan, Martha White

Balancing between computational efficiency and sample efficiency is an important goal in reinforcement learning.

Computational Efficiency
